Re: [Tracker] Text extraction on text formats

From: Jamie McCracken <jamiemcc blueyonder co uk>
To: Laurent Aguerreche <laurent aguerreche free fr>
Cc: Tracker List <tracker-list gnome org>
Subject: Re: [Tracker] Text extraction on text formats
Date: Sat, 18 Nov 2006 18:19:10 +0000

Laurent Aguerreche wrote:

On Friday 17 November 2006 at 14:37 +0100, Luca Ferretti wrote:

On Thu, 16/11/2006 at 21:36 +0100, Laurent Aguerreche wrote:

On Thursday 16 November 2006 at 18:55 +0000, Jamie McCracken wrote:

Luca Ferretti wrote:

I'm trying to check and eventually expand info in
http://live.gnome.org/Tracker/SupportedFormats.


OTT (OpenDocument Text Template)
  content:              no (????)

Now it is.

Laurent, maybe a similar addition is needed for the other OO.o 1 & 2
*template mimetypes?


So here is a patch to add them. I did not add more support for Star Division
files because I just cannot create a file of that type!
I removed the calls to "nice" because child processes inherit the priority
of their parent... so they already run at 19.
I saw that the MS Word filter uses wvText. According to the wvWare site
( http://wvware.sourceforge.net/ ), AbiWord is now preferred over this
tool. But I wonder if we could use libGSF directly to extract the content of
Word files... If I remember correctly, wv just uses libGSF.

I also propose a patch to:
* extract text content only into /tmp/Tracker-user.pid/tmp_text_file_XXXXXX,
  so that everything now happens in /tmp/Tracker-user.pid, which should
  ensure the privacy of files;
* avoid creating a useless hierarchy like /home/user in
  /tmp/Tracker-user.pid to store the SQLite cache.


Laurent.


I have applied the new templates and tmp usage patches.

I have left out the changes to existing filters as they are not needed. wvText *must* only be used in /tmp, as some versions of it incorrectly touch the doc file, which can cause looping in trackerd with the doc file being constantly reindexed.


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/

References:
- [Tracker] Text extraction on text formats
  - From: Luca Ferretti
- Re: [Tracker] Text extraction on text formats
  - From: Jamie McCracken
- Re: [Tracker] Text extraction on text formats
  - From: Laurent Aguerreche
- Re: [Tracker] Text extraction on text formats
  - From: Luca Ferretti
- Re: [Tracker] Text extraction on text formats
  - From: Laurent Aguerreche
