Re: [Tracker] Text extraction on text formats



Laurent Aguerreche wrote:
On Friday, 17 November 2006 at 14:37 +0100, Luca Ferretti wrote:
On Thu, 16/11/2006 at 21:36 +0100, Laurent Aguerreche wrote:
On Thursday, 16 November 2006 at 18:55 +0000, Jamie McCracken wrote:
Luca Ferretti wrote:
I'm trying to check and possibly expand the info at
http://live.gnome.org/Tracker/SupportedFormats.


OTT (OpenDocument Text Template)
  content:              no (????)
Now yes.

Laurent, maybe a similar addition is needed for the other OO.o 1&2 *template
mimetypes?

So here is a patch to add them. I did not add more support for Star Division
files because I just cannot create a file of that type!
I removed the calls to "nice" because the children of a process inherit its
priority... so they are already at 19.
I saw that the MS Word filter uses wvText. According to the wvWare site
( http://wvware.sourceforge.net/ ), AbiWord is now preferred over this
tool. But I wonder if we could use libGSF directly to extract the content of
Word files... If I remember correctly, wv just uses libGSF.
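
To illustrate the point about "nice": a minimal Python sketch, assuming a
POSIX system. It is not Tracker code, and the function name is made up for
the example. A child process lowers its own priority, then starts a
grandchild that reports the nice value it received.

```python
import subprocess
import sys

# Script run in a child process: lower its own priority, then ask a
# grandchild which nice value it inherited.
_CHILD_SCRIPT = """
import os, subprocess, sys
own = os.nice(19)  # raise niceness by 19; the kernel clamps at its maximum
grandchild = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.nice(0))"], text=True)
print(own, grandchild.strip())
"""


def niceness_of_child_and_grandchild():
    """Return (child niceness, grandchild niceness) as two integers."""
    out = subprocess.check_output([sys.executable, "-c", _CHILD_SCRIPT], text=True)
    own, inherited = out.split()
    return int(own), int(inherited)


if __name__ == "__main__":
    print(niceness_of_child_and_grandchild())
```

Both values should match, which is why a second "nice" call in a filter
started by an already-niced daemon changes nothing.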

I also propose a patch to:
* extract text content only
into /tmp/Tracker-user.pid/tmp_text_file_XXXXXX, so that everything now
happens in /tmp/Tracker-user.pid, which should ensure the privacy of files,
* avoid creating a useless hierarchy like /home/user
in /tmp/Tracker-user.pid to store the SQLite cache.
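
The temp-file idea above can be sketched as follows. This is only an
illustration in Python, not the patch itself: the helper name is
hypothetical, and the owner-only directory mode is my assumption about how
the privacy would be ensured.

```python
import os
import tempfile


def make_private_text_file(base_dir):
    """Create base_dir as an owner-only directory and return a fresh temp file path in it."""
    os.makedirs(base_dir, mode=0o700, exist_ok=True)
    # Enforce the mode even if the directory already existed or the umask altered it.
    os.chmod(base_dir, 0o700)
    # mkstemp picks a unique name and creates the file readable by the owner only.
    fd, path = tempfile.mkstemp(prefix="tmp_text_file_", dir=base_dir)
    os.close(fd)
    return path
```

Called with a directory such as /tmp/Tracker-user.pid, this gives one flat
directory holding every temporary text file, with no nested /home/user path.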


Laurent.

I have applied the new templates and tmp usage patches.

I have left out the changes to the existing filters as they are not needed. wvText *must* only be used in /tmp, as some versions of it incorrectly touch the doc file, which can cause looping in trackerd, with the doc file being constantly reindexed.
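
A rough Python sketch of that rule, assuming "touch" means the extractor
changes the file's modification time. The helper name and the calling
convention (input path, then output path, as the last two arguments) are
assumptions made for the example, not the actual filter script.

```python
import os
import shutil
import subprocess
import tempfile


def extract_via_copy(doc_path, work_dir, extractor_cmd):
    """Run extractor_cmd on a copy of doc_path inside work_dir and return the text.

    extractor_cmd is a list such as ["wvText"]; the copy's path and the
    output path are appended as its last two arguments (assumed convention).
    """
    fd, copy_path = tempfile.mkstemp(suffix=".doc", dir=work_dir)
    os.close(fd)
    out_path = copy_path + ".txt"
    try:
        # Only the copy is handed to the extractor, so the original is only ever read.
        shutil.copyfile(doc_path, copy_path)
        subprocess.run(extractor_cmd + [copy_path, out_path], check=True)
        with open(out_path, encoding="utf-8", errors="replace") as f:
            return f.read()
    finally:
        # Remove the copy and the extracted text whether or not extraction worked.
        for p in (copy_path, out_path):
            if os.path.exists(p):
                os.remove(p)
```

Since the watched file is never written to, a misbehaving extractor cannot
trigger a new change event on it, so the reindexing loop cannot start.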

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



