Re: [Tracker] Text extraction on text formats
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: Laurent Aguerreche <laurent aguerreche free fr>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] Text extraction on text formats
- Date: Sat, 18 Nov 2006 18:19:10 +0000
Laurent Aguerreche wrote:
On Friday 17 November 2006 at 14:37 +0100, Luca Ferretti wrote:
On Thu, 16/11/2006 at 21:36 +0100, Laurent Aguerreche wrote:
On Thursday 16 November 2006 at 18:55 +0000, Jamie McCracken wrote:
Luca Ferretti wrote:
I'm trying to check and possibly expand the info at
http://live.gnome.org/Tracker/SupportedFormats.
OTT (OpenDocument Text Template)
content: no (????)
Now yes.
Laurent, maybe a similar addition is needed for the other OO.o 1 and 2 *template
mimetypes?
So here is a patch to add them. I did not add more support for Star Division
files because I just cannot create a file of that type!
I removed the calls to "nice" because the children of a process inherit its
priority, so they are already at 19.
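
The inheritance point can be checked with a small sketch. This is only an illustration, not Tracker code: the helper script and the function name parent_and_child_niceness are made up for the example, and it assumes a Unix system where os.nice is available.

```python
import subprocess
import sys

# Script run in a helper process: it lowers its own priority, then starts a
# child of its own and prints both niceness values.
HELPER = r"""
import os, subprocess, sys
os.nice(5)                      # raise niceness (the kernel caps it at 19)
mine = os.nice(0)               # an increment of 0 just reads the value
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.nice(0))"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(mine, child)
"""


def parent_and_child_niceness():
    """Return (niceness of the helper, niceness of the helper's child)."""
    out = subprocess.run(
        [sys.executable, "-c", HELPER],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return int(out[0]), int(out[1])


if __name__ == "__main__":
    print(parent_and_child_niceness())
```

The two numbers should match: the child never calls nice itself, yet it runs at the niceness its parent had when it was started.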
I saw that the MS Word filter uses wvText. According to the wvWare site
( http://wvware.sourceforge.net/ ), AbiWord is now preferred over this
tool. But I wonder if we could use libGSF directly to extract the content of
Word files... If I remember correctly, wv just uses libGSF.
I also propose a patch to:
* extract text content only into
/tmp/Tracker-user.pid/tmp_text_file_XXXXXX, so that everything now happens
in /tmp/Tracker-user.pid, which should ensure the privacy of files,
* avoid creating a useless hierarchy like /home/user
inside /tmp/Tracker-user.pid to store the SQLite cache.
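
A minimal sketch of the temporary-file layout described in the list above, assuming a directory readable only by its owner is what gives the privacy. The function names are hypothetical, and the demo uses a scratch directory as a stand-in for /tmp; the actual patch is C code inside Tracker and may differ.

```python
import os
import stat
import tempfile


def make_private_dir(base, user, pid):
    """Create base/Tracker-<user>.<pid>, accessible to its owner only."""
    path = os.path.join(base, f"Tracker-{user}.{pid}")
    os.mkdir(path, 0o700)
    os.chmod(path, 0o700)  # enforce the mode regardless of the umask
    return path


def make_text_file(private_dir):
    """Create a uniquely named tmp_text_file_XXXXXX-style file in private_dir."""
    fd, name = tempfile.mkstemp(prefix="tmp_text_file_", dir=private_dir)
    os.close(fd)
    return name


def demo():
    base = tempfile.mkdtemp()  # stand-in for /tmp in this example
    private_dir = make_private_dir(base, "user", os.getpid())
    return private_dir, make_text_file(private_dir)


if __name__ == "__main__":
    print(demo())
```

mkstemp picks the random suffix and creates the file atomically, so two extractions running at once cannot collide on a name.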
Laurent.
I have applied the new templates and tmp usage patches.
I have left out the changes to existing filters as they are not needed.
wvText *must* only be used in /tmp, as some versions of it
incorrectly touch the doc file, which can cause looping in trackerd
with the doc file being constantly reindexed.
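
The reason for working in /tmp can be sketched as follows. This is an illustration under assumptions: extract_from_copy and touching_extractor are hypothetical names, and touching_extractor only imitates a filter that modifies its input's timestamps; it is not wvText.

```python
import os
import shutil
import tempfile


def extract_from_copy(doc_path, work_dir, extractor):
    """Run extractor on a copy of doc_path inside work_dir, never on the original."""
    suffix = os.path.splitext(doc_path)[1]
    fd, copy_path = tempfile.mkstemp(dir=work_dir, suffix=suffix)
    os.close(fd)
    try:
        shutil.copyfile(doc_path, copy_path)
        return extractor(copy_path)
    finally:
        os.remove(copy_path)  # leave nothing behind in the work directory


def touching_extractor(path):
    """Hypothetical stand-in for a filter that touches the file it reads."""
    os.utime(path, None)  # set the timestamps to the current time
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="replace")
```

Because only the copy is touched, the original's modification time stays the same, so a daemon watching the original sees no change and has no reason to reindex it.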
--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/