Re: [Tracker] Text extraction on text formats
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: Laurent Aguerreche <laurent aguerreche free fr>
- Cc: Tracker List <tracker-list gnome org>
- Date: Thu, 16 Nov 2006 22:41:06 +0000
Laurent Aguerreche wrote:
On Thursday 16 November 2006 at 18:55 +0000, Jamie McCracken wrote:
Luca Ferretti wrote:
I'm trying to check and possibly expand the info in
http://live.gnome.org/Tracker/SupportedFormats.
So I'm planning to create files in various formats, then search for text
inside them.
############ Test Procedure ###
I used the "stable" version (0.5.1), although I have the CVS version
installed too (I'll test it later).
So far I have tested some word processor document formats: I wrote a one-line
document in OO.o Writer (the one in Ubuntu Edgy) and saved it in
various formats. The file has content and some metadata (the kind you
can add in File->Properties).
The exact procedure is:
1. create the ODT file
2. save it and close OO.o
3. open the ODT file
4. use File -> Save As...
5. choose a different format
6. save the file in the new "alien" format
7. close the file and OO.o
8. restart from #3
Then I searched with `tracker-search` at least twice for each file:
once for a word that appears only in the file content ("potenzialità") and
once for a word that appears only in the file metadata ("particolare").
I wrote this file in Italian, of course.
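The two searches per file could be scripted roughly as follows. This is only a sketch, not part of the original test: the helper names `run_tracker_search` and `check_formats` are hypothetical, and it assumes `tracker-search TERM` is on the PATH and prints one matching file path per line, which may not hold for every Tracker version.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, Set, Tuple


def run_tracker_search(term: str) -> str:
    """Run `tracker-search TERM` and return its standard output.

    Assumption: the command takes the search term as its only argument.
    """
    result = subprocess.run(
        ["tracker-search", term], capture_output=True, text=True, check=False
    )
    return result.stdout


def _hit_names(output: str) -> Set[str]:
    """Collect base names from the output (assumed: one path per line)."""
    return {
        os.path.basename(line.strip())
        for line in output.splitlines()
        if line.strip()
    }


def check_formats(
    files: Iterable[str],
    content_word: str,
    metadata_word: str,
    search: Callable[[str], str] = run_tracker_search,
) -> Dict[str, Tuple[bool, bool]]:
    """Map each file name to (found by content word, found by metadata word)."""
    content_hits = _hit_names(search(content_word))
    metadata_hits = _hit_names(search(metadata_word))
    return {
        os.path.basename(name): (
            os.path.basename(name) in content_hits,
            os.path.basename(name) in metadata_hits,
        )
        for name in files
    }
```

Calling `check_formats(["test.odt", "test.doc"], "potenzialità", "particolare")` would run both searches once and report, for each file, whether it showed up in the content search and in the metadata search. The file names here are placeholders, not the ones used in the test.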
############# Test Results ###
ODT (OpenDocument Text)
content: yes
metadata: yes [1]
extra: keywords metadata are auto-tagged
OTT (OpenDocument Text Template)
content: no (????)
Now yes.
metadata: yes [1]
extra: as above
SXW (OpenOffice 1.x Text)
content: yes
metadata: no
STW (OpenOffice 1.x Text Template)
content: no
Now yes.
metadata: no
DOC (Word 97/2000/XP | Word 95 | Word 6.0)
content: yes [2]
metadata: no [3]
RTF (Rich Text Format)
content: no [4]
metadata: no [4]
Based on what I noted above, I am providing a trivial patch which just adds two
new filters.
thanks - have now added to cvs
--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/