Re: [Tracker] Text extraction on text formats



Laurent Aguerreche wrote:
On Thursday, 16 November 2006 at 18:55 +0000, Jamie McCracken wrote:
Luca Ferretti wrote:
I'm trying to check, and possibly expand, the info in
http://live.gnome.org/Tracker/SupportedFormats.

So I'm planning to create files of various formats, then search for text
inside them.
############ Test Procedure ###

I used the "stable" version (0.5.1), though I have the CVS version
installed too (I'll test it later).

So far I have tested some word processor document formats: I wrote a
one-line document in OO.o Writer (the one in Ubuntu Edgy) and saved it in
various formats. The file has some content and some metadata (the kind you
can add in File->Properties).

The exact procedure is:
     1. create the ODT file
     2. save it and close OO.o
     3. open the ODT file
     4. use File -> Save As ..
     5. choose a different format
     6. save the file in new "alien" format
     7. close the file and OO.o
     8. restart from #3

Then I searched with `tracker-search` at least twice for each file:
once for a word that's only in the file content ("potenzialità"), once for
a word that's only in the file metadata ("particolare"). Of course, I wrote
this file in Italian.
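
The two searches per file can be scripted. What follows is only a minimal
sketch, not part of the original test: it assumes that `tracker-search`
takes the search term as its argument and prints one matching path per
line, and the helper names (`tracker_search`, `check_file`,
`format_report`) and the test file names are hypothetical.

```python
import subprocess


def tracker_search(term):
    """Run `tracker-search TERM` and return its output lines.

    Assumption: the tool prints one matching file path per line.
    """
    result = subprocess.run(
        ["tracker-search", term],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


def check_file(filename, content_word, metadata_word, search=tracker_search):
    """Search once for the content-only word and once for the metadata-only word."""
    def found(word):
        # A file counts as found if its name appears in any output line.
        return any(filename in line for line in search(word))

    return {"content": found(content_word), "metadata": found(metadata_word)}


def format_report(label, result):
    """Render one result entry in the same layout as the results list."""
    lines = [label]
    for key in ("content", "metadata"):
        lines.append("  " + (key + ":").ljust(22) + ("yes" if result[key] else "no"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names; the words are the ones used in the test.
    for name in ("test.odt", "test.ott", "test.sxw", "test.stw", "test.doc", "test.rtf"):
        outcome = check_file(name, "potenzialità", "particolare")
        print(format_report(name, outcome))
        print()
```

Passing the search function as a parameter keeps the check usable without
a running indexer, for example with a stub that returns canned output.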

############# Test Results ###

ODT (OpenDocument Text)
  content:              yes
  metadata:             yes [1]
  extra:                keywords metadata are auto-tagged

OTT (OpenDocument Text Template)
  content:              no (????)

Now yes.

  metadata:             yes [1]
  extra:                as above

SXW (OpenOffice 1.x Text)
  content:              yes
  metadata:             no

STW (OpenOffice 1.x Text Template)
  content:              no

Now yes.

  metadata:             no

DOC (Word 97/2000/XP | Word 95 | Word 6.0)
  content:              yes [2]
  metadata:             no  [3]

RTF (Rich Text Format)
  content:              no  [4]
  metadata:             no  [4]


Given what I noted above, here is a trivial patch that just adds two
new filters.


Thanks - I have now added it to CVS.


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



