Re: [Tracker] Context extraction API/program



Hi Andrew,

On Mon, 14-07-2008 at 15:37 -0700, ext Andrew Leung wrote:
I would like to utilize Tracker's file content extraction mechanism
within my own program. Basically I would like to be able to parse
various file types and pull out keywords. Does Tracker have any
mechanism (API/separate program) that I can use to pull content
from various file types?

 The content is extracted by the scripts in
/usr/local/lib/tracker/filters/. These scripts are organized by
mimetype name, and they usually call external programs (such as wv or
pdftotext) to extract the contents.

 Tracker obtains the mimetype of the file, decides its category and, if
the category "Has full text", calls the matching script.
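
 To reuse the filters from your own program, the lookup described above
could be sketched roughly like this in Python. This is only a sketch
under assumptions, not Tracker's actual code: the
<type>/<subtype>_filter naming, the "input path, output path" calling
convention, and the helper names filter_path_for and extract_text are
all guesses for illustration, so check the scripts in your own filters
directory before relying on it.

```python
import mimetypes
import os
import subprocess
import tempfile

# Directory named in this message; adjust to your install prefix.
FILTER_DIR = "/usr/local/lib/tracker/filters"


def filter_path_for(mime_type, filter_dir=FILTER_DIR):
    """Return the assumed script path for a mimetype such as 'application/pdf'.

    ASSUMPTION: scripts live at <filter_dir>/<type>/<subtype>_filter.
    """
    major, _, minor = mime_type.partition("/")
    return os.path.join(filter_dir, major, minor + "_filter")


def extract_text(path, filter_dir=FILTER_DIR):
    """Run the matching filter script on `path`; return None if none applies."""
    # Guess the mimetype from the file name (Tracker itself may detect it differently).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return None
    script = filter_path_for(mime_type, filter_dir)
    if not os.access(script, os.X_OK):
        return None  # no executable filter found for this mimetype
    # ASSUMPTION: the script takes an input path and an output path.
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "contents.txt")
        subprocess.run([script, path, out_path], check=True)
        with open(out_path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
```

 With that in place, extract_text("report.pdf") would return the plain
text if a matching filter exists, and None otherwise.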


Beagle search has a program called 'beagle-extract-content' that I  
have been using for this purpose though I haven't been particularly  
happy with it. Thanks a lot.

 We have a "tracker-extractor" program. It extracts the _metadata_ of
the file (not the contents). Maybe it is also useful for you.

 Any improvement in the filters/extractors is welcome ;)

 Regards,

Ivan




