[Tracker] tracker as full text index/search tool for a large collection of pdf, ps, djvu, dvi documents?
- From: Meik Hellmund <Meik Hellmund math uni-leipzig de>
- To: tracker-list gnome org
- Subject: [Tracker] tracker as full text index/search tool for a large collection of pdf, ps, djvu, dvi documents?
- Date: Sat, 4 Oct 2008 17:34:06 +0200
Dear tracker developers,
I have a collection of about 10,000 documents, mostly in PostScript,
PDF, DjVu and DVI format, and I am looking for a full text index/search
tool. I tried tracker 0.6.6 from Debian/unstable and now have some
questions whose answers I could not find in the documentation or FAQ.
- Tracker works fine and great with PDF documents. Full points!
That's what I am looking for.
- It seems that PostScript, DVI and DjVu documents are not fully
indexed; only their metadata is used. How can I change this?
- It seems that DjVu files are classified as "images".
This may be true in a technical sense, but DjVu is a format
designed especially for scanned text, and most DjVu documents are
scanned books and similar material.
I think you should reclassify them as "documents".
- How about compressed files? The documentation mentions that .gz
files are supported. What about .bz2? Is it possible to add a filter
for other compression methods?
- Are there plans to extend the query capabilities with respect to
the full text index? E.g., query for documents containing this but
not that word, or containing some words within a small distance of
each other?
- At the moment my collection of documents is mostly organized in a
hierarchy of directories. Is it possible to take this into
account in queries, e.g., query only for documents from a
subtree of the indexed tree?
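To make the last two questions concrete, here is a minimal sketch in
plain Python of the query semantics being asked about: "this but not
that word", two words within a small distance of each other, and
restricting results to a directory subtree. It is not Tracker's API or
query syntax; the function names (contains_but_not, within_distance,
in_subtree) and the example paths are hypothetical and only serve as
illustration.

```python
import re
from pathlib import PurePosixPath


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_but_not(text, wanted, unwanted):
    """True if the text contains `wanted` but does not contain `unwanted`."""
    words = set(tokenize(text))
    return wanted.lower() in words and unwanted.lower() not in words


def within_distance(text, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` word positions apart."""
    tokens = tokenize(text)
    pos_first = [i for i, t in enumerate(tokens) if t == first.lower()]
    pos_second = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


def in_subtree(path, root):
    """True if `path` lies somewhere below the directory `root`."""
    return PurePosixPath(root) in PurePosixPath(path).parents


# Hypothetical example: a hit only counts if it is below /docs/math
# and mentions "algebra" without mentioning "topology".
path = "/docs/math/algebra/book.djvu"
text = "Lectures on Lie algebra and representation theory"
hit = in_subtree(path, "/docs/math") and contains_but_not(text, "algebra", "topology")
```

A real full text index would answer such queries from stored word
positions instead of rescanning the text, but the intended results are
the same.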
I know this is quite a list of questions. Pointers to answers to
any of them are very welcome.
Of course, tracker may simply be the wrong tool for what I want. Any
pointers to alternatives are also welcome.
Mathematisches Institut, Uni Leipzig
e-mail: Meik Hellmund math uni-leipzig de