Re: [Tracker] not indexing text from PDF files



Hi,

A few details you can also check:

1. Tracker removes "stop words", i.e. very common words such as "the" or "almost". Are you searching for one of those? There is a separate list for each language, and you can find them in /usr/share/tracker/languages/

2. Tracker indexes only the first 10000 words of the PDF. Is the first occurrence of the word you are searching for beyond that limit?

 These values can be adjusted via gsettings:

$ gsettings list-recursively org.freedesktop.Tracker
...
org.freedesktop.Tracker.Extract max-bytes 1048576
...
org.freedesktop.Tracker.FTS ignore-numbers true
org.freedesktop.Tracker.FTS ignore-stop-words true
org.freedesktop.Tracker.FTS max-word-length 30
org.freedesktop.Tracker.FTS max-words-to-index 10000
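
As a rough sketch, both points could be checked from a shell along these lines. The helper names is_stop_word and set_word_limit are made up for illustration, and I am assuming the stop word lists contain one word per line in files named like stopwords.en, so please check what is actually in that directory on your system. After raising the limit, files that were already indexed will most likely need to be reindexed before the change shows up in searches.

```shell
# Hypothetical helper: succeeds if WORD ($1) is listed in the stop word FILE ($2).
# Assumes one stop word per line; the match is case-insensitive and whole-line.
is_stop_word() {
    grep -qixF -e "$1" "$2"
}

# Hypothetical helper: set the FTS word limit to the given number.
# The schema and key names are taken from the gsettings listing above.
set_word_limit() {
    gsettings set org.freedesktop.Tracker.FTS max-words-to-index "$1"
}

# Example usage (the file name is an assumption, adjust to what is installed):
#   is_stop_word the /usr/share/tracker/languages/stopwords.en && echo "stop word"
#   set_word_limit 50000
#   gsettings get org.freedesktop.Tracker.FTS max-words-to-index
```

Setting ignore-stop-words to false with "gsettings set" in the same way should make Tracker keep those words, at the cost of a larger index.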

Regards,

Ivan



On Thu, Oct 31, 2013 at 7:54 AM, Martyn Russell <martyn lanedo com> wrote:
On 31/10/13 12:09, Brian J. Murrell wrote:
Using tracker 0.16.2 on Fedora 19, everything I have specified to be
indexed seems to have been indexed but I selected a PDF file that should have
been indexed more or less at random, opened it with evince, searched for
some text in it (i.e. in evince) and then asked tracker search (-l 1000
so that I got all of the search results) for the same text.  It didn't
find it.

I did confirm that the more or less randomly chosen PDF file is in the
index.  It seems that its contents are not though.

Any ideas?

What commands (exactly) are you using to find the PDF?

--
Regards,
Martyn

Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell
_______________________________________________
tracker-list mailing list
tracker-list gnome org
https://mail.gnome.org/mailman/listinfo/tracker-list


