Re: [Tracker] not indexing text from PDF files



On 31/10/13 17:49, Brian J. Murrell wrote:
On 10/31/2013 01:41 PM, Aleksander Morgado wrote:
/usr/libexec/tracker-extract -f /path/to/file.pdf

Cool.  So, running that on the given file, I see:

SPARQL item:
--
  a nfo:PaginatedTextDocument ;
         nie:title "NONE" ;
         nie:subject "NONE" ;
         nco:creator [ a nco:Contact ;
         nco:fullname "NONE"] ;
         nao:hasTag ?tag1 ;
         nfo:pageCount 1 ;
         nie:plainTextContent
"list\nof\nwords\nseparated\nby\ncarriage-returns\n'' " .

Searching for any of those words in the plainTextContent item fails.

OK. So you weren't expecting to find "RT00" with that file then :)

So there are a few things. First, I would check that the file is actually indexed before searching; if it isn't, you won't find those words. Note that tracker-extract does not index the file, it only extracts the information. Usually tracker-miner-fs calls APIs to talk to tracker-extract. The command above is really just a way to see what we find in a file you specify on the command line.

Is the file above file:///home/brian/tmp/2013-10-26-3.pdf ?

To make sure the file is indexed, you can use tracker-control -f $FILENAME and it should take care of that for you.
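
As a rough sketch of that check (not a definitive recipe): the script below asks for the file to be indexed and then queries the store for its extracted text. It assumes the Tracker command-line tools of this era are installed, and that the file's resource carries its URI in nie:url. The helper names file_uri and content_query are made up for illustration, and the default path is just the one mentioned above.

```shell
#!/bin/sh
# Sketch: request indexing of one file, then ask the store for its text.
# Assumes tracker-control and tracker-sparql are on PATH.

# Turn an absolute path into a file:// URI.
# Hypothetical helper; it does not percent-encode special characters.
file_uri() {
    printf 'file://%s\n' "$1"
}

# Build a SPARQL query for the extracted text of the resource
# whose nie:url matches the given path (hypothetical helper).
content_query() {
    printf 'SELECT ?text WHERE { ?f nie:url "%s" ; nie:plainTextContent ?text }\n' \
        "$(file_uri "$1")"
}

FILENAME=${1:-/home/brian/tmp/2013-10-26-3.pdf}

if command -v tracker-control >/dev/null 2>&1 &&
   command -v tracker-sparql >/dev/null 2>&1; then
    # Ask the miner to index this one file, as suggested above.
    tracker-control -f "$FILENAME"
    # An empty result suggests the text never reached the store.
    tracker-sparql -q "$(content_query "$FILENAME")"
else
    echo "Tracker tools not found; the query would have been:"
    content_query "$FILENAME"
fi
```

Indexing may not complete immediately, so an empty result right after the first command is not conclusive; re-running the query a little later is a reasonable next step.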

We are on IRC if you need more help... let us know :)

--
Regards,
Martyn

Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell

