Re: [Tracker] not indexing text from PDF files

From: Martyn Russell <martyn lanedo com>
To: "Brian J. Murrell" <brian interlinx bc ca>, Tracker mailing list <tracker-list gnome org>
Subject: Re: [Tracker] not indexing text from PDF files
Date: Fri, 01 Nov 2013 09:37:55 +0000

On 31/10/13 17:49, Brian J. Murrell wrote:

On 10/31/2013 01:41 PM, Aleksander Morgado wrote:

/usr/libexec/tracker-extract -f /path/to/file.pdf


Cool.  So, running that on the given file I see:

SPARQL item:
--
  a nfo:PaginatedTextDocument ;
         nie:title "NONE" ;
         nie:subject "NONE" ;
         nco:creator [ a nco:Contact ;
         nco:fullname "NONE"] ;
         nao:hasTag ?tag1 ;
         nfo:pageCount 1 ;
         nie:plainTextContent
"list\nof\nwords\nseparated\nby\ncarriage-returns\n'' " .

Searching for any of those words in the plainTextContent item fails.


OK. So you weren't expecting to find "RT00" with that file then :)

So there are a few things... first, I would check that the file is indexed before searching ... if it isn't then you won't find those words. Note that tracker-extract does not index the file, it just extracts the information, usually tracker-miner-fs calls APIs to talk to tracker-extract. The example above is really just a way to see what we find in a file you specify on the command line.
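
As a rough sketch of that "is it indexed?" check: the snippet below builds a SPARQL query asking the store whether it knows a given file URI and, if tracker-sparql is installed, runs it. The helper name is_indexed_query and the default URI are made up for illustration; I am assuming tracker-sparql's -q option and that nie:url holds the file URI on your version. An empty result would suggest the file has not been indexed yet.

```shell
#!/bin/sh
# is_indexed_query is a hypothetical helper, not part of Tracker.
# It prints a SPARQL query that selects the resource whose nie:url
# matches the file URI passed as the first argument.
is_indexed_query() {
    printf 'SELECT ?urn WHERE { ?urn nie:url "%s" }' "$1"
}

# Placeholder URI; pass the real file:// URI as the first argument.
URI="${1:-file:///path/to/file.pdf}"
QUERY=$(is_indexed_query "$URI")

# Always show the query so it can be inspected or pasted by hand.
echo "$QUERY"

# Only talk to the store if the tool is actually installed.
if command -v tracker-sparql >/dev/null 2>&1; then
    tracker-sparql -q "$QUERY"
fi
```

The URI is interpolated as-is, so a path containing double quotes or characters that need percent-encoding would have to be escaped first.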


Is the file above file:///home/brian/tmp/2013-10-26-3.pdf ?

To make sure the file is indexed, you can use tracker-control -f $FILENAME and it should take care of that for you.
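
Put together, the index-then-search sequence might look something like the sketch below. It only prints the commands unless RUN=1 is set, so you can see what it would do first. The names run and index_and_search are hypothetical helpers, the path and search term are placeholders, and I am assuming tracker-search takes the search term as a plain argument on your version.

```shell
#!/bin/sh
# run: print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# index_and_search FILE TERM
# First ask Tracker to index FILE, then search the store for TERM.
index_and_search() {
    file=$1
    term=$2
    run tracker-control -f "$file"
    run tracker-search "$term"
}

# Placeholder arguments; replace with the real file and a word from it.
index_and_search "${1:-/path/to/file.pdf}" "${2:-words}"
```

Indexing may not have finished by the time the search runs, so if the search comes back empty it is worth waiting a moment and trying again.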


We are on IRC if you need more help... let us know :)

--
Regards,
Martyn

Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell

