Re: [Tracker] not indexing text from PDF files
- From: Martyn Russell <martyn lanedo com>
- To: "Brian J. Murrell" <brian interlinx bc ca>, Tracker mailing list <tracker-list gnome org>
- Subject: Re: [Tracker] not indexing text from PDF files
- Date: Fri, 01 Nov 2013 09:37:55 +0000
On 31/10/13 17:49, Brian J. Murrell wrote:
> On 10/31/2013 01:41 PM, Aleksander Morgado wrote:
>> /usr/libexec/tracker-extract -f /path/to/file.pdf
> Cool. So, running that on the given file, I see:
> SPARQL item:
> --
> a nfo:PaginatedTextDocument ;
> nie:title "NONE" ;
> nie:subject "NONE" ;
> nco:creator [ a nco:Contact ;
> nco:fullname "NONE"] ;
> nao:hasTag ?tag1 ;
> nfo:pageCount 1 ;
> nie:plainTextContent
> "list\nof\nwords\nseparated\nby\ncarriage-returns\n'' " .
> Searching for any of those words in the plainTextContent item fails.
OK. So you weren't expecting to find "RT00" with that file then :)
So there are a few things. First, I would check that the file is
indexed before searching; if it isn't, you won't find those words.
Note that tracker-extract does not index the file, it only extracts
the information. Usually tracker-miner-fs calls APIs to talk to
tracker-extract. The example above is really just a way to see what
we find in a file you specify on the command line.
Is the file above file:///home/brian/tmp/2013-10-26-3.pdf ?
To make sure the file is indexed, you can use tracker-control -f
$FILENAME and it should take care of that for you.
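As a rough sketch of that check-then-index step (not something from the Tracker docs): the helper names file_uri, indexed_query and ensure_indexed below are made up for illustration. It assumes the miner stores the file location in nie:url, that tracker-sparql is installed alongside tracker-control, and that an indexed file shows up in the results as a urn: resource.

```shell
#!/bin/sh
# Sketch only: file_uri, indexed_query and ensure_indexed are hypothetical helpers.

# Turn a path into a file:// URI.
# Assumes the path has no characters that need percent-encoding.
file_uri() {
    case "$1" in
        /*) printf 'file://%s\n' "$1" ;;
        *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Build a SPARQL query that matches a resource only if the store knows this URL.
# Assumes the file location is recorded in nie:url.
indexed_query() {
    printf 'SELECT ?urn WHERE { ?urn nie:url "%s" }\n' "$(file_uri "$1")"
}

# Query the store; if no urn: resource comes back, ask for the file to be indexed.
# Assumes results for indexed files contain a "urn:" identifier.
ensure_indexed() {
    if tracker-sparql -q "$(indexed_query "$1")" | grep -q 'urn:'; then
        echo "already indexed: $1"
    else
        tracker-control -f "$1"
    fi
}
```

Calling ensure_indexed "$FILENAME" and then repeating the search should show whether the words from plainTextContent are found once the file is actually in the store.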
We are on IRC if you need more help... let us know :)
--
Regards,
Martyn
Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell