Re: [Tracker] Keywords from pdf documents



On 30/12/10 21:03, Joerg Beyer wrote:
Hello,

Hi,

I have Tracker 0.8.17 running on Ubuntu and everything works fine, except
that I can't find keywords from PDF documents in the index. PDF keywords are
something like tags. When I index a PDF like this:

/usr/lib/tracker/tracker-extract -v 3 -f example.pdf

then I see this among the verbose output:
...
      nao:hasTag [ a nao:Tag ;
      nao:prefLabel "myFirstTag"] ;
      nao:hasTag [ a nao:Tag ;
      nao:prefLabel "mySecondTag"] ;
...

This looks fine, but the keywords do not seem to be in the index. This command

tracker-info example.pdf | grep myFirstTag

has no output.

It is unlikely that you would see any output. Tags are not stored according to their preferred label: each tag is a separate resource, and the file only refers to it by its URN. Using tracker-info, you would see the tags listed as something like:

  'nao:hasTag' = 'urn:uuid:f2f41315-51d9-df02-0dce-1448b8b24d4f'
  'nao:hasTag' = 'urn:uuid:97308668-8c4a-4b74-2ada-a65306f5e85c'

If you use tracker-info on the 'urn:uuid:...' string then you can see what the details of the tag are.
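The same lookup can be done in one step with a SPARQL query that follows nao:hasTag from the file to each tag and reads its nao:prefLabel. The following is only a sketch: build_file_tags_query is a hypothetical helper name (not part of Tracker), and it assumes you pass the file's URI exactly as Tracker stores it in nie:url.

```shell
# build_file_tags_query is a hypothetical helper, not part of Tracker.
# Assumption: $1 is the file's URI exactly as stored in nie:url,
# for example file:///home/user/example.pdf
build_file_tags_query() {
    # Escape backslashes and double quotes so the URI is a valid
    # SPARQL string literal.
    uri=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    # Follow nao:hasTag from the file to each tag, then read its label.
    printf 'select ?label where { ?file nie:url "%s" ; nao:hasTag ?tag . ?tag nao:prefLabel ?label }' "$uri"
}
```

With a running Tracker store, the query could then be passed on as tracker-sparql -q "$(build_file_tags_query file:///home/user/example.pdf)", where the path is just a placeholder.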

To make this all easier, there is tracker-tag. See man tracker-tag for how to use it. What you're asking for should work using:

  tracker-tag example.pdf

This will list all tags for that filename. Alternatively, you can use tracker-tag -st, which will list all tags and all files per tag.

Is there a way to search for pdf documents by their keywords?

Well, you can search for *any* documents using keywords:

  tracker-tag -st myFirstTag

For PDFs specifically, you would have to use a SPARQL query with tracker-sparql. Something like:

tracker-sparql -q 'select nie:url(?file) where { ?file a nfo:FileDataObject ; nie:mimeType "application/pdf" ; nao:hasTag ?tag . ?tag nao:prefLabel "myFirstTag" }'
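If the tag name comes from a script or from user input, it may be safer to build the query string in a small function so that quotes in the label cannot break the SPARQL literal. A minimal sketch, assuming a POSIX shell with sed; build_pdf_tag_query is a hypothetical helper name, and the query text is the same as the one above:

```shell
# build_pdf_tag_query is a hypothetical helper, not part of Tracker.
# It prints the query shown above with the tag label ($1) filled in.
build_pdf_tag_query() {
    # Escape backslashes and double quotes so the label is a valid
    # SPARQL string literal.
    label=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    # Restrict to PDF files that carry a tag with the given label.
    printf 'select nie:url(?file) where { ?file a nfo:FileDataObject ; nie:mimeType "application/pdf" ; nao:hasTag ?tag . ?tag nao:prefLabel "%s" }' "$label"
}
```

It would be used as tracker-sparql -q "$(build_pdf_tag_query myFirstTag)". The function only prints the query, so you can inspect the result before running it.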

Hope this helps.

--
Regards,
Martyn


