Re: [Tracker] Using tracker extractors from other applications



Nikolaus Rath <Nikolaus-BTH8mxji4b0 public gmane org> writes:
Now that I know what to look for, I did some Googling as you suggested
and found a Python module for parsing Turtle
(http://rdflib.googlecode.com/) so I will probably use that.

In case someone else is interested: the following Python code asks
Tracker's extractor over D-Bus for the plain text content of a file:

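import os
import textwrap

import dbus
from rdflib.graph import ConjunctiveGraph
from rdflib.namespace import Namespace
from rdflib.parser import StringInputSource
from rdflib.term import URIRef

try:
    from urllib.request import pathname2url  # Python 3
except ImportError:
    from urllib import pathname2url  # Python 2

def print_plain_text(path):
    """Print the plain text content that tracker-extract finds in *path*."""

    # Prefix declarations for the ontologies used in the extractor output;
    # the output itself does not declare them.
    prefix = textwrap.dedent('''\
    @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
    @prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
    @prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> .
    @prefix nmo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#> .
    @prefix ncal: <http://www.semanticdesktop.org/ontologies/2007/04/02/ncal#> .
    @prefix nexif: <http://www.semanticdesktop.org/ontologies/2007/05/10/nexif#> .
    @prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#> .
    ''')

    nie_ns = Namespace(URIRef('http://www.semanticdesktop.org/ontologies/2007/01/19/nie#'))

    # Percent-encode the path so that spaces etc. still give a valid URI.
    url = 'file://' + pathname2url(os.path.abspath(path))

    bus = dbus.SessionBus()
    proxy = bus.get_object('org.freedesktop.Tracker1.Extract',
                           '/org/freedesktop/Tracker1/Extract')
    tracker = dbus.Interface(proxy, 'org.freedesktop.Tracker1.Extract')

    # The second argument (the MIME type) is left empty. Element 1 of the
    # reply is the metadata; dbus hands it over as a dbus.String, which is
    # already a text string, so no conversion is needed.
    meta = tracker.GetMetadata(url, '')[1]

    # The metadata has neither prefixes nor a subject, so prepend both
    # before handing the result to the parser.
    graph = ConjunctiveGraph()
    graph.parse(StringInputSource(prefix + '<%s>' % url + meta), format='n3')

    # Collect the object of every nie:plainTextContent triple for the file.
    contents = [ x[2].toPython() for x
                 in graph.triples((URIRef(url), nie_ns.plainTextContent, None)) ]

    print('\n'.join(contents))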
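
For anyone wondering why the prefix block and the '<url>' get glued in
front of the result: judging by the code above, the extractor seems to
return only a predicate/object list, which is not a complete Turtle
document on its own. Here is a small stdlib-only sketch of just that
wrapping step. The helper name build_turtle and the sample string are
made up for illustration; real extractor output will differ by file and
Tracker version.

```python
# Sketch only: shows how a bare predicate/object list becomes a complete
# Turtle document. build_turtle and sample_meta are hypothetical.
NIE = 'http://www.semanticdesktop.org/ontologies/2007/01/19/nie#'


def build_turtle(url, meta, prefixes):
    """Prepend @prefix lines and a subject to a predicate/object list."""
    header = ''.join('@prefix %s: <%s> .\n' % (name, iri)
                     for name, iri in sorted(prefixes.items()))
    return header + '<%s> %s' % (url, meta.strip())


# Made-up stand-in for the string the extractor might hand back.
sample_meta = ' nie:plainTextContent "hello world" .'

doc = build_turtle('file:///tmp/example.txt', sample_meta, {'nie': NIE})
print(doc)
```

The resulting string is what would then be passed to graph.parse() in
the function above.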

    
Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C


