Re: [Tracker] Using tracker extractors from other applications



Ivan Frade <ivan frade-Re5JQEeQqe8AvxtiuMwx3w public gmane org> writes:
Hi,

On Sat, Nov 20, 2010 at 12:21 AM, Nikolaus Rath <Nikolaus-BTH8mxji4b0 public gmane org> wrote:


Nikolaus Rath <Nikolaus-BTH8mxji4b0-XMD5yJDbdMReXY1tMh2IBg public gmane org> writes:
extractor = ExtractorHelper ()
results = extractor.get_metadata (filename)

Upon closer investigation, get_metadata() fails whenever it encounters a
text/plain file that contains a '['. Looking at the code, this does not
seem surprising.

Is the format of the string that's returned by GetMetadata() described
somewhere? Then I could try to fix the parser.


GetMetadata() returns triples in "Turtle" format, with the subject missing
(because the caller should know it and probably wants to add more
information). That Python "parser" (if you can call it that) uses just
regular expressions to parse those triples and handles the anonymous nodes
(those "[ xxx ]") in a tricky way to form a single key for the dictionary.

Nodes like:
    A slo:location [a slo:GeoLocation; slo:city "Helsinki"]
are translated in the dictionary to:
    slo:location:city "Helsinki"

Not nice, but good enough for our testing. Remember that this code is just
an internal utility and not a public API. Patches are welcome if you find
issues.

Well, I would be quite happy to submit patches. But at the moment I
still have absolutely no idea what GetMetadata() returns. What is a
"turtle format"? What is a "subject"? What defines a node? I think you
are assuming that I know something which I actually don't.


Btw, if tracker itself does not use ExtractorHelper, how does e.g.
tracker-miner make sense of the metadata? Maybe I could use that code as
a start instead. I tried to follow the invocation of
get_metadata_fast_async but could not really identify the part where the
metadata is parsed.


Thanks,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C


