Re: [Tracker] Extracting the extractors



Thanks for the quick feedback!

You're right that I should have implemented Turtle output. I've done
that now; this is the result (as you'd expect):

<urn:artist:Best%20Coast> nmm:artistName "Best Coast" ;
  rdf:type nmm:Artist .

<urn:album:The%20Only%20Place> nmm:albumTitle "The Only Place" ;
  rdf:type nmm:MusicAlbum ;
  nmm:albumArtist <urn:artist:Best%20Coast> .

<urn:album-disc:The%20Only%20Place:Disc1> nmm:setNumber 1 ;
  nmm:albumDiscAlbum <urn:album:The%20Only%20Place> ;
  rdf:type nmm:MusicAlbumDisc .

<file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3>
  nie:comment "Free download from http://www.last.fm/music/Best+Coast and http://MP3.com" ;
  nmm:trackNumber 1 ;
  nmm:performer <urn:artist:Best%20Coast> ;
  nfo:averageBitrate 128000 ;
  nmm:musicAlbum <urn:album:The%20Only%20Place> ;
  nfo:channels 2 ;
  nmm:dlnaProfile "MP3" ;
  nmm:musicAlbumDisc <urn:album-disc:The%20Only%20Place:Disc1> ;
  rdf:type nmm:MusicPiece , nfo:Audio ;
  nfo:duration 164 ;
  nfo:codec "MPEG" ;
  nmm:dlnaMime "audio/mpeg" ;
  nfo:sampleRate 44100 ;
  nie:title "The Only Place" .


I'm still kinda interested in JSON-LD, because JSON (though not
JSON-LD) has such a massive user base already. Phillip, JSON-LD *is* a
W3C standard: <https://www.w3.org/TR/json-ld/>. The great thing about
standards is there are so many!

That said, all the W3C's previous attempts at RDF-in-JSON are quite
bad, and I think JSON-LD is definitely an improvement. There's a great
blog post from the main guy behind the standard called "JSON-LD and
Why I Hate the Semantic Web" which I recommend reading :-)
<http://manu.sporny.org/2014/json-ld-origins-2/>

Anyway, for my purposes, Turtle output from the extractors is fine
(and a big improvement on SPARQL). I'll keep the JSON-LD stuff around
in a separate commit.
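
For comparison, here is roughly what the artist resource from the Turtle
above might look like as JSON-LD. This is a hand-written sketch using
Python's standard json module, not output from the branch, and the
namespace URI given for the nmm prefix in the @context is an assumption.

```python
import json

# Assumed namespace URI for the nmm prefix; check the ontology you use.
NMM = "http://www.tracker-project.org/temp/nmm#"


def artist_to_jsonld(uri, name):
    """Build a JSON-LD document (as a plain dict) for one nmm:Artist."""
    return {
        "@context": {"nmm": NMM},
        "@id": uri,
        "@type": "nmm:Artist",
        "nmm:artistName": name,
    }


doc = artist_to_jsonld("urn:artist:Best%20Coast", "Best Coast")
print(json.dumps(doc, indent=2))
```

The result is ordinary JSON, so any JSON parser can read it; only the
@context, @id and @type keys carry the RDF meaning.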


On Sat, Apr 9, 2016 at 12:49 PM, Carlos Garnacho <carlosg gnome org> wrote:
Hey Sam :),

so, inspired by something in the Python RDFLib library, I came up with a
TrackerResource class that the extractors can use instead. This is a
work in progress, but I have a branch in git.gnome.org that adds
TrackerResource, and converts some of the extractors to use it. The
TrackerResource class can serialize either to SPARQL update commands or
to JSON-LD. The branch also adds the `tracker extract` command from
<https://bugzilla.gnome.org/show_bug.cgi?id=751991> so you can try out
the extractors easily and specify `-o json` or `-o sparql` as you prefer.

Nice! Should it have a Turtle serializer too? Do you think this could
possibly be used on the tracker store side to serialize contents?

I hadn't thought of that, but it's definitely possible. You could have
a `tracker serialize-the-whole-database` command :-)

In terms of backups, part of me thinks we should use an efficient
binary format, but then it's hard to trust a backup that is an opaque
binary format. If we could serialize to Turtle or JSON-LD then you
could tell just by looking whether it was valid or not. We can just
gzip it to make it small.
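
A minimal sketch of the "serialize to text, then gzip" idea, using only
Python's standard gzip module. The function names are hypothetical, and
the sample Turtle is repeated only to give gzip something to compress;
nothing here is an existing Tracker command.

```python
import gzip


def compress_turtle(turtle_text):
    """Return the gzip-compressed bytes of a Turtle document."""
    return gzip.compress(turtle_text.encode("utf-8"))


def decompress_turtle(blob):
    """Return the human-readable Turtle text stored in a backup."""
    return gzip.decompress(blob).decode("utf-8")


sample = (
    '<urn:artist:Best%20Coast> nmm:artistName "Best Coast" ;\n'
    "  rdf:type nmm:Artist .\n"
)
dump = sample * 1000  # stand-in for a large database dump

backup = compress_turtle(dump)
restored = decompress_turtle(backup)

print(len(dump), "characters of Turtle ->", len(backup), "bytes gzipped")
```

The backup stays inspectable: decompressing it gives back exactly the
text that went in, which can then be checked by eye or with a Turtle
parser.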

...

Here's an example of auto-generated SPARQL for an MP3 extraction:

<snip>


Note there are a lot more DELETE statements than before. I figured that
anywhere we want to replace the existing data we need a DELETE
statement, and the reason we don't normally do it is because previously
it had to be done manually. That said, the TrackerResource class does
have a way of avoiding this. If you ever call _set_value() for a property then
it assumes you want to *overwrite* it, and will generate a DELETE. If you
only use _add_value() then it will assume you want to *add* to it, and won't
generate a DELETE. The latter case is needed for stuff like nao:hasTag.
I may be misunderstanding things here of course; I didn't actually write any
of the extractors myself.
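
The set/add behaviour described above can be sketched in Python as
follows. This is a toy model, not the TrackerResource code from the
branch: the class name, method names and the exact shape of the
generated SPARQL are all assumptions made for illustration, and values
are passed in as already-formatted SPARQL terms.

```python
class ResourceSketch:
    """Toy stand-in for the TrackerResource idea; not the real API."""

    def __init__(self, uri):
        self.uri = uri
        self.values = {}        # property -> list of SPARQL terms
        self.overwrite = set()  # properties ever touched by set_value()

    def set_value(self, prop, term):
        """Replace any existing values; marks the property for DELETE."""
        self.values[prop] = [term]
        self.overwrite.add(prop)

    def add_value(self, prop, term):
        """Append a value; does not mark the property for DELETE."""
        self.values.setdefault(prop, []).append(term)

    def to_sparql_update(self):
        subject = f"<{self.uri}>"
        statements = []
        # One DELETE per overwritten property, so old values are dropped.
        for prop in sorted(self.overwrite):
            statements.append(
                f"DELETE {{ {subject} {prop} ?v }} "
                f"WHERE {{ {subject} {prop} ?v }}"
            )
        triples = [
            f"{subject} {prop} {term} ."
            for prop in sorted(self.values)
            for term in self.values[prop]
        ]
        statements.append("INSERT DATA { " + " ".join(triples) + " }")
        return " ;\n".join(statements)


track = ResourceSketch("urn:example:track")
track.set_value("nie:title", '"The Only Place"')  # overwrite: DELETE first
track.add_value("nao:hasTag", "<urn:example:tag:indie>")  # append: no DELETE
update = track.to_sparql_update()
print(update)
```

Only nie:title gets a DELETE here; nao:hasTag is inserted alongside
whatever tags already exist.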

Sounds good :). It seems to me that the generated SPARQL already
ensures some correctness, which is great. The difference between set
and add makes sense, given that we have to deal with single and
multivalued properties. The only potentially harmful combination would
be doing add_value() on a single-valued property. Is there any way
that could raise a warning in tracker-extract, rather than being
caught late due to the failed insert?

I don't think that's possible because libtracker-sparql doesn't have
any knowledge of the ontologies. We could move a bunch of code from
libtracker-data to libtracker-sparql to make it happen, but I actually
think it's a good design to have libtracker-sparql separate from
Tracker's own database and Tracker's own ontologies.

Sam

