Re: [Tracker] The Utopian idea, Tracker as it should be



On 17/09/14 13:16, Philip Van Hoof wrote:

So on IRC ssam2 and martyn were discussing this and then asking me if
what they concluded was something I agreed with. So one more time :-)

My idea is this:

The SPARQL Endpoint
-------------------

o. libtracker-sparql and tracker-store get merged together. Perhaps we
    rename libtracker-sparql, perhaps not, perhaps it doesn't matter.

o. Instances of tracker-store become managed by libtracker-sparql
    (through D-Bus service activation or not, it's an implementation
    detail of libtracker-sparql either way)

o. Nepomuk becomes an upstream project, managed separately

o. Applications that need to deal with metadata will depend on Nepomuk
    (managed separately) and libtracker-sparql, just as an application
    using SQLite depends on libsqlite and on its own DB schema

So far so good. However, I would like Jürg's opinion before we dismantle, lift and shift code into libtracker-sparql, because what this actually means is that libtracker-sparql becomes:

  libtracker-common
  libtracker-data
  libtracker-bus
  libtracker-direct
  libtracker-sparql-backend
  libtracker-sparql
  tracker-store

ALL in one git repository?

That's probably not such a bad thing and really just a packaging difference in the end (for the most part).

The mining of metadata
----------------------

o. The 'Tracker' project will contain only miners

Actually, I think the 'Tracker' project should be all of the above for libtracker-sparql with a binary command line 'tracker' used to communicate with the DB on the most basic (command line) level.

The main reason for this is that people are used to running tracker-* commands already and it will be an easier cross over to keep 'tracker' as the official name.

I looked at the git source and they have this kind of structure; mind you, they don't really have separate projects either.

o. The miners will depend on Nepomuk (managed separately) and
    libtracker-sparql (like any other application) and on tracker-extract

Makes sense to depend on a Nepomuk project for the ontology indeed.

BUT tracker-extract is technically a miner, so having other miners depend on it doesn't really make sense here. It's also optional: if you only want file data and basic RDF type data (e.g. nfo:Audio), you shouldn't need tracker-extract.
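
To illustrate what "basic RDF type data" without tracker-extract could look like, here is a rough Python sketch that derives a coarse class from the file name alone. The nfo: class names are from the Nepomuk file ontology, but the mapping table and the function name are my own assumptions for illustration, not what tracker-miner-fs actually does.

```python
import mimetypes

# Hypothetical mapping from a MIME major type to a Nepomuk (nfo:) class.
# The mapping is an assumption for illustration, not Tracker's real logic.
MAJOR_TYPE_TO_CLASS = {
    "audio": "nfo:Audio",
    "image": "nfo:Image",
    "video": "nfo:Video",
    "text": "nfo:TextDocument",
}


def basic_rdf_type(filename):
    """Guess a coarse RDF class from the file name, without opening the file."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown type: fall back to the generic file class.
        return "nfo:FileDataObject"
    major = mime.split("/", 1)[0]
    return MAJOR_TYPE_TO_CLASS.get(major, "nfo:FileDataObject")
```

Anything deeper than this (tags, duration, embedded titles) needs the file contents, which is the part an extractor would fill in later.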

o. tracker-extract gets a public API (DBus FD passing based) that isn't
    deeply coupled with tracker-miner-fs

It isn't right now.

You can index your content without tracker-extract running, then a week later decide to extract more information, and tracker-extract will populate the rest using extractors. It's coupled to GRAPH UPDATED and the ontology mainly. Carlos, correct me if I am wrong.

o. tracker-extract therefore becomes a separate project (applications
    that want to use it can depend on it, without having to depend on
    Tracker's other miners). It deals with metadata so it too depends on
    libtracker-sparql and on Nepomuk (managed separately) (like any
    other application)

I think this makes sense too. We provide this functionality on the command line (i.e. displaying what we know about a file).

I wonder how applications would "use it" - I guess GNOME documents could decide to get SPARQL from foo.pdf to insert it themselves if they wanted, but that's really what tracker-extract does already - I don't see the added value here. We've not done this before and we've not had requests from people to do this either. I would rather add this sort of thing later if someone wants it bad enough.

Having said all that, the external-crawler work I recently did (where external data sources push information through the libtracker-miner stack to be indexed) could benefit from reusing the extractor work.

I hasten to add, the ontology is usually most closely related to this area of the code, and this is where we see the most inconsistencies or bugs due to broken ontology use.

o. tracker-miner-fs accepts (in implementation) that others can provide
    metadata (by integrating with tracker-extract or not) and that it
    should not interfere (this is already somewhat in place by using the
    graph support - our insert-or-replace sentence already only replaces
    in our own miner-fs' graph only, giving precedence to other graphs)
    It deals with metadata so it too depends on libtracker-sparql and on
    Nepomuk (managed separately) (like any other application)

Yea I agree.
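
A toy Python model of the graph behaviour described above, in case it helps: insert-or-replace only touches the writer's own graph, and a lookup prefers values from any other graph over the miner's. The class, method names and graph names are all hypothetical; this is not Tracker's implementation, only a sketch of the precedence rule.

```python
class ToyStore:
    """Toy in-memory model: one value per (graph, subject, predicate)."""

    def __init__(self):
        self._values = {}

    def insert_or_replace(self, graph, subject, predicate, value):
        # Only the writer's own graph is touched; other graphs are left alone.
        self._values[(graph, subject, predicate)] = value

    def lookup(self, subject, predicate, low_priority_graph="miner-fs"):
        # Prefer a value from any other graph over the miner's own value.
        fallback = None
        for (graph, s, p), value in self._values.items():
            if (s, p) != (subject, predicate):
                continue
            if graph != low_priority_graph:
                return value
            fallback = value
        return fallback


store = ToyStore()
store.insert_or_replace("mtp-daemon", "file:///a.mp3", "nie:title", "From MTP")
store.insert_or_replace("miner-fs", "file:///a.mp3", "nie:title", "From miner")
# A later rescan by the miner replaces only the miner's own value.
store.insert_or_replace("miner-fs", "file:///a.mp3", "nie:title", "Rescanned")
title = store.lookup("file:///a.mp3", "nie:title")
```

Here the value supplied by the (hypothetical) "mtp-daemon" graph survives the rescan and wins the lookup.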

Why, my god, whyyy?
-------------------

libtracker-sparql + tracker-store: Allowing multiple ontologies to be
   used. Applications don't care about tracker-store. They just want an
   API to launch their SPARQL and SPARQL INSERT queries on (and that's
   really it). They also want GraphUpdated, which is problematic as this
   would need to be separate per ontology too (fair enough).

I agree, and with any luck this should make sandboxing or isolating ontology testing or data sets much easier.

tracker-extract separate: Allowing MTP daemons to enrich metadata
   themselves on a file in /tmp before doing the rename() to the final
   destination in $HOME. Allowing them to control the metadata insertion
   instead of letting inotify of tracker-miner-fs pick up the file
   after rename (metadata up front, before the file is ready). To
   indicate that the file isn't ready we have tracker:available property.

You should know, this is already possible and I know of real use cases doing this too.
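
For anyone who hasn't seen this pattern, a minimal Python sketch of the sequence: write to a staging file, push the metadata with tracker:available set to false, rename(), then flip it to true. The `send` callable stands in for whatever submits SPARQL updates and is hypothetical, as is the function name. The update strings are only illustrative and have not been checked against any Tracker version.

```python
import os
import tempfile
from pathlib import Path


def stage_then_publish(data, staging_dir, final_path, send):
    """Write data to a staging file, announce metadata, then rename into place.

    `send` is a hypothetical callable that takes one SPARQL update string.
    """
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    uri = Path(final_path).resolve().as_uri()

    # Metadata goes in up front, marked as not yet available.
    send("INSERT { <%s> a nfo:FileDataObject ; tracker:available false }" % uri)

    # rename() is atomic only when both paths are on the same filesystem.
    os.rename(tmp_path, final_path)

    # The file is in place now, so mark it available.
    send(
        "DELETE { <%s> tracker:available ?a } "
        "INSERT { <%s> tracker:available true } "
        "WHERE { <%s> tracker:available ?a }" % (uri, uri, uri)
    )
```

Passing `send` in keeps the ordering (metadata, rename, availability) in one place while leaving the transport up to the caller.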

Nepomuk separate: Sharing the ontology with KDE desktop, without
   GNOME's politics interfering or needlessly trying to dominate the
   process (which, whether GNOME people like this or not, would imply
   that KDE simply wouldn't use it). Where this gets hosted? FDO?
   nepomuk-desktop.org? Jesus, I don't care.

I'm all for sharing, but our situation has always been slightly different. We have a lot of extensions and things which the original ontology doesn't have, so we can't strictly follow it anyway. I don't know how this will sit with the KDE folk if they want to use Tracker's ontology.

   I also don't only care about the desktops. There are industries like
   Automotive, File Sharing solutions, set-top boxes for digital TV and
   portable hard disks using Tracker right now. They all want to
   influence the ontologies. A consortium that is more neutral than "the
   Linux Desktop world" is needed. Being wrong is bad, as it's always an
   API change (all depending projects need a major version increment and
   things start breaking at the query level).

I agree.

   So we outsource this to a more competent team who care about more
   than just our implementation of it, which is all we care about.

Usually changes to the ontology are closely related to tracker-extract and extended metadata OR updates in the spec. I don't think we need to outsource this; we just make it a separate project and let people get involved, like we did with libmediaart.

FAQ
---

Q: Would this be a split of the project?
A: Yes, I guess. In four parts (Nepomuk, libtracker-sparql, tracker-
    extract, tracker(-miners))

Maybe it's boring, but I would just have a 'tracker-data-miners' project and have in it:

  libtracker-miner
  libtracker-extract
  tracker-miner-fs
  tracker-miner-apps
  tracker-miner-user-guides
  tracker-miner-rss
  tracker-extract

The hard part is, a lot of those components depend on libtracker-common, a private library that would likely be part of the 'libtracker-sparql' project.

Q: Does it matter?
A: No, I guess (we'll still love each other on #tracker and
    tracker-list. Same maintainers, same overall project, same goals)

Actually I prefer this approach, the pace of development is so different in different areas of the code base that this would make things simpler for me as a maintainer.

Q: Are you dangerous? Splitting is bad! bad bad bad! You witch!
A: Very

No it isn't.

Actually, smaller, modular approaches work much better in general because each component has a specific purpose.

The smaller the component the smaller the risk of massive catastrophe in any circumstance.

There might be some version turbulence but other than that :)

Q: But I use tracker-store's Resources DBus API to do Query and Insert.
    And you want me to use libtracker-sparql then, right? Will that make
    my beloved DBus API on tracker-store disappear?
A: Yes. You should never have used that one in the first place. The
    only public API that we should support is libtracker-sparql (and
    GraphUpdated on Resources, but I guess we'll bring that as a signal
    on the TrackerSparqlConnection to libtracker-sparql too)

Not sure I agree with this point.
But it's not insurmountable.

--
Regards,
Martyn

Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell

