Re: [Tracker] The Utopian idea, Tracker as it should be
- From: Martyn Russell <martyn lanedo com>
- To: Philip Van Hoof <philip codeminded be>, Tracker mailing list <tracker-list gnome org>
- Subject: Re: [Tracker] The Utopian idea, Tracker as it should be
- Date: Wed, 17 Sep 2014 16:05:12 +0100
On 17/09/14 13:16, Philip Van Hoof wrote:
So on IRC ssam2 and martyn were discussing this and then asking me if
what they concluded was something I agreed with. So one more time :-)
My idea is this:
The SPARQL Endpoint
-------------------
o. libtracker-sparql and tracker-store get merged together. Perhaps we
rename libtracker-sparql, perhaps not, perhaps it doesn't matter.
o. Instances of tracker-store become managed by libtracker-sparql
(through D-Bus service activation or not, it's an implementation
detail of libtracker-sparql either way)
o. Nepomuk becomes an upstream project, managed separately
o. Applications that need to deal with metadata will depend on Nepomuk
(managed separately) and libtracker-sparql. Just like how they could,
if they used SQLite, depend on libsqlite and on their own DB schema
So far so good. However, I would like Jürg's opinion before we
dismantle + lift + shift code into libtracker-sparql, because actually
what this means is, libtracker-sparql becomes:
libtracker-common
libtracker-data
libtracker-bus
libtracker-direct
libtracker-sparql-backend
libtracker-sparql
tracker-store
ALL in one git repository?
That's probably not such a bad thing and really just a packaging
difference in the end (for the most part).
The mining of metadata
----------------------
o. The 'Tracker' project will contain only miners
Actually, I think the 'Tracker' project should be all of the above for
libtracker-sparql with a binary command line 'tracker' used to
communicate with the DB on the most basic (command line) level.
The main reason for this is that people are used to running tracker-*
commands already and it will be an easier cross over to keep 'tracker'
as the official name.
I looked at the git source and they have this kind of structure, mind
you they don't really have separate projects either.
o. The miners will depend on Nepomuk (managed separately) and
libtracker-sparql (like any other application) and on tracker-extract
Makes sense to depend on a Nepomuk project for the ontology indeed.
BUT tracker-extract technically is a miner, so having other miners
depend on it doesn't really make sense here. It's also optional, if you
only want file data and basic RDF type data (e.g. nfo:Audio), you
shouldn't need tracker-extract.
o. tracker-extract gets a public API (DBus FD passing based) that isn't
deeply coupled with tracker-miner-fs
It isn't right now.
You can index your content without tracker-extract running and then a
week later decide to extract more information and tracker-extract will
populate the rest using extractors. It's coupled to GRAPH UPDATED and
the ontology mainly. Carlos correct me if I am wrong.
o. tracker-extract therefore becomes a separate project (applications
that want to use it can depend on it, without having to depend on
Tracker's other miners). It deals with metadata so it too depends on
libtracker-sparql and on Nepomuk (managed separately) (like any
other application)
I think this makes sense too. We provide this functionality on the
command line (i.e. displaying what we know about a file).
I wonder how applications would "use it" - I guess GNOME documents could
decide to get SPARQL from foo.pdf to insert it themselves if they
wanted, but that's really what tracker-extract does already - I don't
see the added value here. We've not done this before and we've not had
requests from people to do this either. I would rather add this sort of
thing later if someone wants it bad enough.
Having said all that, the external-crawler work I recently did
(where external data sources push information through the
libtracker-miner stack to be indexed) could benefit from reusing the
extractor work.
I hasten to add, the ontology is usually most closely related to this
area of the code and where we see the most inconsistencies or bugs due
to broken ontology use.
o. tracker-miner-fs accepts (in implementation) that others can provide
metadata (by integrating with tracker-extract or not) and that it
should not interfere (this is already somewhat in place by using the
graph support: our insert-or-replace statement already replaces only
in our own miner-fs graph, giving precedence to other graphs)
It deals with metadata so it too depends on libtracker-sparql and on
Nepomuk (managed separately) (like any other application)
Yea I agree.
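To make the graph point concrete, here is a toy Python model of the idea, not Tracker's actual storage or API. It assumes that "precedence" means a reader sees another provider's value when one exists and falls back to the miner-fs value otherwise. The graph names and the ToyStore class are made up for illustration.

```python
# Toy in-memory model of per-graph writes; NOT Tracker's real storage or API.
MINER_FS_GRAPH = "urn:example:miner-fs"  # hypothetical graph name


class ToyStore:
    def __init__(self):
        # Maps (graph, subject, predicate) -> value.
        self._triples = {}

    def insert_or_replace(self, graph, subject, predicate, value):
        """Replace a value only inside the writer's own graph."""
        self._triples[(graph, subject, predicate)] = value

    def lookup(self, subject, predicate):
        """Return a value, preferring any graph other than miner-fs (assumed rule)."""
        fallback = None
        for (graph, subj, pred), value in self._triples.items():
            if (subj, pred) != (subject, predicate):
                continue
            if graph != MINER_FS_GRAPH:
                return value
            fallback = value
        return fallback


store = ToyStore()
uri = "file:///home/u/a.mp3"
store.insert_or_replace("urn:example:mtp-daemon", uri, "nie:title", "From MTP")
store.insert_or_replace(MINER_FS_GRAPH, uri, "nie:title", "From miner-fs")
# Re-indexing by miner-fs only touches the miner-fs graph.
store.insert_or_replace(MINER_FS_GRAPH, uri, "nie:title", "Re-indexed")
print(store.lookup(uri, "nie:title"))  # prints "From MTP"
```

The point of the model is only that a re-index by miner-fs cannot clobber what another provider wrote, because each writer replaces values in its own graph.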
Why, my god, whyyy?
-------------------
libtracker-sparql + tracker-store: Allowing multiple ontologies to be
used. Applications don't care about tracker-store. They just want an
API to launch their SPARQL and SPARQL INSERT queries on (and that's
really it). They also want GraphUpdated, which is problematic as this
would need to be separate per ontology too (fair enough).
I agree, and with any luck this should make sandboxing or isolating
ontology testing or data sets much easier.
tracker-extract separate: Allowing MTP daemons to enrich metadata
themselves on a file in /tmp before doing the rename() to the final
destination in $HOME. Allowing them to control the metadata insertion
instead of letting tracker-miner-fs's inotify pick up the file
after the rename (metadata up front, before the file is ready). To
indicate that the file isn't ready we have the tracker:available property.
You should know, this is already possible and I know of real use cases
doing this too.
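For anyone who has not seen this flow, a rough Python sketch follows. It is an illustration under assumptions, not how any real daemon does it: build_insert() and stage_and_publish() are hypothetical helpers, the SPARQL is only built as a string and handed to a caller-supplied submit callback (nothing talks to Tracker here), and the choice of nie:url and nfo:fileName alongside tracker:available is mine. Note that rename() only works within one filesystem, so the staging directory in this sketch has to share a filesystem with the destination.

```python
import os
import tempfile
from pathlib import Path


def build_insert(final_path, available):
    """Build an illustrative SPARQL INSERT string (no escaping is done)."""
    uri = Path(final_path).as_uri()
    name = os.path.basename(final_path)
    flag = "true" if available else "false"
    return (
        "INSERT { "
        f'_:file a nfo:FileDataObject ; nie:url "{uri}" ; '
        f'nfo:fileName "{name}" ; tracker:available {flag} '
        "}"
    )


def stage_and_publish(data, staging_dir, final_path, submit):
    """Write to a staging file, push metadata first, then rename into place."""
    final_path = os.path.abspath(final_path)
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # Metadata goes in before the file exists at its final location,
    # flagged as not yet available.
    submit(build_insert(final_path, available=False))
    # Atomic within one filesystem; fails across filesystems.
    os.replace(tmp_path, final_path)
    # How tracker:available is switched to true afterwards is not
    # described in this thread, so it is left out of the sketch.
    return final_path


with tempfile.TemporaryDirectory() as workdir:
    sent = []
    target = os.path.join(workdir, "song.mp3")
    stage_and_publish(b"fake audio", workdir, target, sent.append)
    print(sent[0])
```

Passing submit in as a callback keeps the sketch runnable without Tracker installed; a real caller would hand the string to whatever update API it uses.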
Nepomuk separate: Sharing the ontology with KDE desktop, without
GNOME's politics interfering or needlessly trying to dominate the
processes (which, whether GNOME people like this or not, would imply
that KDE simply wouldn't use it). Where this gets hosted? FDO?
nepomuk-desktop.org? Jesus, I don't care.
I'm all for sharing, but our situation has always been slightly
different, we have a lot of extensions and things which the original
ontology doesn't have, so we can't strictly follow it anyway. I don't
know how this will sit with the KDE folk if they want to use Tracker's
ontology.
I don't care only about the desktops either. There are industries like
Automotive, File Sharing solutions, set-top boxes for digital
TV and portable hard disks using Tracker right now. They all want to
influence the ontologies. A consortium that is more neutral than "the
Linux Desktop world" is needed. Being wrong is bad, as it's always an
API change (all depending projects need a major version increment and
things start breaking at the query level).
I agree.
So we outsource this to a more competent team who care about more
than just our implementation of it, which is all we care about.
Usually changes to the ontology are closely related to tracker-extract
and extended metadata OR updates in the spec. I don't think we need to
outsource this, we just make it a separate project and let people get
involved, like we did with libmediaart.
FAQ
---
Q: Would this be a split of the project?
A: Yes, I guess. In four parts (Nepomuk, libtracker-sparql, tracker-
extract, tracker(-miners))
Maybe it's boring, but I would just have a 'tracker-data-miners' project
and have in it:
libtracker-miner
libtracker-extract
tracker-miner-fs
tracker-miner-apps
tracker-miner-user-guides
tracker-miner-rss
tracker-extract
The hard part is, a lot of those components depend on libtracker-common,
a private library and what would likely be part of the
'libtracker-sparql' project.
Q: Does it matter?
A: No, I guess (we'll still love each other on #tracker and
tracker-list. Same maintainers, same overall project, same goals)
Actually I prefer this approach, the pace of development is so different
in different areas of the code base that this would make things simpler
for me as a maintainer.
Q: Are you dangerous? Splitting is bad! bad bad bad! You witch!
A: Very
No it isn't.
Actually, smaller, modular approaches work much better in general
because each component has a specific purpose.
The smaller the component the smaller the risk of massive catastrophe in
any circumstance.
There might be some version turbulence but other than that :)
Q: But I use tracker-store's Resources DBus API to do Query and Insert.
And you want me to use libtracker-sparql then, right? Will that make
my beloved DBus API on tracker-store disappear?
A: Yes. You should never have used that one in the first place. The
only public API that we should support is libtracker-sparql (and
GraphUpdated on Resources, but I guess we'll bring that as a signal
on the TrackerSparqlConnection to libtracker-sparql too)
Not sure I agree with this point.
But it's not insurmountable.
--
Regards,
Martyn
Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell