On Fri, 2016-01-29 at 11:33 +0100, Sam Thursfield wrote:

Hi Sam,
We had a discussion at the 2016 Developer Experience hackfest related to sandboxing apps that use Tracker.
Sounds interesting! I have heard of several companies in the automotive and TV set-top box industries using Tracker in lightweight containers. I wonder whether the GNOME desktop and embedded worlds share an interest in developing these kinds of use cases and Tracker adaptations. It sounds to me like they do. I mostly wonder who will do the work.

For embedded, what these companies want to do is partition the data. By that I mean that they want certain kinds of data to be available to certain applications, but not certain other kinds. Often this has a legal reason, among other reasons (security, but also business reasons: 'we don't want apps to get our precious navigation data', 'we don't want to throw your privacy on the public streets', 'we own the data we collect on you, your apps don't', etc). Some of these business reasons are nefarious or plain evil, others are legit, and yet others are a matter of physical security (you could die if we allowed a third party to write to some location).
xdg-app is already at a point where people can use it (and are!) as a build and distribution system for apps. The sandboxing aspect is less developed, but that will be the next step.
ok
So how do we package Gnome Documents and Gnome Music for xdg-app? The key decision is whether to continue having one global Tracker database in the user's home dir -- in which case tracker-store has to be rock solid at enforcing permissions and separation -- or whether we run the Tracker code directly within that app's sandbox.
I think we should, or at least could, consider libtracker-sparql and tracker-store as one and the same. Applications, or domains of applications, would then ship their package with a dependency on a) libtracker-sparql (+ tracker-store) and b) an ontology that belongs to the domain. The init APIs of libtracker-sparql (a) could then be adapted to accept this ontology (b) as a parameter, plus a uuid so that multiple instances can use the same ontology (b). libtracker-sparql and tracker-store are already for the most part designed to do this. We just never got around to actually doing it: we only ship our Nepomuk ontology, and tracker-store is a singleton-like process, so you can't run multiple of them right now. But there's no technical reason for that except the fixed location of the sqlite meta.db file. And there's no reason we couldn't adapt tracker-store's D-Bus service activation to allow a tracker-store instance per installed ontology plus uuid registration. After that you partition them the way you as an integrator want to.
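To make the idea a bit more concrete, here is a rough Python sketch of "one store instance per ontology plus uuid". None of this is existing Tracker API: `StoreRegistry`, `open_store`, `db_path` and the directory layout are names I made up for illustration, and it uses plain sqlite3 instead of tracker-store. The only point is that the database location is derived from the (ontology, uuid) pair instead of being fixed.

```python
import os
import sqlite3
import tempfile


class StoreRegistry:
    """Hypothetical registry: one SQLite database per (ontology, uuid) pair."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        self._stores = {}

    def db_path(self, ontology, uuid):
        # The location depends on the instance instead of being fixed.
        return os.path.join(self.base_dir, ontology, uuid, "meta.db")

    def open_store(self, ontology, uuid):
        # Reuse an already opened instance, otherwise create a new one.
        key = (ontology, uuid)
        if key not in self._stores:
            path = self.db_path(ontology, uuid)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            self._stores[key] = sqlite3.connect(path)
        return self._stores[key]


# Two partitions that share an ontology but never share a database.
registry = StoreRegistry(tempfile.mkdtemp())
music = registry.open_store("nepomuk", "music-apps")
navigation = registry.open_store("nepomuk", "navigation")
```

An integrator would then decide which applications get to talk to which instance.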
We decided on the latter approach for the time being: adapt Tracker into a library that can be used by sandboxed apps to provide mining, monitoring and query functionality for whatever directories that app can see.
Mining is separate from querying. The miner is just another application that inserts data into the SPARQL endpoint (= libtracker-sparql plus tracker-store). It already plays by exactly the same rules as any other application. That's also why there are libraries for writing your own miners: you can simply have multiple miners. So if you launch a miner for a and a miner for b, the SPARQL endpoint doesn't mind.
Splitting Tracker up a bit was a goal anyway, but the main thing is that this way, tracker-store's query parser doesn't become a security sensitive component.
Although splitting Tracker up has been a mentioned 'goal' (rather, one of the many ideas we had in the past), I don't think splitting up the project politically is necessarily a good idea. I do think that in the long run ontologies should probably be maintained outside of the Tracker project.
I would hope we can do this in a way that doesn't break existing use cases of Tracker. (Although I don't know how everyone on this list is using it: feedback is helpful!)
A split up tracker-store + libtracker-sparql packaged separately from tracker-miner-fs and both of them depending on a Nepomuk ontology package, wouldn't break anything if the default constructor of TrackerSparqlConnection passes the uuid and path of the installed Nepomuk ontology package.
There are two downsides. One is that search + query across all the user's data becomes more difficult, because it's no longer centralised. But it is still *possible* to do this: you just need to synchronise the data from each app's database into a global database, and then run the search/query on that database. RDF is an interchange format, so synchronisation should be pretty easy to implement!
Ideally we someday end up allowing distributed queries: for example a SPARQL subquery where the inner query is executed on instance A and the outer query is executed on instance B, with instances A and B sharing only the minimal amount of data the outer query executor needs to solve the problem correctly. That's not easy. Such a solution could also someday be made to work between instances running on different computers. I wonder what a company like Facebook thinks of that.
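As a toy illustration of what I mean by sharing the minimal amount of data, here is a Python sketch with two in-memory sets of triples standing in for instances A and B. The `ex:` properties, the URNs and both query functions are invented for this example; a real implementation would have to split an actual SPARQL query, which is the hard part.

```python
# Instance A holds event data, instance B holds music data (made-up triples).
instance_a = {
    ("urn:event:1", "ex:performer", "urn:artist:1"),
    ("urn:event:2", "ex:performer", "urn:artist:2"),
}
instance_b = {
    ("urn:song:1", "ex:artist", "urn:artist:1"),
    ("urn:song:1", "ex:title", "First Song"),
    ("urn:song:2", "ex:artist", "urn:artist:3"),
    ("urn:song:2", "ex:title", "Second Song"),
}


def inner_query(store, event):
    """Runs on instance A: which artists performed at this event?"""
    return {o for s, p, o in store if s == event and p == "ex:performer"}


def outer_query(store, artists):
    """Runs on instance B: titles of the songs by the given artists."""
    songs = {s for s, p, o in store if p == "ex:artist" and o in artists}
    return sorted(o for s, p, o in store if s in songs and p == "ex:title")


# Only the artist URIs cross the boundary between the two instances.
shared = inner_query(instance_a, "urn:event:1")
titles = outer_query(instance_b, shared)
```

Instance B never sees the events, and instance A never sees the songs.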
Note that gnome-shell doesn't use Tracker to provide search results anyway, it federates queries across applications (some of which then use Tracker). That has the downside that a single keypress can trigger thousands of context switches as each app updates the search results it's providing... but that's a problem we already have, not something that this proposal really makes worse.
To be honest I personally don't consider gnome-shell to be the main use-case for this, rather the automotive and other embedded industries, as they want data isolation and partitioning. But for the tablet and smartphone industries (the app store world) I see a huge benefit in having all "apps" run in lightweight containers: they could run software that doesn't need to be trusted. With Tracker integrated into that concept, we could allow those applications to get metadata about the world outside of their container in a read-only way (because it's partitioned in containers they can't encrypt the music on your car and make you pay Bitcoins to get the encryption key, but they can get a list of songs and their full metadata nonetheless, and they could get other services to start playing them, for example).
The other downside is that there can be some duplication of work. E.g. Videos and Music may both index the Music directory, and would both end up monitoring the same files (perhaps many of them!).
One miner gets configured to mine *.mp3, *.wav, *.ogg, and another instance gets configured to mine *.odf, *.doc, etc. The only overhead is having to run two processes instead of one (= no big deal).
I don't think this is going to happen that often though, and it feels like partly a design problem anyway. If there is a legitimate reason to have 10 different apps that all need to monitor a user's entire music collection then we can look at setting up an xdg-app portal that deals with scanning all of the music collection and providing that info to the sandbox, but I struggle to think of many cases where we'd need that. Contacts are one case, perhaps, but GNOME seems to prefer using evolution-dataserver for those in any case...
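Roughly, and only as a sketch: the `Miner` class below is hypothetical and has nothing to do with the real miner libraries, and a plain Python list stands in for the SPARQL endpoint. It just shows two miner instances with different file patterns feeding the same endpoint.

```python
from fnmatch import fnmatchcase


class Miner:
    """Hypothetical miner that only handles files matching its patterns."""

    def __init__(self, name, patterns):
        self.name = name
        self.patterns = patterns

    def wants(self, filename):
        # Compare in lower case so that SONG.OGG matches *.ogg on any platform.
        return any(fnmatchcase(filename.lower(), p) for p in self.patterns)

    def mine(self, filenames, endpoint):
        # Every miner inserts into the endpoint like any other client would.
        for filename in filenames:
            if self.wants(filename):
                endpoint.append((self.name, filename))


files = ["a.mp3", "b.doc", "c.txt", "d.OGG"]
endpoint = []  # stand-in for the shared SPARQL endpoint

music_miner = Miner("music", ["*.mp3", "*.wav", "*.ogg"])
document_miner = Miner("documents", ["*.odf", "*.doc"])
music_miner.mine(files, endpoint)
document_miner.mine(files, endpoint)
```

Each file is picked up by at most one miner here, so no work is duplicated.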
Evolution data server is for contacts and calendar items only. It can't do queries like: give me all the song titles that an artist played during the rock festival which I described in this calendar item. If you enter the data normally, Tracker can. If you look at a Facebook's use-cases; that's also what people want.
Let me know what you think.
Kind regards, Philip
PS. here are some relevant previous discussions:
https://mail.gnome.org/archives/tracker-list/2015-March/msg00015.html
https://mail.gnome.org/archives/tracker-list/2014-September/msg00030.html