Re: [Tracker] Tracker and sandboxed applications



On Fri, 2016-01-29 at 11:33 +0100, Sam Thursfield wrote:

Hi Sam,

> We had a discussion at the 2016 Developer Experience hackfest related
> to sandboxing apps that use Tracker.

Sounds interesting! I have heard of several companies in the automotive
and TV set-top box industries using Tracker in lightweight containers.

I wonder if there is shared interest between the GNOME desktop and
embedded in developing these kinds of use cases and Tracker
adaptations? It sounds to me like there is. I mostly wonder who will do
the work.

For embedded, what these companies want to do is partition the data. By
that I mean that they want certain kinds of data to be available to
certain applications, but not other kinds of data.

Often this has a legal reason, among other reasons (security, but also
business reasons - 'we don't want apps to get our precious navigation
data', 'we don't want to throw your privacy onto the public streets',
'we own the data we collect on you, your apps don't', etc.).

Some of these business reasons are nefarious or plain evil, others are
legitimate, and yet others are a matter of physical safety (people
could die if we allowed a third party to write to some location).

> xdg-app is already at a point where people can (and are!) using it as
> a build and distribution system for apps. The sandboxing aspect is
> less developed, but that will be the next step.

ok

> So how do we package Gnome Documents and Gnome Music for xdg-app? The
> key decision is whether to continue having one global Tracker database
> in the user's home dir -- in which case tracker-store has to be rock
> solid at enforcing permissions and separation -- or whether we run the
> Tracker code directly within that app's sandbox.

I think we should, or at least could, consider thinking of
libtracker-sparql and tracker-store as one and the same, and that
applications or domains of applications need to ship their package with
a dependency on a) libtracker-sparql (+ tracker-store) and b) an
ontology that belongs to the domain.

Then libtracker-sparql's init APIs (a) should or could be adapted to
allow passing this ontology (b) as a parameter, plus a UUID so that
multiple instances can use the same ontology (b).
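
As a rough sketch of what such an entry point could look like (this
constructor does not exist today; the name, parameters and paths are
made up for illustration):

    #include <libtracker-sparql/tracker-sparql.h>

    /* Hypothetical: open (or D-Bus-activate) a tracker-store instance
     * that uses the given ontology, keyed by a caller-chosen UUID so
     * that several instances of the same ontology can coexist. */
    TrackerSparqlConnection *
    tracker_sparql_connection_get_for_ontology (const gchar   *ontology_dir,
                                                const gchar   *uuid,
                                                GCancellable  *cancellable,
                                                GError       **error);

    /* An app (or domain of apps) would then do something like: */
    GError *error = NULL;
    TrackerSparqlConnection *conn =
        tracker_sparql_connection_get_for_ontology ("/app/share/my-domain/ontology",
                                                    "my-instance-uuid",
                                                    NULL, &error);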

libtracker-sparql and tracker-store are already for the most part
designed to do this. We just never got to actually doing it for real:
we only ship our Nepomuk ontology, and the tracker-store process is a
singleton-like process; you can't run multiple instances of it right now.

But there's no technical reason for that, other than the fixed location
of the SQLite meta.db file.

And there's no reason why we couldn't adapt tracker-store's D-Bus
service activation to allow for one tracker-store instance per
installed ontology plus UUID registration.
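
For illustration only: today tracker-store owns the single fixed
well-known name org.freedesktop.Tracker1, but a per-instance variant
could own a name derived from the ontology plus UUID instead (the
naming scheme below is made up, and the UUID would need to be
sanitised into a valid bus-name element):

    #include <gio/gio.h>

    /* Hypothetical: one bus name per ontology + UUID, e.g.
     * org.freedesktop.Tracker1.Store.<uuid>, instead of the single
     * fixed org.freedesktop.Tracker1 name used today. */
    const gchar *uuid = "my_instance_uuid";
    gchar *name = g_strdup_printf ("org.freedesktop.Tracker1.Store.%s", uuid);

    g_bus_own_name (G_BUS_TYPE_SESSION, name,
                    G_BUS_NAME_OWNER_FLAGS_NONE,
                    NULL, NULL, NULL, /* bus/name acquired/lost handlers */
                    NULL, NULL);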

After that, you partition them the way you, as an integrator, want to.

> We decided on the latter approach for the time being: adapt Tracker
> into a library that can be used by sandboxed apps to provide mining,
> monitoring and query functionality for whatever directories that app
> can see.

Mining is separate from querying. The miner is just another application
that inserts data into the SPARQL endpoint (= libtracker-sparql plus
tracker-store). It already plays by the exact same rules as any other
application.

That's also why there are libraries for writing your own miners: you
can simply do that (have multiple miners).

So if you launch a miner for A and a miner for B, the SPARQL endpoint
doesn't mind.
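
To make that concrete, here is a minimal sketch of an application
acting as a "miner" against the current (Tracker 1.x) API - it just
pushes one resource into the endpoint; the INSERT body is only an
example:

    #include <libtracker-sparql/tracker-sparql.h>

    int
    main (void)
    {
      GError *error = NULL;

      /* Connect to the (currently singleton) SPARQL endpoint. */
      TrackerSparqlConnection *conn = tracker_sparql_connection_get (NULL, &error);

      if (conn == NULL)
        g_error ("Could not connect: %s", error->message);

      /* A miner is nothing more than a client doing INSERTs. */
      tracker_sparql_connection_update (conn,
                                        "INSERT { _:song a nmm:MusicPiece ; "
                                        "nie:title 'An example song' }",
                                        G_PRIORITY_DEFAULT, NULL, &error);

      if (error != NULL)
        g_error ("Update failed: %s", error->message);

      g_object_unref (conn);
      return 0;
    }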

> Splitting Tracker up a bit was a goal anyway, but the main
> thing is that this way, tracker-store's query parser doesn't become a
> security sensitive component.

Splitting Tracker up has indeed been a mentioned 'goal' (rather, one of
the many ideas we had in the past), but I don't think splitting up the
project politically is necessarily a good idea. I do think that in the
long run the ontologies should probably be maintained outside of the
Tracker project, though.

> I would hope we can do this in a way
> that doesn't break existing use cases of Tracker. (Although I don't
> know how everyone on this list is using it: feedback is helpful!)

A split-up tracker-store + libtracker-sparql packaged separately from
tracker-miner-fs, with both of them depending on a Nepomuk ontology
package, wouldn't break anything as long as the default constructor of
TrackerSparqlConnection passes the UUID and path of the installed
Nepomuk ontology package.

> There are two downsides. One is that search + query across all the
> user's data becomes more difficult, because it's no longer
> centralised. But it is still *possible* to do this: you just need to
> synchronise the data from each app's database into a global database,
> and then run the search/query on that database. RDF is an interchange
> format, so synchronisation should be pretty easy to implement!
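
Indeed. As a very rough sketch of such a synchronisation step (it
reuses the hypothetical per-ontology constructor from earlier, so none
of this exists as-is, and real code would escape the strings with
tracker_sparql_escape_string() and track what was already copied):

    GError *error = NULL;

    /* Read from the app's own database (hypothetical constructor). */
    TrackerSparqlConnection *app_db =
        tracker_sparql_connection_get_for_ontology ("/app/share/my-domain/ontology",
                                                    "my-instance-uuid", NULL, &error);

    /* Write into the global, user-wide store (existing API). */
    TrackerSparqlConnection *global_db = tracker_sparql_connection_get (NULL, &error);

    TrackerSparqlCursor *cursor =
        tracker_sparql_connection_query (app_db,
                                         "SELECT nie:title(?s) { ?s a nmm:MusicPiece }",
                                         NULL, &error);

    while (tracker_sparql_cursor_next (cursor, NULL, &error))
      {
        gchar *sparql =
            g_strdup_printf ("INSERT { _:m a nmm:MusicPiece ; nie:title \"%s\" }",
                             tracker_sparql_cursor_get_string (cursor, 0, NULL));
        tracker_sparql_connection_update (global_db, sparql,
                                          G_PRIORITY_DEFAULT, NULL, &error);
        g_free (sparql);
      }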

Ideally we would some day end up allowing distributed queries:

        For example a SPARQL subquery where the inner query is executed on
instance A, and the outer query is executed on instance B, with
instances A and B sharing the minimal amount of data needed for the
outer query executor to solve the problem correctly.

        That's not easy.

Such a solution could also someday be made to work between instances
running on different computers. I wonder what a Facebook thinks of that.

> Note that gnome-shell doesn't use Tracker to provide search results
> anyway, it federates queries across applications (some of which then
> use Tracker). That has the downside that a single keypress can trigger
> thousands of context switches as each app updates the search results
> it's providing... but that's a problem we already have, not something
> that this proposal really makes worse.

To be honest, I personally don't consider gnome-shell to be the main
use case for this; rather the automotive and other embedded industries,
as they want data isolation and partitioning.

But for the tablet and smartphone industries (the app store world) I see
a huge benefit in having all "apps" run in lightweight containers: they
could run software that doesn't need to be trusted. With Tracker
integrated into that concept, we could allow those applications to get
metadata about the world outside of their container in a read-only way.

(Because it's partitioned into containers, they can't encrypt the music
in your car and make you pay Bitcoins to get the encryption key, but
they can get a list of songs and their full metadata nonetheless - and
they could get other services to start playing them, for example.)

> The other downside is that there can be some duplication of work. E.g.
> Videos and Music may both index the Music directory, and would both
> end up monitoring the same files (perhaps many of them!).

One miner gets configured to mine *.mp3, *.wav, *.ogg, and another
instance gets configured to mine *.odf, *.doc, etc. The only overhead
is having to run two processes instead of one (= no big deal).
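
In a custom miner (written with libtracker-miner or by hand), deciding
which files to pick up can be as simple as matching file names against
the configured patterns. A trivial sketch using plain GLib (the pattern
list is just an example):

    #include <glib.h>

    /* Example configuration for a "music" miner instance; a
     * "documents" instance would list *.odf, *.doc, ... instead. */
    static const gchar *patterns[] = { "*.mp3", "*.wav", "*.ogg", NULL };

    static gboolean
    should_mine (const gchar *filename)
    {
      gint i;

      for (i = 0; patterns[i] != NULL; i++)
        {
          if (g_pattern_match_simple (patterns[i], filename))
            return TRUE;
        }

      return FALSE;
    }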

> I don't think this is going to happen that often though, and it feels like
> partly a design problem anyway. If there is a legitimate reason to
> have 10 different apps that all need to monitor a user's entire music
> collection then we can look at setting up an xdg-app portal that deals
> with scanning all of the music collection and providing that info to the
> sandbox, but I struggle to think of many cases where we'd need that.
> Contacts are one case, perhaps, but GNOME seems to prefer using
> evolution-dataserver for those in any case...

Evolution Data Server is for contacts and calendar items only. It can't
do queries like: give me all the song titles that an artist played
during the rock festival which I described in this calendar item. If you
enter the data normally, Tracker can.
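
Purely as an illustration of the kind of cross-domain query I mean
(assuming a connection obtained as earlier; example:performedAt is a
made-up property that the application would have to store itself, while
nmm:, nie: and ncal: are real Nepomuk ontology prefixes):

    GError *error = NULL;
    const gchar *query =
        "SELECT nie:title(?song) "
        "WHERE { "
        "  ?event a ncal:Event ; ncal:summary 'Rock festival' . "
        "  ?artist example:performedAt ?event . "
        "  ?song a nmm:MusicPiece ; nmm:performer ?artist . "
        "}";

    TrackerSparqlCursor *cursor =
        tracker_sparql_connection_query (conn, query, NULL, &error);

    while (tracker_sparql_cursor_next (cursor, NULL, &error))
      g_print ("%s\n", tracker_sparql_cursor_get_string (cursor, 0, NULL));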

If you look at a Facebook's use cases, that's also what people want.

Let me know what you think.


Kind regards,

Philip

PS. here are some relevant previous discussions:
https://mail.gnome.org/archives/tracker-list/2015-March/msg00015.html
https://mail.gnome.org/archives/tracker-list/2014-September/msg00030.html
