Re: Finding and Reminding, tech issues, 3.0 and beyond

On Sat, 2010-04-10 at 17:10 -0400, Owen Taylor wrote:

> The reason I consider storage relevant is that throwing data into "an
> optimized SQL database" where you don't have any ability to control what
> is indexed or understanding how query plans are executed is usually a
> recipe for application performance disaster. There are many people who
> make an excellent living going in and fixing these sorts of application
> performance disasters.
> Now, to the extent that we're building GNOME and Tracker together as a
> system and we know what queries we need to make fast - what standard
> properties need to be indexed - we're OK. For our "Finding and
> Reminding" plans I don't see a problem.
> But if we go beyond that and start encouraging people to start putting
> all sorts of application data into Tracker and relying on Tracker to do
> efficient queries on it, then it definitely is a concern. Based on how
> Tracker is mapping RDF into SQL tables, some SPARQL queries are going to
> be fast, some are going to be dead slow and people need to be able to
> come to an understanding of which are which.

This is indeed true, but one of the goals of our RDF implementation was
that there would not be a significant performance difference between
using a custom SQLite database and Tracker's DB.

If there are significant performance issues then we would need to be
notified of them, and hopefully they can be resolved at the Tracker end
(or, even better, at the ontology layer, as we have custom properties to
indicate whether a field should be indexed in SQLite).
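
To illustrate why that indexing hint matters at the SQLite level, here is
a minimal sketch using Python's sqlite3 module. The table and column names
are made up for the example and are not Tracker's actual schema; it only
shows that the same query moves from a full table scan to an index search
once the column it filters on is indexed.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for resources with one property column.
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, uri TEXT, mime TEXT)")
conn.executemany(
    "INSERT INTO resource (uri, mime) VALUES (?, ?)",
    [("file:///doc%d" % i, "text/plain" if i % 2 else "audio/mpeg")
     for i in range(1000)],
)

sql = "SELECT uri FROM resource WHERE mime = ?"
plan_before = query_plan(conn, sql, ("audio/mpeg",))

# Roughly what marking a property as indexed would amount to (assumption).
conn.execute("CREATE INDEX resource_mime_idx ON resource (mime)")
plan_after = query_plan(conn, sql, ("audio/mpeg",))

print(plan_before)  # expected to describe a scan of the table
print(plan_after)   # expected to mention resource_mime_idx
```

The exact wording of the plan text varies between SQLite versions, so it
is better to look for the index name than to match the whole string.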

> What do you see as the distinction between "general event
> logging" and "timeline info"?

Timeline to me means audit info, like when a file was opened or a music
track played. This is the kind of stuff that Tracker would want to
support natively.

Event logging could be anything, and AFAIK Zeitgeist plans to track
things like application focusing and probably the stuff that the old
Beagle-based Dashboard did.

I really see Zeitgeist as providing info like "show me all documents
related to this document", where it could draw upon contextual stuff
like Doc X being viewed whilst Doc Y was viewed and infer a relationship
between the two items. Tracker could only relate stuff where the user
explicitly set such links (like via a common tag). This is the key
difference between Zeitgeist and Tracker as I understand it.
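
A rough sketch of the kind of inference I mean, in Python. This is not
how Zeitgeist actually does it; the function name, the event format and
the five-minute window are all assumptions made up for the example. It
simply counts how often two URIs were viewed close together in time.

```python
from collections import defaultdict

def infer_related(view_events, window_seconds=300):
    """Count how often two URIs were viewed within window_seconds of each other.

    view_events: iterable of (timestamp_seconds, uri) tuples.
    Returns a dict mapping frozenset({uri_a, uri_b}) to a co-view count.
    """
    events = sorted(view_events)
    counts = defaultdict(int)
    for i, (t_a, uri_a) in enumerate(events):
        for t_b, uri_b in events[i + 1:]:
            if t_b - t_a > window_seconds:
                break  # events are sorted, so later ones are even further away
            if uri_a != uri_b:
                counts[frozenset((uri_a, uri_b))] += 1
    return dict(counts)

# Hypothetical view log: Doc X and Doc Y are opened a minute apart,
# Doc Z much later, followed by Doc X again.
events = [
    (0, "file:///docX"),
    (60, "file:///docY"),
    (4000, "file:///docZ"),
    (4100, "file:///docX"),
]
related = infer_related(events)
```

With this log, X is related to Y and to Z, but Y and Z are never linked,
and none of it needed the user to tag anything.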

> This kind of "we can do some sorts of event logging, but other event
> logging is Zeitgeist" distinction is, generally speaking, Not Helpful. 
> Everybody needs to have a solid idea of what the components are and how
> they fit together.

Well, the problem is that metadata in Tracker is generally persistent and
has an indefinite lifespan until changed by the user. Event logging is at
best semi-persistent, in that as it ages its usefulness decreases and it
should ultimately be discarded when it's too old to be useful. I can see
audit data as having a long enough lifespan to add it to Tracker.

> I certainly do have a concern that if some data is stored in an event
> log that Zeitgeist maintains and some data is stored in the tracker
> database that queries will be inefficient. If displaying a list of the
> last 100 files tagged as "download" by time requires getting a list of
> all files tagged as "download", querying Zeitgeist for all events on all
> those files, then sorting by time in the application, that potentially
> will suck.

I don't see a problem with storing such semi-persistent or temporary data
separately in Zeitgeist, provided your interest is centred around a
single URI (i.e. queries like "show me all stuff related to URI X"). This
is perfectly manageable across Tracker/Zeitgeist. As you and I have both
said, it's not manageable when you need it for large-scale queries which
return multiple URIs, but so far audit timestamps are the only case I
know of where that's common.
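
To make the small-scale vs large-scale difference concrete, here is a
minimal Python/sqlite3 sketch of the "last 100 files tagged download"
case. The tables, names and data are invented for the example; a single
in-memory database stands in for both stores, and the second function
only imitates the access pattern an application would be stuck with if
tags and events lived in two separate services.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tag (uri TEXT, label TEXT)")
db.execute("CREATE TABLE event (uri TEXT, ts INTEGER)")
db.execute("CREATE INDEX event_uri_ts ON event (uri, ts)")
for i in range(500):
    uri = "file:///f%d" % i
    label = "download" if i % 2 == 0 else "other"
    db.execute("INSERT INTO tag VALUES (?, ?)", (uri, label))
    db.execute("INSERT INTO event VALUES (?, ?)", (uri, 1000 + i))

def last_tagged_single_store(label, limit):
    """One query: the store joins, sorts and truncates for us."""
    return db.execute(
        "SELECT t.uri, MAX(e.ts) AS last_ts "
        "FROM tag t JOIN event e ON e.uri = t.uri "
        "WHERE t.label = ? GROUP BY t.uri "
        "ORDER BY last_ts DESC LIMIT ?",
        (label, limit),
    ).fetchall()

def last_tagged_split_stores(label, limit):
    """N+1 round trips: list tagged URIs, ask about each one, sort locally."""
    uris = [row[0] for row in
            db.execute("SELECT uri FROM tag WHERE label = ?", (label,))]
    round_trips = 1
    latest = []
    for uri in uris:
        (ts,) = db.execute(
            "SELECT MAX(ts) FROM event WHERE uri = ?", (uri,)).fetchone()
        round_trips += 1
        if ts is not None:
            latest.append((uri, ts))
    latest.sort(key=lambda pair: pair[1], reverse=True)
    return latest[:limit], round_trips

single = last_tagged_single_store("download", 100)
split, trips = last_tagged_split_stores("download", 100)
```

Both return the same 100 rows, but the split version needs one request
per tagged file before it can sort anything. For a single URI that is
one extra request, which is why the per-URI case is fine.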

> If in the future Zeitgeist is using Tracker as a backend and is
> primarily about *writing* information into Tracker, and applications can
> query that information directly through the Tracker API, then that
> problem does largely go away. If Tracker is only used behind the scenes
> for storage in an opaque way, that wouldn't help because the app would
> still have to fetch two separate sorts of information and integrate
> them. Even if they were under the hood coming from the same place.

As above, it really depends on how you intend to query stuff (small scale
vs large scale). In the end it's more about convincing the Zeitgeist team
to make full use of Tracker's storage abilities (something which they are
reluctant to do at the moment).

