Re: [Tracker] Database access abstraction

From: Philip Van Hoof <spam pvanhoof be>
To: jamie mccrack gmail com
Cc: Tracker-List <tracker-list gnome org>
Subject: Re: [Tracker] Database access abstraction
Date: Thu, 06 Nov 2008 11:29:08 +0100


Hi, this is a mail that I once sent to a few people at Nokia who wanted
direct access to SQL too:

Hi guys,

I made a very simple test case that selects a LIMIT of 1 until 100 of
a few columns out of the Services table.

After I finished the "over DBus" version of it, and while I was
measuring its performance, I was already confident that as konttori
pointed out too, that the DBus overhead truly is minimal when compared
to query time.

I could write the same test with direct access and it would most
likely shave of another few tenths of milliseconds. But for a UI
application I don't really see the point in that (a mainloop iteration
that has to do a few exposes and draws is likely going to take
longer).

So I attached a vala app for testing this and I included the generated
C source code for it. `pkg-config dbus-glib-1 --cflags --libs` it and
for the Vala stuff take a look here:
http://live.gnome.org/Vala/DBusSample

So this is over DBus:

        pvanhoof tinc:~/test$ ./test-sql-tracker 
        
        ...
       
        0.035126 seconds elapsed
        pvanhoof tinc:~/test$


Now with relation to the queuing: 

SQLite has a write-lock per transaction and it keeps all tables in the
connection involved in the transaction, locked. A write lock means
that while we (Tracker) are writing, you (your process, your
connection, your *direct* connection to the SQLite tables indeed) is
locked out.

Tracker writes in long transactions because SQLite is 50 or 60 times
faster at writing if you group lots of writes together. If you don't
do this you also have the fsync() problem more often. This is a
similar problem as what Firefox started having when it switched to
SQLite for several things.

Short: This means that we can't turn off our use of transactions. It's
vital in Tracker's design and performance.

For preempting these transactions we require a very strict
communication between the indexer and the front end query mechanism.
This implies a synchronous DBus message to the indexer. This DBus
message will instruct the indexer to do a preemptive commit of its
standing transaction.

This is possible because the transaction is not done for atomicity but
is only done for improving the write speed of SQLite (so we can early
commit it, and we do that).

So even if we'd make a library that has a direct (in process with the
app linking with that library) connection to the SQLite database, then
we'd still, just like what trackerd (the front end query mechanism)
has to do too, would need to execute a DBus message to request a
preemptive commit to the indexer. Making it pointless to do a direct
connection in the first place (because you still have the DBus
overhead for this one message, anyway).


Some pointers:

http://www.sqlite.org/faq.html#q5
http://www.sqlite.org/lang_transaction.html


What would be possible to avoid queuing, in case a lot of applications
concurrently want to query Tracker, is to introduce a connection pool
and a queue for each connection in the pool, to Tracker's query front
end.

I must warn that although very much possible, that this solution adds
complexity. If it's not proved that concurrent access will very often
occur, I don't think that it's worthwhile right now to implement this
solution already.

But we can keep it in mind for when the day comes.

/EO my advise ;)







On Tue, 2008-10-21 at 08:47 -0400, Jamie McCracken wrote:

On Tue, 2008-10-21 at 09:36 +0100, Martyn Russell wrote:

Also in the future i want to support direct access to sqlite via  a
client lib so we can bypass dbus (and trackerd) for select queries where
speed is paramount and volume of data is too big for dbus to handle
optimally (think get all my 100,000 music tracks with metadata). So this
library would have to handle all querying and any future ones (like
sparql) - so you will have no problem from me for implementing that
support in a lib


Hmm, I would like to see the difference it makes using DBus and if it
really is an issue. We have an API like this in DBus now which Phillip
added - I really don't like the idea of people executing random SQL on
the databases. It can lead to much bigger problems. Phillip stresses
this in the .xml file where we document this API. I think quite rightly
so too.


that should be moved to a direct access lib

the advantage of a direcdt access lib is that it removes dbus overhead
when large amounts of data are required.

Rob taylor probably knows this better but from what i understand dbus is
not optimal for large payloads (~1mb+) and somehting like get all music
and metadata might involve a million plus strings which dbus would have
to marshall, strdup and validate individually into multiple packets
(IIRC packet size is 4kb?) so you are looking at massive overhead with
multiple ipc calls



_______________________________________________
tracker-list mailing list
tracker-list gnome org
http://mail.gnome.org/mailman/listinfo/tracker-list

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be

Attachment: test-sql-tracker.vala
Description: Text Data

Follow-Ups:
- Re: [Tracker] Database access abstraction
  - From: Jamie McCracken

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]