Re: [Tracker] Database access abstraction



It's not that clear-cut.

Firstly, pausing the indexer is an async process and does not have to be
synchronous at all, so there is no round trip with direct access. In most
cases the indexer will be idle when querying anyway. A client could even
skip the pause request entirely by first checking whether tracker-indexer
is running.

DBus has overhead with millions of strings: each one has to be
type-verified and copied several times. Unless you are testing with huge
result sets, like fetching 100,000 music files with all their metadata,
any comparison is invalid. Also, as we move to flattened tables, query
time will get faster, and this will make the DBus overhead more
prominent.
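To make "type-verified and copied several times" concrete, here is a
minimal C sketch of the per-string work. It is an illustrative model,
not libdbus code: `marshal_strings` and the number of copies passed to it
are assumptions. The point is only that the cost grows with the number
of strings times the number of copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Structural UTF-8 check (lead/continuation bytes only; overlong forms and
 * surrogates are not rejected). Stands in for per-string validation. */
static bool utf8_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        int extra;
        if (*p < 0x80)
            extra = 0;
        else if ((*p & 0xE0) == 0xC0)
            extra = 1;
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;
        else if ((*p & 0xF8) == 0xF0)
            extra = 3;
        else
            return false;
        p++;
        /* a '\0' here fails the test, so we never read past the end */
        for (int i = 0; i < extra; i++, p++)
            if ((*p & 0xC0) != 0x80)
                return false;
    }
    return true;
}

/* Hypothetical model of marshalling n strings: validate each one, then copy
 * it `copies` times (into a buffer, out of it, into the caller's array...).
 * Returns the total bytes copied, or 0 as soon as a string is invalid. */
static size_t marshal_strings(const char *const *strs, size_t n, int copies)
{
    size_t bytes = 0;

    for (size_t i = 0; i < n; i++) {
        if (!utf8_valid(strs[i]))
            return 0;
        size_t len = strlen(strs[i]) + 1;   /* include the terminator */
        for (int c = 0; c < copies; c++) {
            char *dup = malloc(len);
            assert(dup != NULL);
            memcpy(dup, strs[i], len);
            free(dup);                      /* only the copying work matters here */
            bytes += len;
        }
    }
    return bytes;
}
```

If 100,000 tracks each carried ten metadata fields (a hypothetical
figure), that is a million validations and, with three copies each,
three million allocations and copies, most of which an in-process
reader would not need.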
In other cases, say using Tracker as a GConf backend, you will often
get calls to fetch individual keys, and these need to be in process to
avoid a round trip for each key.
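A back-of-the-envelope model of the per-key case, as a minimal C
sketch. Both function names and every timing passed in are
assumptions, not measurements; the model only says that each key
fetched over IPC pays a whole round trip, while in process only the
lookup remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model; all times are in microseconds and are inputs
 * supplied by the caller, not measured values. */

/* Each key fetched over IPC pays a full request/reply round trip. */
static double per_key_ipc_us(size_t keys, double round_trip_us, double lookup_us)
{
    assert(round_trip_us >= 0.0 && lookup_us >= 0.0);
    return (double)keys * (round_trip_us + lookup_us);
}

/* In process, only the lookup itself remains. */
static double in_process_us(size_t keys, double lookup_us)
{
    assert(lookup_us >= 0.0);
    return (double)keys * lookup_us;
}
```

For example, with an assumed 100 microsecond round trip and a 10
microsecond lookup, a preferences dialog reading 200 keys would spend
22 ms in the IPC case against 2 ms in process.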

Ergo, it's premature to rule out direct access at this point.


jamie


On Thu, 2008-11-06 at 11:29 +0100, Philip Van Hoof wrote:
Hi, this is a mail that I once sent to a few people at Nokia who wanted
direct access to SQL too:


Hi guys,

I made a very simple test case that selects a few columns out of the
Services table, with a LIMIT ranging from 1 up to 100.

After I finished the "over DBus" version of it, and while I was
measuring its performance, I was already confident that, as konttori
pointed out too, the DBus overhead truly is minimal when compared to
query time.

I could write the same test with direct access and it would most
likely shave off another few tenths of a millisecond. But for a UI
application I don't really see the point in that (a mainloop iteration
that has to do a few exposes and draws is likely going to take
longer).

So I attached a Vala app for testing this, and I included the generated
C source code for it. Build it with `pkg-config dbus-glib-1 --cflags --libs`,
and for the Vala side take a look here:
http://live.gnome.org/Vala/DBusSample

So this is over DBus:

        pvanhoof tinc:~/test$ ./test-sql-tracker 
        
        ...
       
        0.035126 seconds elapsed
        pvanhoof tinc:~/test$


Now, with regard to the queuing:

SQLite takes a write lock per transaction, and that lock covers the
entire database file (not just the tables involved) for the duration of
the update. A write lock means that while we (Tracker) are writing, you
(your process, your connection, your *direct* connection to the SQLite
tables indeed) are locked out.

Tracker writes in long transactions because SQLite is 50 or 60 times
faster at writing if you group lots of writes together. If you don't,
you also hit the fsync() problem more often, because every commit has
to be synced to disk. This is similar to the problem Firefox started
having when it switched to SQLite for several things.

In short: we can't turn off our use of transactions. It's vital to
Tracker's design and performance.
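Why grouping helps can be shown with a minimal C model, under the
assumption (also stated in the code) that each commit costs one
fsync(). `struct batch_writer` and its functions are hypothetical and
do not touch SQLite; in real code each write would be an INSERT between
a BEGIN and a COMMIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batching writer: writes accumulate in one open transaction,
 * and each commit is ASSUMED to cost exactly one fsync(). */
struct batch_writer {
    size_t batch_size;   /* writes per transaction (1 = no batching) */
    size_t pending;      /* writes in the currently open transaction */
    size_t commits;      /* transactions committed so far (~ fsync calls) */
};

static void bw_commit(struct batch_writer *w)
{
    if (w->pending == 0)
        return;              /* nothing open, nothing to sync */
    w->commits++;            /* stands in for COMMIT + fsync() */
    w->pending = 0;
}

static void bw_write(struct batch_writer *w)
{
    assert(w->batch_size > 0);
    w->pending++;            /* stands in for an INSERT after BEGIN */
    if (w->pending == w->batch_size)
        bw_commit(w);
}

/* Number of commits needed to perform `writes` writes. */
static size_t commits_for(size_t writes, size_t batch_size)
{
    struct batch_writer w = { batch_size, 0, 0 };

    for (size_t i = 0; i < writes; i++)
        bw_write(&w);
    bw_commit(&w);           /* flush the final partial batch */
    return w.commits;
}
```

With 10,000 writes, committing each one separately costs 10,000
commits, while batches of 500 cost 20. The model does not reproduce the
50x figure; it only shows where the saving comes from.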

To preempt these transactions we require very strict communication
between the indexer and the front-end query mechanism. This implies a
synchronous DBus message to the indexer, instructing it to do a
preemptive commit of its standing transaction.

This is possible because the transaction is not there for atomicity;
it only exists to improve SQLite's write speed (so we can commit it
early, and we do).
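The early commit can be sketched in the same style. This is a
hypothetical model, not Tracker's code: the indexer keeps a standing
transaction open, a flag is raised by whatever receives the request (in
Tracker's case a DBus message to the indexer, not shown), and the
indexing loop checks the flag, commits, and skips one write so a reader
can take the lock. It is single-threaded; the C11 atomic is there only
so the flag could safely be set from another thread. Whether the
requester waits for a reply is outside this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical indexer state. */
struct indexer {
    atomic_bool commit_requested;  /* set by the (hypothetical) request handler */
    size_t pending;                /* writes in the standing transaction */
    size_t commits;
};

/* Called from the request handler when a query needs the database. */
static void indexer_request_commit(struct indexer *ix)
{
    atomic_store(&ix->commit_requested, true);
}

static void indexer_commit(struct indexer *ix)
{
    if (ix->pending > 0) {
        ix->commits++;             /* stands in for COMMIT */
        ix->pending = 0;
    }
}

/* One iteration of the indexing loop. If a commit was requested, close the
 * standing transaction and skip writing this time, leaving the lock free.
 * Returns true if a write was performed. */
static bool indexer_step(struct indexer *ix)
{
    if (atomic_exchange(&ix->commit_requested, false)) {
        indexer_commit(ix);
        return false;
    }
    ix->pending++;                 /* stands in for one INSERT */
    return true;
}
```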

So even if we made a library with a direct connection to the SQLite
database (in process with the app that links against the library), it
would still, just like trackerd (the front-end query mechanism), have
to send a DBus message to the indexer requesting a preemptive commit.
That makes a direct connection pointless in the first place, because
you still pay the DBus overhead for that one message anyway.


Some pointers:

http://www.sqlite.org/faq.html#q5
http://www.sqlite.org/lang_transaction.html


What would be possible, to avoid queuing when a lot of applications
want to query Tracker concurrently, is to add a connection pool to
Tracker's query front end, with a queue for each connection in the
pool.
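A minimal sketch of the dispatch side of such a pool, assuming a fixed
number of read connections (`POOL_SIZE`), each served in order from its
own FIFO. All names and sizes are hypothetical; a real pool would hold
actual database handles and a worker per connection. Here a new query
simply goes to the connection with the shortest queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4      /* hypothetical number of read connections */
#define QUEUE_CAP 64     /* hypothetical per-connection queue capacity */

/* FIFO of pending query ids for one connection (ring buffer). */
struct conn_queue {
    int items[QUEUE_CAP];
    size_t head, count;
};

struct pool {
    struct conn_queue conns[POOL_SIZE];
};

/* Dispatch a query to the connection with the shortest queue (ties go to
 * the lowest index). Returns that index, or -1 if every queue is full. */
static int pool_submit(struct pool *p, int query_id)
{
    size_t best = 0;

    for (size_t i = 1; i < POOL_SIZE; i++)
        if (p->conns[i].count < p->conns[best].count)
            best = i;
    struct conn_queue *q = &p->conns[best];
    if (q->count == QUEUE_CAP)
        return -1;               /* shortest is full, so all are full */
    q->items[(q->head + q->count) % QUEUE_CAP] = query_id;
    q->count++;
    return (int)best;
}

/* Take the next query for connection `c`; false if its queue is empty. */
static bool pool_next(struct pool *p, size_t c, int *query_id)
{
    assert(c < POOL_SIZE);
    struct conn_queue *q = &p->conns[c];
    if (q->count == 0)
        return false;
    *query_id = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```

Shortest-queue dispatch is just one simple policy; round-robin would
also work, at the cost of ignoring slow queries that back up one
connection.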

I must warn that, although very much possible, this solution adds
complexity. Unless it's proven that concurrent access will occur very
often, I don't think it's worthwhile to implement it right now.

But we can keep it in mind for when the day comes.

/EO my advice ;)






On Tue, 2008-10-21 at 08:47 -0400, Jamie McCracken wrote:
On Tue, 2008-10-21 at 09:36 +0100, Martyn Russell wrote:


Also, in the future I want to support direct access to SQLite via a
client lib so we can bypass DBus (and trackerd) for SELECT queries where
speed is paramount and the volume of data is too big for DBus to handle
optimally (think "get all my 100,000 music tracks with metadata"). So
this library would have to handle all querying, including any future
query languages (like SPARQL), so you will have no problem from me
implementing that support in a lib.

Hmm, I would like to see the difference it makes using DBus and whether
it really is an issue. We have an API like this in DBus now which Philip
added. I really don't like the idea of people executing random SQL on
the databases; it can lead to much bigger problems. Philip stresses
this in the .xml file where we document this API, and I think quite
rightly so.

That should be moved to a direct-access lib.

The advantage of a direct-access lib is that it removes DBus overhead
when large amounts of data are required.

Rob Taylor probably knows this better, but from what I understand DBus
is not optimal for large payloads (~1 MB+), and something like "get all
music and metadata" might involve a million-plus strings, which DBus
would have to marshal, strdup and validate individually into multiple
packets (IIRC packet size is 4 KB?), so you are looking at massive
overhead with multiple IPC calls.
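On the worry about random SQL: if that API moves into a direct-access
lib, the lib could at least refuse anything that is not a single
SELECT. With the SQLite C API the sound way is to open the database
with SQLITE_OPEN_READONLY and/or check sqlite3_stmt_readonly() on the
prepared statement; the sketch below is only a naive textual pre-check,
and `looks_like_single_select` is a hypothetical name. It deliberately
rejects some valid reads (a leading comment, WITH, a ';' inside a
string literal).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive guard for a hypothetical direct-access lib: accept only statements
 * whose first keyword is SELECT and that have at most one trailing ';'.
 * Conservative by design; not a substitute for a read-only connection. */
static bool looks_like_single_select(const char *sql)
{
    static const char kw[] = "select";

    assert(sql != NULL);
    while (isspace((unsigned char)*sql))
        sql++;

    /* case-insensitive match of the leading keyword */
    for (size_t i = 0; i < sizeof kw - 1; i++, sql++)
        if (tolower((unsigned char)*sql) != kw[i])
            return false;

    /* keyword must be followed by whitespace ("SELECTED" is rejected) */
    if (!isspace((unsigned char)*sql))
        return false;

    /* reject chained statements: nothing but whitespace after a ';' */
    const char *semi = strchr(sql, ';');
    if (semi == NULL)
        return true;
    for (semi++; *semi != '\0'; semi++)
        if (!isspace((unsigned char)*semi))
            return false;
    return true;
}
```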



_______________________________________________
tracker-list mailing list
tracker-list gnome org
http://mail.gnome.org/mailman/listinfo/tracker-list




