Re: [Tracker] New branch: dbus-fd-experiment



On 28 May 2010 13:32, Adrien Bustany <abustany gnome org> wrote:
On Fri, 28 May 2010 09:45:33 +0200,
Mikkel Kamstrup Erlandsen <mikkel kamstrup gmail com> wrote:

On 27 May 2010 17:08, Adrien Bustany <abustany gnome org> wrote:
Hello list!

You might have heard of the dbus-fd-experiment branch.

What is this branch about? I've been looking lately at how we
can improve our use of D-Bus by not using it to pass large
amounts of data.

D-Bus isn't slow when used to pass small messages, but its
performance drops when it has to handle large amounts of data.

The dbus-fd-experiment branch takes advantage of a new feature in
D-Bus 1.3 that allows passing UNIX file descriptors as message
parameters. Using this feature, we create a pipe on the server and
then pass it to the client.
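For readers unfamiliar with the mechanism: libdbus carries a UNIX_FD argument over the bus connection's UNIX domain socket as SCM_RIGHTS ancillary data, and the kernel installs a duplicate of the descriptor in the receiving process. The sketch below shows only that kernel-level step in plain POSIX C; it is not libdbus or tracker-store code, and `send_fd`/`recv_fd` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected AF_UNIX socket as SCM_RIGHTS data.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    assert(fd >= 0);
    char byte = 0;   /* at least one byte of ordinary data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;                  /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). The result refers to the same open
 * pipe or file, though it may be a different integer. Returns -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

Presumably the server keeps the pipe's write end and the client receives the read end; with libdbus the descriptor travels as a DBUS_TYPE_UNIX_FD message argument rather than through hand-written sendmsg() calls.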

Then we send the results over the pipe, saving the cost of D-Bus
marshalling. The protocol used to pass data over the pipe is described
in the reports [1] and [2].

It is designed to minimize marshalling costs and context switches
between client and server, for optimal performance.
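The actual wire format is in [1] and [2]. As a rough illustration of why a purpose-built format can be cheaper than D-Bus marshalling, here is a hypothetical row framing (not the one from the reports): a column count, the cell lengths, then the NUL-terminated cell strings. The reader can point straight into the received bytes instead of copying each string, and many encoded rows can be collected in one buffer and sent with a single write() call. `encode_row` and `decode_row` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing (NOT the format described in [1]/[2]):
 *   uint32 n_cols | uint32 len[n_cols] | cell0 '\0' cell1 '\0' ...
 * Native byte order is used: both ends of a pipe live on the same host.
 * Returns bytes written into buf, or 0 if buf is too small. */
size_t encode_row(char *buf, size_t cap, uint32_t n_cols, const char *const *cells)
{
    assert(cells != NULL || n_cols == 0);
    size_t need = sizeof(uint32_t) * (1 + (size_t)n_cols);
    for (uint32_t i = 0; i < n_cols; i++)
        need += strlen(cells[i]) + 1;
    if (need > cap)
        return 0;

    char *p = buf;
    memcpy(p, &n_cols, sizeof n_cols);
    p += sizeof n_cols;
    for (uint32_t i = 0; i < n_cols; i++) {      /* length table */
        uint32_t len = (uint32_t)strlen(cells[i]);
        memcpy(p, &len, sizeof len);
        p += sizeof len;
    }
    for (uint32_t i = 0; i < n_cols; i++) {      /* cell bytes plus terminator */
        size_t len = strlen(cells[i]) + 1;
        memcpy(p, cells[i], len);
        p += len;
    }
    return (size_t)(p - buf);
}

/* Decode one row in place: cells[i] points into buf, nothing is copied.
 * Returns bytes consumed, or 0 on short/malformed input or too many columns. */
size_t decode_row(const char *buf, size_t len, uint32_t max_cols,
                  const char **cells, uint32_t *n_cols_out)
{
    uint32_t n_cols;
    if (len < sizeof n_cols)
        return 0;
    memcpy(&n_cols, buf, sizeof n_cols);
    if (n_cols > max_cols)
        return 0;
    size_t off = sizeof(uint32_t) * (1 + (size_t)n_cols);
    if (off > len)
        return 0;
    for (uint32_t i = 0; i < n_cols; i++) {
        uint32_t clen;
        memcpy(&clen, buf + sizeof(uint32_t) * (1 + (size_t)i), sizeof clen);
        if (off + clen + 1 > len || buf[off + clen] != '\0')
            return 0;
        cells[i] = buf + off;
        off += (size_t)clen + 1;
    }
    *n_cols_out = n_cols;
    return off;
}
```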

I integrated this into tracker-store and libtracker-client, and the
results are pretty good.

** Give me the numbers! **
        | Normal DBus | DBus + FD passing ("steroids") | Relative speedup
Query 1 | 38 ms       | 28 ms                          | 25%
Query 2 | 142 ms      | 91 ms                          | 57%
Query 3 | 8 ms        | 7 ms                           | 14%
Query 4 | 449 ms      | 212 ms                         | 112%

Queries:

1: select ?a nmm:artistName(?a) where {?a a nmm:Artist}
   332 rows, 18874 bytes
2: select ?t nie:contentLastModified(?t) where {?t a nfo:FileDataObject}
   13542 rows, 654399 bytes
3: select ?r where {?r a nfo:FileDataObject; fts:match "test"}
   234 rows, 10764 bytes
4: select nie:plainTextContent(?f) where {?f a nfo:FileDataObject}
   231077 rows, 16184790 bytes

The small program I used for benchmarking is hosted at [3].

** How it works under the hood **

My first approach was to use a client-side iterator and send the
results progressively from the server.

This approach doesn't work well: while results are being sent from
the server to the client, a DB iterator is kept open in the store,
blocking concurrent INSERT queries.

Instead, we fetch all the results into a buffer on the client side
(which is a bit more expensive) and then iterate over that buffer.
That way, the sqlite3_stmt on the server side is released as soon as
possible.
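A minimal sketch of the "fetch everything first" step, assuming the client already holds the pipe's read end: drain it until EOF into one growing heap buffer, and only then start iterating. Draining eagerly lets the writer finish quickly, which per the description above is what allows the server to release its statement early. `read_all` is a hypothetical helper, not a libtracker-client function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read everything from fd until EOF into one heap buffer (caller frees it).
 * Returns NULL on error; *len_out receives the number of bytes read. */
char *read_all(int fd, size_t *len_out)
{
    assert(len_out != NULL);
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {                     /* buffer full: grow geometrically */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)                           /* EOF: the writer closed its end */
            break;
        if (n < 0) {
            if (errno == EINTR)
                continue;                     /* interrupted by a signal: retry */
            free(buf);
            return NULL;
        }
        len += (size_t)n;
    }
    *len_out = len;
    return buf;
}
```

The trade-off matches the one described in the mail: the client pays for a full copy of the result set in memory, in exchange for not holding a server-side cursor open while it consumes rows.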

The code in tracker-store is in the two files tracker-steroids.[ch].
There are also a few lines in tracker-dbus.c to add a new DBus object,
/org/freedesktop/Tracker1/Steroids. This object has two methods in the
interface org.freedesktop.Tracker1.Steroids: PrepareQuery and Fetch.

In libtracker-client I added a new query function,
tracker_resources_sparql_query_iterate. This function returns a
TrackerResultIterator, which can be used with the
tracker_result_iterator_* functions. All those functions are
documented, and there is an example client in the examples dir.
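The signatures of the real tracker_result_iterator_* functions are in the branch's documentation and are not reproduced here. To illustrate the general idea of iterating over an already-fetched buffer, here is a hypothetical iterator (`ResultIter`, `result_iter_next` and `result_iter_get_string` are made-up names) over a buffer that is assumed to store a fixed number of NUL-terminated cells per row. Strings are returned as pointers into the buffer, so iteration copies nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RESULT_ITER_MAX_COLUMNS 8

/* Hypothetical iterator; NOT the libtracker-client TrackerResultIterator API.
 * Assumed layout: n_columns NUL-terminated cells per row, rows back to back. */
typedef struct {
    const char *buf;                            /* whole result set, already fetched */
    size_t len;                                 /* total bytes in buf */
    size_t pos;                                 /* offset of the next unread row */
    size_t n_columns;
    const char *row[RESULT_ITER_MAX_COLUMNS];   /* current row: pointers into buf */
} ResultIter;

void result_iter_init(ResultIter *it, const char *buf, size_t len, size_t n_columns)
{
    assert(n_columns > 0 && n_columns <= RESULT_ITER_MAX_COLUMNS);
    it->buf = buf;
    it->len = len;
    it->pos = 0;
    it->n_columns = n_columns;
}

/* Advance to the next row; returns false at the end or on truncated data. */
bool result_iter_next(ResultIter *it)
{
    size_t pos = it->pos;
    for (size_t c = 0; c < it->n_columns; c++) {
        if (pos >= it->len)
            return false;                       /* no more rows */
        const char *nul = memchr(it->buf + pos, '\0', it->len - pos);
        if (nul == NULL)
            return false;                       /* cell without terminator */
        it->row[c] = it->buf + pos;
        pos = (size_t)(nul - it->buf) + 1;      /* skip past the terminator */
    }
    it->pos = pos;
    return true;
}

/* Borrowed pointer into the buffer, valid while the buffer lives. */
const char *result_iter_get_string(const ResultIter *it, size_t column)
{
    return column < it->n_columns ? it->row[column] : NULL;
}
```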

The work has been thoroughly tested this week. I also wrote unit
tests to ensure we get the same results with both the traditional
and the "steroids" methods. GCov reports complete coverage of the
code. You're of course invited to test it more and report any
problems to me :)


Just out of curiosity...

Have you tried just using peer-to-peer DBus instead of going over the
bus daemon? That saves one round trip over the DBus socket right
there... And if I recall correctly, data will not be validated either
when using p2p DBus, but I may be wrong on that one...

But very interesting work.


Hello Mikkel

That is an interesting suggestion; I haven't tried p2p DBus so far.
Still, with p2p DBus I think we'd keep both the marshalling costs
and the fact that all data is sent in one message, right? The fact
that data wouldn't be validated is interesting though, since in my
tests UTF-8 validation was taking a lot of time with DBus.
I'll have a look at p2p DBus and see if it's good for us.

Yeah, you definitely still lose the streaming benefits (unless you
talk directly to the DBus socket yourself). The marshalling cost
depends on the lib you use, I think. Using GVariant and GDBus should
incur very low overhead (because GVariant basically *is* the DBus wire
format) - if I understand correctly, which I very well may not :-)

-- 
Cheers,
Mikkel


