Re: [Tracker] New branch: dbus-fd-experiment



On Fri, 28 May 2010 14:10:32 +0200,
Mikkel Kamstrup Erlandsen <mikkel kamstrup gmail com> wrote:

On 28 May 2010 13:32, Adrien Bustany <abustany gnome org> wrote:
On Fri, 28 May 2010 09:45:33 +0200,
Mikkel Kamstrup Erlandsen <mikkel kamstrup gmail com> wrote:

On 27 May 2010 17:08, Adrien Bustany <abustany gnome org> wrote:
Hello list!

You might have heard of the dbus-fd-experiment branch.

What is this branch about? I've been looking lately at how we
can improve our use of D-Bus by not using it to pass large
amounts of data.

D-Bus isn't slow when used to pass small messages, but its
performance goes down when it has to handle large amounts of data.

The dbus-fd-experiment branch takes advantage of a new feature in
D-Bus 1.3 that allows passing UNIX file descriptors as message
parameters. Using this feature, we create a pipe on the server
and then pass it to the client.

Then we send the results over the pipe, saving the cost of D-Bus
marshalling. The protocol used to pass data over the pipe is
described in the reports [1] and [2].

It's designed to minimize marshalling costs and context switches
between client and server, for optimal performance.

I integrated this in tracker-store and libtracker-client, and the
results are pretty good.

** Give me the numbers! **

        | Normal DBus | DBus + FD passing ("steroids") | Relative speedup
Query 1 | 38 ms       | 28 ms                          | 25%
Query 2 | 142 ms      | 91 ms                          | 57%
Query 3 | 8 ms        | 7 ms                           | 14%
Query 4 | 449 ms      | 212 ms                         | 112%

Queries:

1: select ?a nmm:artistName(?a) where {?a a nmm:Artist}
   332 rows, 18874 bytes
2: select ?t nie:contentLastModified(?t) where {?t a nfo:FileDataObject}
   13542 rows, 654399 bytes
3: select ?r where {?r a nfo:FileDataObject; fts:match "test"}
   234 rows, 10764 bytes
4: select nie:plainTextContent(?f) where {?f a nfo:FileDataObject}
   231077 rows, 16184790 bytes

The tiny code I used to benchmark is hosted at [3].

** How it works under the hood **

My first approach was to use a client-side iterator and send the
results progressively from the server.

This approach is not good, because while results are being sent
from the server to the client, a DB iterator is kept open in the
store, blocking concurrent INSERT queries.

Instead, we fetch all the results into a buffer on the client side
(which is a bit more expensive) and then iterate over that buffer.
That way, the sqlite3_stmt on the server side is released as soon
as possible.

The code in tracker-store is in the two files tracker-steroids.[ch].
There are also a few lines in tracker-dbus.c to add a new D-Bus
object, /org/freedesktop/Tracker1/Steroids. This object has two
methods on the interface org.freedesktop.Tracker1.Steroids:
PrepareQuery and Fetch.

In libtracker-client I added a new query function,
tracker_resources_sparql_query_iterate. This function returns a
TrackerResultIterator which can be used with the
tracker_result_iterator_* functions. All those functions are
documented, and there is an example client in the examples dir.

The work has been thoroughly tested during this week. I also wrote
unit tests to ensure we get the same results with both the
traditional and the "steroids" methods; GCov reports complete
coverage of the code. You're of course invited to test it more
and report any problems to me :)


Just out of curiosity...

Have you tried just using peer-to-peer DBus instead of going over
the bus daemon? That saves one roundtrip over the DBus socket right
there... And if I recall correctly, data will not be validated
either when using p2p DBus, but I may be wrong on that one...

But very interesting work.


Hello Mikkel

That is an interesting suggestion; I haven't tried p2p DBus so far.
Even using p2p DBus, I think we still get both the marshalling
costs and the fact that all data is sent in one message, right? The
fact that data wouldn't be validated is interesting though, since
in my tests UTF-8 validation was really taking a lot of time with
DBus. I'll have a look at p2p DBus and see if it's good for us.

Yeah, you still definitely lose the streaming benefits (unless you
talk directly to the DBus socket yourself). The marshalling cost
depends on the lib you use, I think. Using GVariants and GDBus should
incur very low overhead (because GVariant basically *is* the DBus wire
format) - if I understand correctly - which I very well may not :-)


Definitely worth investigating... I never experimented with
GDBus/GVariant. Actually, the streaming benefit in the DBus + FD
passing case is quite low, since we have to cache everything on the
client side anyway. But yeah, at least we avoid a big alloc on the
server side... I won't have time this week, but when I can I'll try
to do some benchmarks to see how that solution (p2p DBus + GDBus)
performs.


