[Tracker] Unix socket based IPC, experiment for inserting larger amounts of metadata



Hi guys,

Under the radar and as an in-my-free-time experiment I have made a very
simple IPC for the tracker-store branch that uses a raw Unix Socket
instead of D-Bus for the BatchSparqlUpdate and BatchCommit calls [0].

In tracker-store you use BatchSparqlUpdate and BatchCommit to request
storage of larger amounts of metadata in Tracker.

With the Unix-Socket based IPC I made, you instead get a library with
two async functions:


/* The define is just to help a bit with the layout of this E-mail */
#define X TrackerSocketIpcSparqlUpdateCallback

typedef void (*X)                           (GError  *error,
                                             gpointer user_data);

void tracker_socket_ipc_queue_sparql_update (const gchar   *sparql,
                                             X              callback,
                                             gpointer       user_data,
                                             GDestroyNotify destroy);

void tracker_socket_ipc_queue_commit        (X              callback,
                                             gpointer       user_data,
                                             GDestroyNotify destroy);

Looking at the implementation you'll find GIOChannels on both the client
and the service side. This means no extra thread is involved: the
service side runs on the GMainLoop of tracker-store, and the client side
runs on the GMainLoop of your own application.
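
For those less familiar with the pattern, hooking a connected socket fd
into the main loop looks roughly like this (a sketch only, not the
actual code in the branch):

static gboolean
on_socket_ready (GIOChannel *channel, GIOCondition cond, gpointer user_data)
{
  /* read commands/replies here and dispatch them */
  return TRUE; /* keep the watch alive */
}

static void
watch_socket_fd (int fd)
{
  GIOChannel *channel = g_io_channel_unix_new (fd);

  g_io_channel_set_encoding (channel, NULL, NULL); /* raw bytes, no UTF-8 */
  g_io_add_watch (channel, G_IO_IN | G_IO_HUP, on_socket_ready, NULL);
  g_io_channel_unref (channel);
}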

The improvised protocol has fixed-size 'commands', which allowed me to
keep it simple and read the data with blocking I/O: the recv() call
blocks until all requested data has arrived.

It goes like this:

UPDATE {0000000000} {0000000033}\nINSERT { <test0> a nfo:Document }
UPDATE {0000000001} {0000000033}\nINSERT { <test1> a nfo:Document }
< OK:0000000000:{0000000004}:none
UPDATE {0000000002} {0000000033}\nINSERT { <test2> a nfo:Document }
< OK:0000000001:{0000000004}:none
< OK:0000000002:{0000000004}:none
COMMIT {0000000003} {0000000006}\nCOMMIT
< OK:0000000003:{0000000004}:none
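
To make the framing explicit, building one such UPDATE command could
look like this. This is just my reading of the dump above; the exact
field widths and the function name are illustrative:

static gchar *
frame_update (guint request_id, const gchar *sparql, gsize *length)
{
  GString *buf = g_string_new (NULL);

  /* fixed-width header: command, request id, payload length */
  g_string_append_printf (buf, "UPDATE {%010u} {%010u}\n",
                          request_id, (guint) strlen (sparql));
  g_string_append (buf, sparql);

  *length = buf->len;
  return g_string_free (buf, FALSE);
}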

It might be slightly better for the scheduling of the two processes
involved if this wasn't done blocking, but the difference in throughput
is already impressive. I'm already pipelining, so tracker-store won't
require you to wait for the OK or ER reply: you can simply keep pumping
UPDATEs and COMMITs while receiving your OKs and ERs.

Internally it uses a queue that runs at a lower priority on the
GMainLoop of tracker-store than the GIOChannel (which of course also
uses a GSource, just like the queue does, only at a higher priority).
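
In GLib terms the split looks roughly like this (the priorities and the
process_queue name are illustrative, not necessarily what the branch
uses):

/* the GIOChannel watch runs at the default priority ... */
g_io_add_watch_full (channel, G_PRIORITY_DEFAULT, G_IO_IN,
                     on_socket_ready, NULL, NULL);

/* ... while the queue is drained by a lower-priority idle source */
g_idle_add_full (G_PRIORITY_LOW, process_queue, queue, NULL);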


The numbers are kinda meaningless for a comparison with our current
D-Bus communication, because we have not yet tried converting the format
of the DBusMessage that we send to an array of strings (which would
marshall and demarshall a lot faster than the dict we currently use).
Fair is fair, so I want to stress this as a warning.

  The Unix-Socket IPC experiment doesn't need any marshalling or
  demarshalling other than the client having to build the SPARQL Update
  sentence. Jürg has made an API called TrackerSparqlQueryBuilder
  that'll help a developer build such queries (it's available in the
  tracker-store branch) [1].

To nonetheless give you a figure (because the difference is impressive):

In a test that Jürg did with the current DBusMessage format, we achieved
transferring 12,000 statements in about 1.3 seconds. We had already
grouped the statements so that the DBusMessages would be about 4k in
size each (don't worry, we know about that trick in the D-Bus world too).

In my test I achieved 10,000 statements in about 0.135 seconds and
100,000 statements in 1.6 seconds, without any grouping of statements. A
Unix Socket is also page-based, so grouping would make this even faster.

We have not yet decided whether we'll support this experimental metadata
import mode officially. At this moment this Unix-Socket IPC is a nice
experiment which will help us identify other bottlenecks that are more
urgent to solve.

Note that for 'Query' I would first have to make a serialization format
so that the resulting statements of a query can be send()'d to the
application's process. This being merely an experiment, I have not yet
done that. Serializing a bunch of strings wouldn't be very hard.
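
For example, a simple length-prefixed encoding in the spirit of the
UPDATE header would already do. This is purely illustrative; nothing
like it exists in the branch yet:

static void
append_value (GString *buf, const gchar *value)
{
  guint len = value ? (guint) strlen (value) : 0;

  /* fixed-width length prefix, then the raw bytes */
  g_string_append_printf (buf, "{%010u}", len);
  if (len > 0)
    g_string_append_len (buf, value, len);
}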

I personally think that such a 'Query' would probably still be faster
than letting app processes access the sqlite3 db directly, because each
connection requires its own page cache.

A Unix Socket means a send(), a recv(), a memcpy() to an skb and a
memcpy() from kernel space to the user space of the receiving process.
It's kinda hard to beat that in raw performance, and it's unlikely that
processes will ever require more data throughput, or that, due to other
bottlenecks, we could ever deliver more.

I made a little test app which you can find at [2]. It illustrates how
an app developer would use the API.

Unlike with D-Bus, there wouldn't be any activation of tracker-store,
though. Instead your callback would be called with its GError set,
indicating that the service was not ready for your request.


-- 
[0] http://git.gnome.org/cgit/tracker/log/?h=tracker-store-ipc
[1] http://git.gnome.org/cgit/tracker/tree/src/libtracker-common/tracker-sparql-builder.vala?h=tracker-store
[2] http://git.gnome.org/cgit/tracker/tree/src/libtracker-socket-ipc/tracker-socket-ipc-test.c?h=tracker-store-ipc

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



