[Tracker] Proposal for a new signal mechanism



A new class signal for Tracker

Today's situation

Today we have a simple signal system that causes quite a bit of overhead, which we have tried to reduce over time. The overhead comes from:
  A. Having to store the URIs of the resources involved in a changeset in tracker-store's memory;
  B. Having to store the predicates involved in a changeset in tracker-store's memory (although this is far less severe than A);
  C. Having to UTF-8 validate the strings when we emit them over D-Bus (D-Bus does this implicitly);
  D. D-Bus's own copying and handling of string data;
  E. Heavy traffic on D-Bus;
  F. Context switching between tracker-store and dbus-daemon;
  G. Having to wait with turning on the D-Bus objects until after we have the latest ontology, so until after journal replay. And we need to reset the situation after a backup restore. Complex!
Besides this overhead, the consumers have problems too. I'll list them in the next section.

Problems of today's signal
  1. Aforementioned overhead: it generates a lot of D-Bus traffic, caused by sending URI strings for the subjects and the predicates;
  2. Doesn't make it possible, in case of a delete of <a>, to know <b> in <a> nfo:isLogicalPartOf <b>, as <a> is removed at the point of signal emission;
  3. Round trips to know the literals create more D-Bus traffic;
  4. Transactional changes can't be reliably identified with SubjectsAdded, SubjectsChanged and SubjectsRemoved being separate signals;
  5. A lot of D-Bus objects, instead of letting clients use D-Bus's filtering system.

The drive for a solution

Jürg Billeter and I brainstormed a bit about all these problems. Over the last few months, while optimizing tracker-store's INSERT performance and memory utilization, we brainstormed a lot about how we could reduce the overhead. I believe we have a good understanding of the current situation, its internal problems and our current solution (hey, of course, we implemented it :p).

We also gained know-how about most of the problems consumers have from the maintainer of libqttracker, Petteri Iridian Kiiskinen. Thanks Iridian!

Today I believe that we must abandon the old ship, redo the signal system, break the API. Break it all. Get over it, heal our wounds. Even if that means taking the stress away from all sorts of people who've been using the old signal system, offering massages, giving out sauna coupons. You know, the usual stuff that we won't do for real. Although I'm sure that at a next code-camp in Helsinki we'll have a good sauna to burn all our own stress away.

Anyway ... *shrug*

A proposed solution

Part one: Direct access
With direct-access we will reduce the round-trip cost of a query from a consumer who wants a literal object involved in a changeset: the query will be executed directly on meta.db. You won't use libsqlite's API yourself but libtracker-sparql; for direct-access, however, libtracker-sparql is a layer on top of the aforementioned libsqlite. The so-called "round-trip" won't even involve IPC: by using the TrackerSparqlCursor API, you'll end up doing sqlite3_step() in your own process, directly on meta.db.

For the consumers of the signal, this removes 3.

Part two: Sending IDs
A while ago we introduced the SPARQL function tracker:id(). The tracker:id() function gives you a unique number that Tracker's RDF store uses internally. It's not RDF; RDF uses subject URL strings. We just convert these internally for performance reasons, and with tracker:id() you can access that number.

Each resource, each class and each predicate (the latter two are resources like any other) has such a unique internal ID.

Given that Tracker's class signal system isn't RDF anyway, we decided not to give you subject URL strings in it anymore. Instead, we'll give you these integer IDs.

This for us removes A, B, C, D and E. For the consumers of the signal, this removes 1. Whoohoo!

Part three: Combine SubjectsAdded and SubjectsChanged, and put SubjectsRemoved in the same signal
So we give you two arrays: Inserts and Deletes.

For consumers of the signal, this removes 4.

Part five: Add the class name to the signal
This allows you to use a string filter on your signal subscription in D-Bus.

For us this removes G. For consumers of the signal, this removes 5.
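
As a rough illustration of the string filter, here is a minimal Python sketch that builds a D-Bus match rule restricted to one class via arg0 (the first argument of the signal, which in this proposal is the class name). The function name class_match_rule is hypothetical, and whether the class name is sent as a full URI or in prefixed form is an assumption; only the interface name comes from the introspection XML in this proposal.

```python
def class_match_rule(class_name: str) -> str:
    """Build a D-Bus match rule that only matches signals on the proposed
    interface whose first string argument (arg0) equals class_name."""
    # Interface name as given in the introspection XML of this proposal.
    interface = "org.freedesktop.Tracker1.Resources.Class"
    # Keep the sketch simple: refuse values that would need quoting tricks.
    if "'" in class_name:
        raise ValueError("class names containing apostrophes are not handled here")
    return f"type='signal',interface='{interface}',arg0='{class_name}'"


# Assumption: the class name is sent as a full URI.
rule = class_match_rule(
    "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Document")
print(rule)
```

A client would hand such a rule to its D-Bus binding when subscribing, so that dbus-daemon only delivers the signals for the classes it cares about.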

Part six: Pass the object-id for resource objects
You'll get a third number in the Inserts and Deletes arrays: object-id. We won't send you object literals, although for integral objects we're still discussing this. But for resource objects we can give you the object-id without much extra cost.

For consumers of the signal, this removes 2. Whoohoo (this was a hard one)!
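
A small Python sketch of how a consumer might use the object-id in the Deletes array to learn <b> after <a> is gone. It assumes each entry is a (subject-id, predicate-id, object-id) triple; the helper name parents_of_deleted and all numeric IDs are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed layout of one entry: (subject-id, predicate-id, object-id)
Triple = Tuple[int, int, int]


def parents_of_deleted(deletes: Iterable[Triple],
                       part_of_predicate_id: int) -> List[Tuple[int, int]]:
    """Return (deleted subject-id, former parent object-id) pairs for every
    deleted statement that used the given predicate."""
    return [(s, o) for (s, p, o) in deletes if p == part_of_predicate_id]


# Hypothetical IDs: 42 stands for nfo:isLogicalPartOf, 800 for <a>, 900 for <b>.
deletes = [(800, 42, 900), (801, 17, 950)]
pairs = parents_of_deleted(deletes, part_of_predicate_id=42)
print(pairs)  # [(800, 900)]
```

The ID of the predicate itself would have to be looked up once, for example with tracker:id() on the predicate.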

Part seven: SPARQL IN, tracker:id() and tracker:subject()
We recently added support for SPARQL IN; we already have tracker:id() and we'll implement tracker:subject().

This makes things like this possible:

SELECT ?t { ?r nie:title ?t .
            FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }

Where 800, 801, 802 and 807 will be the IDs that you receive in the class signal.

The tracker:subject() SPARQL function will allow you to make a very fast version of this:

SELECT ?s { ?s a rdfs:Resource .
            FILTER (tracker:id(?s) IN (800)) }

So it would be something like ... (not sure that you can omit { } in SPARQL, though):

SELECT tracker:subject (800)

For consumers this removes most of the burden introduced by IDs. Consumers are also advised to keep a local Map<tracker:id(), subject> to avoid a lot of SPARQL queries. Although with direct-access it might be just fine.
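
The local map could look roughly like the following Python sketch. SubjectCache and its resolve callback are hypothetical names; the callback stands in for whatever query (for example one using tracker:subject() once it exists) turns a list of IDs into subjects, and is an assumption rather than a real API.

```python
from typing import Callable, Dict, Iterable, List


class SubjectCache:
    """Remembers tracker:id() -> subject so repeated signals don't cause
    repeated lookups."""

    def __init__(self, resolve: Callable[[List[int]], Dict[int, str]]):
        # resolve is a stand-in for a SPARQL lookup of the missing IDs.
        self._resolve = resolve
        self._known: Dict[int, str] = {}

    def subjects(self, ids: Iterable[int]) -> Dict[int, str]:
        wanted = list(dict.fromkeys(ids))  # drop duplicates, keep order
        missing = [i for i in wanted if i not in self._known]
        if missing:
            # One lookup for all unknown IDs instead of one per ID.
            self._known.update(self._resolve(missing))
        return {i: self._known[i] for i in wanted if i in self._known}


# Fake resolver for demonstration; it records which IDs were asked for.
calls: List[List[int]] = []

def fake_resolve(ids: List[int]) -> Dict[int, str]:
    calls.append(list(ids))
    return {i: f"urn:example:{i}" for i in ids}

cache = SubjectCache(fake_resolve)
first = cache.subjects([800, 801])
second = cache.subjects([801, 802])  # only 802 is looked up again
```

Entries for resources that get removed would also have to be dropped from such a map; that part is left out here.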

Part eight: What is left?

What is left is context switching between tracker-store and dbus-daemon, F. But that's our problem. We'll reduce the number of switches by grouping transactions and signals together. It's mostly a problem on ARM hardware, but yeah, that's a major and important target platform for us. We're on it, we will care about this!
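
To give an idea of what grouping could mean, here is a purely illustrative Python sketch that collects the changes of several transactions and hands out one pending signal per class. SignalBatcher is a hypothetical name and this is not how tracker-store does or will do it; when to flush (timer, size limit) is left open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]  # assumed: (subject-id, predicate-id, object-id)


class SignalBatcher:
    """Accumulates inserts and deletes per class until flush() is called."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[List[Triple], List[Triple]]] = \
            defaultdict(lambda: ([], []))

    def add(self, class_name: str,
            inserts: List[Triple], deletes: List[Triple]) -> None:
        pending_inserts, pending_deletes = self._pending[class_name]
        pending_inserts.extend(inserts)
        pending_deletes.extend(deletes)

    def flush(self) -> List[Tuple[str, List[Triple], List[Triple]]]:
        # One (class-name, inserts, deletes) entry per class, i.e. one
        # emission instead of one per transaction.
        batch = [(name, ins, dels)
                 for name, (ins, dels) in self._pending.items()]
        self._pending.clear()
        return batch


batcher = SignalBatcher()
batcher.add("nfo:Document", [(800, 5, 1)], [])
batcher.add("nfo:Document", [(801, 5, 2)], [(700, 5, 3)])
batcher.add("nmm:MusicPiece", [(802, 6, 4)], [])
batch = batcher.flush()
```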

Let's take a look!

<node name="/org/freedesktop/Tracker1/Resources">
  <interface name="org.freedesktop.Tracker1.Resources.Class">
    <signal name="class-signal">
      <arg type="s" name="class-name" />
      <arg type="a(iii)" name="inserts" />
      <arg type="a(iii)" name="deletes" />
    </signal>
  </interface>
</node>

Or in short: sa(iii)a(iii). Here's a bit of pseudo code showing how it'll look client-side:

void m_callback (cursor) {
  while (cursor.next ()) {
    // With direct-access, these cursor.next() calls are sqlite3_step() calls
    print ("title: %s", cursor.get_string (0));
  }
}

// Argument order follows the signal: class-name, inserts, deletes
void on_signal (class_name, inserted, deleted) {
  string in_qry = "", qry;
  bool first = true;

  // Collect the IDs of the inserted subjects that we are interested in
  foreach (insert in inserted) {
    if (insert.subject_id is_in (my_resources)) {
      if (!first) { in_qry += ", "; }
      in_qry += insert.subject_id.to_string ();
      first = false;
    }
  }

  // Nothing of interest changed, so there is nothing to query
  if (first) { return; }

  qry = string.printf ("SELECT ?titles { ?r nie:title ?titles . " +
                       "FILTER (tracker:id(?r) IN (%s)) }", in_qry);

  connection.query_async (qry, m_callback);
}
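
For those who prefer something they can run, the query-building half of the pseudo code above can be sketched in Python as follows. It assumes the entries of the inserts array are (subject-id, predicate-id, object-id) triples; build_title_query is a hypothetical helper and the actual query execution through libtracker-sparql is left out.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[int, int, int]  # assumed: (subject-id, predicate-id, object-id)


def build_title_query(inserts: Iterable[Triple],
                      my_resources: Set[int]) -> Optional[str]:
    """Return a SPARQL query for the titles of the inserted subjects we care
    about, or None when none of them is in my_resources."""
    # A set removes duplicates: one subject can appear in several triples.
    ids = sorted({s for (s, _p, _o) in inserts if s in my_resources})
    if not ids:
        return None  # nothing of interest changed, nothing to query
    id_list = ", ".join(str(i) for i in ids)
    return ("SELECT ?title { ?r nie:title ?title . "
            f"FILTER (tracker:id(?r) IN ({id_list})) }}")


# Hypothetical IDs as they might arrive in one signal.
inserts = [(800, 5, 1), (801, 5, 2), (999, 5, 3), (800, 6, 4)]
query = build_title_query(inserts, my_resources={800, 801, 802})
print(query)
```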


Cheers! :-)

Philip


--


Philip Van Hoof
philip codeminded be
freelance software developer
Codeminded BVBA - http://codeminded.be

