Re: Backend design advice



First, thanks for the advice!
Now let's see...

On Thu, 2014-01-02 at 13:33 +0000, Sam Thursfield wrote:
Hi

On Mon, Dec 23, 2013 at 12:05 PM, fr33domlover <fr33domlover mailoo org> wrote:
Hello,

This question is quite general but I'd like to know how things in GNOME were
designed, in addition to any general advice you have.

Assume a GUI application uses a central data backend, e.g. Tracker.
Currently Tracker is a single central storage service, i.e. one daemon
accepting connection and queries from apps.

Now assume I want to have more than one database: for example a separate
database for some project I work on, a separate database for documenting
file system hierarchy, separate database for desktop items, etc. The common
approach, at least in SQL, is to have a single entry point, an SQL server,
which handles all the databases stored on the computer. All clients connect
to the same server.

Tracker stores all data in a single per-user database because there is
no simple way to aggregate queries across multiple databases. The goal
of Tracker is to provide a single query interface over all of that
user's data, so this is unlikely to change overnight.

There are "federated queries", but Tracker is somewhat limited because
it is implemented using an SQL database. That's okay. The reasons
separate databases are important are:

1. It allows you to keep them as separate files and move them
anywhere you like. It is somewhat similar to how running servers in
virtual machines makes it easy to separate, clone and back them up.
Tracker is made just for desktop data, so in this specific case it's not
a big deal I guess.

2. Some uses require special software. Again, not something Tracker does
at the moment, but in general here's an example: Assume I want to create
an RDF query interface wrapping the file system, i.e. folder hierarchy
can be queried using sophisticated SPARQL queries. Running an SQL
database for the file system would probably make it very slow, so an
efficient backend is needed: a thin wrapper over the existing file
system APIs.

So the question is whether a single server should handle all the separate databases.
Tracker is definitely an important backend I'm interested in.


If you want to use Tracker for stuff other than storing per-user
desktop metadata, it's not impossible to get it to write to a
different file. The 'tracker-sandbox' program inside 'utils/' in the
Tracker source tree shows how to do this -- basically you need to
start a separate D-Bus session with XDG_CACHE_HOME and XDG_DATA_HOME
pointing to the alternate location.  That D-Bus session will have its
own tracker-store process running which will read and write to the
alternative location.

That's great for trying things in a sandbox!
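
For reference, here's a small Python sketch of how I understand that
setup working (the sandbox path is invented, and dbus-run-session plus
the tracker-sparql client call are assumptions that may differ between
D-Bus and Tracker versions):

    import os
    import subprocess

    # Alternate storage location; the path here is just an example.
    sandbox = os.path.expanduser("~/tracker-sandbox")
    env = dict(os.environ,
               XDG_CACHE_HOME=os.path.join(sandbox, "cache"),
               XDG_DATA_HOME=os.path.join(sandbox, "data"))

    # dbus-run-session starts a private session bus, runs the command on
    # it and tears the bus down afterwards. tracker-store is started on
    # demand by the first client on that bus, so everything it reads and
    # writes lands under the sandbox directories above.
    subprocess.call(["dbus-run-session", "--",
                     "tracker-sparql", "-q",
                     "SELECT ?u WHERE { ?u a nie:InformationElement }"],
                    env=env)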


Here's another possible approach: If the number of databases is small, it
may be reasonable to launch a separate server for each database, and have
them communicate with each other through IPC. There would probably need to
be another service to route network traffic to the right server, but aside
from that, each database would have its own server (daemon) handling access
to it.

Would the second approach be better at anything? Is it something people do,
something reasonable? Or is the approach of a single server per computer
clearly better?

(Would the second approach be safer, more scalable, more resilient, etc.?)

Storage and processing of complex non-desktop stuff is outside the
normal use-case of Tracker and GNOME, so there's not really any point
prescribing one approach over the other without knowing the specific
requirements. We're happy to advise on customising Tracker for
different use cases though if it seems the appropriate solution.

I think Tracker can't solve all my problems, because it's based on SQL
and uses specific ontologies, which makes it unsuitable as a
general-purpose semantic datastore; but it can definitely serve as a
backend for the desktop data it is designed to store.

I'm working on a semantic desktop project with some similarity to the
ideas behind Haystack Semantic Desktop, a project developed at MIT. The
idea is to have all the information inside semantic databases which
function as nodes and communicate in a peer-to-peer manner. On the
local machine, the requirements are fast local queries and managing
several databases with federated queries, e.g. imagine an SQL server
doing a query involving several databases it manages.
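
To make the federated-query part concrete, here's a rough SQLite sketch
of what I mean by a single server answering a query that spans several
databases it manages (the file names and schema are invented):

    import sqlite3

    # One connection ATTACHes a second database file, so a single
    # statement can join across both databases, the way a single SQL
    # server would do it internally.
    conn = sqlite3.connect("desktop.db")
    conn.execute("CREATE TABLE IF NOT EXISTS documents (uri TEXT)")

    conn.execute("ATTACH DATABASE 'project.db' AS project")
    conn.execute("CREATE TABLE IF NOT EXISTS project.notes (doc_uri TEXT, note TEXT)")

    rows = conn.execute("""
        SELECT d.uri, p.note
        FROM documents AS d
        JOIN project.notes AS p ON p.doc_uri = d.uri
    """).fetchall()
    print(rows)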


The second approach you describe (one server per database) is much
simpler to actually implement using Tracker. Work on making it easier
to run concurrent Tracker sessions would be welcome; we already
do this in the 'functional-tests' test suite, but the code there is
quite old and fragile.

Yes, it's probably easier to just run several independent processes. But
if they need to communicate for federated queries, the overhead of IPC
may become significant, and RAM usage in general rises as the number of
databases rises. A single server can run federated queries faster and
manage the different databases using separate threads or a single
message queue (reactor pattern), which can make the whole system much
more scalable.
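
A minimal sketch of that single-queue idea, with invented database names
and schema, just to show the shape of it:

    import queue
    import sqlite3

    # Every request, whichever database it targets, goes through one
    # queue, and one dispatch loop hands it to the right per-database
    # connection.
    connections = {
        "desktop": sqlite3.connect(":memory:"),
        "project": sqlite3.connect(":memory:"),
    }
    connections["desktop"].execute("CREATE TABLE documents (uri TEXT)")
    connections["project"].execute("CREATE TABLE notes (doc_uri TEXT, note TEXT)")

    requests = queue.Queue()
    # Clients (over IPC in a real system) enqueue (database, query) pairs.
    requests.put(("desktop", "SELECT count(*) FROM documents"))
    requests.put(("project", "SELECT count(*) FROM notes"))

    # One loop serialises access to all the databases, so there is no
    # per-database daemon and no extra cross-process hop.
    while not requests.empty():
        db_name, sql = requests.get()
        print(db_name, connections[db_name].execute(sql).fetchall())

In a real system the loop would of course run forever and the replies
would go back over IPC, but the point is that one process owns all the
connections.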

Tracker is not made for these use cases, but to be honest I needed some
general advice about databases, and I knew you guys have experience with
them.

Haven't had time yet to create a distributed libre replacement for Stack
Overflow, you see... Anyway, Tracker is my first-priority backend.


Sam

fr33domlover



