Re: [Tracker] Mining GVfs metadata



Hi,

On Mon, Nov 22, 2010 at 5:24 PM, Tomas Bzatek <tbzatek redhat com> wrote:

On Fri, 2010-11-19 at 19:05 +0200, Ivan Frade wrote:
> Is there a predefined list of "metadata::" keys, or is it completely free
> text? The first case would make our life much easier.

Unfortunately not; applications are free to set whatever metadata they
want. By the nature of GIO, values can be strings, numbers or pointers
(the latter not really useful).

That reminds me that I'm not sure which ontology the metadata should be
mapped to. Moreover, do we want to store all metadata (filtered from UI
stuff), or rather try to map it to existing ontologies? I can imagine
applications using Tracker just to quickly find files marked with metadata
in an app's internal format.
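
Right. For reference, "free" here really means anything: any application can do something like the following through GIO (the file path and key name are invented for the example, and error handling is skipped for brevity):

#include <gio/gio.h>

int
main (void)
{
  GFile *file = g_file_new_for_path ("/tmp/example.txt");
  GError *error = NULL;

  /* Any application can invent a key in the metadata:: namespace;
   * on local files this is handled by gvfsd-metadata behind GIO. */
  g_file_set_attribute_string (file, "metadata::my-app-note", "reviewed",
                               G_FILE_QUERY_INFO_NONE, NULL, &error);

  /* Read back every metadata:: key set on the file. */
  GFileInfo *info = g_file_query_info (file, "metadata::*",
                                       G_FILE_QUERY_INFO_NONE, NULL, &error);
  if (info != NULL)
    {
      char **attrs = g_file_info_list_attributes (info, "metadata");
      for (char **a = attrs; a != NULL && *a != NULL; a++)
        {
          char *value = g_file_info_get_attribute_as_string (info, *a);
          g_print ("%s = %s\n", *a, value);
          g_free (value);
        }
      g_strfreev (attrs);
      g_object_unref (info);
    }

  g_object_unref (file);
  return 0;
}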

Tracker performs well with "defined" keys. It is also possible to store arbitrary key=value pairs, but that has a performance cost.

I guess we will find:
1. Metadata that maps to properties already in the ontology -> we should use those properties
2. Metadata with no correspondence in the ontology BUT that makes sense there -> we update the ontology
3. Metadata with no correspondence in the ontology AND that doesn't make sense there:
  3.1 Because it is UI information -> not stored in Tracker at all
  3.2 Because it is app-specific information -> let's study the case (arbitrary key=value pairs or no Tracker at all); a sketch of both storage options follows this list
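
To make cases 1 and 3.2 concrete, here is a rough sketch of how both could look as SPARQL updates (written as C string constants; the file URN and key are invented, and I am quoting nao:numericRating and nao:Property/nao:propertyName from memory, so the exact vocabulary needs checking):

/* Case 1: the key maps onto a property the ontology already defines. */
static const char *update_defined =
  "INSERT { <urn:example:file:1> nao:numericRating 5 }";

/* Case 3.2: an arbitrary key=value pair stored through a generic
 * property node; flexible, but this is the slower path that carries
 * the performance cost mentioned above. */
static const char *update_arbitrary =
  "INSERT { _:p a nao:Property ; "
  "             nao:propertyName \"metadata::my-app-key\" ; "
  "             nao:propertyValue \"some value\" . "
  "         <urn:example:file:1> nao:hasProperty _:p }";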

> gvfsd-metadata notifying Tracker of changes is definitely the way to
> go. Could that daemon provide the data directly? It would save Tracker
> the work of re-reading the attributes and calculating the difference.

It could, but imagine that something goes wrong on the other side or
Tracker is not running at all. Then we would have to keep track of
already-sent changes and resend the missing ones. I find it safer to
extract data on demand, initiated from the Tracker side.

Notifying Tracker of the changes directly saves a round trip that can take a few milliseconds and costs I/O (re-reading the file from disk and processing the metadata). If GVfs already knows the changes, there is no need to encode them into a file and then read and decode them again just to put them into Tracker.

I would prefer to examine those "failure cases" carefully and see how they can be handled assuming direct communication. Of course, the dependency on Tracker can be made optional with a compilation flag.
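
For instance, gvfsd-metadata could emit something like this over DBus (the object path, interface and signal names below are invented, just to illustrate the shape of the notification):

#include <gio/gio.h>

/* Push a change notification instead of waiting for Tracker to
 * re-read the metadata file. */
static void
notify_metadata_changed (GDBusConnection *bus,
                         const char      *path,
                         const char      *key,
                         const char      *value)
{
  g_dbus_connection_emit_signal (bus,
                                 NULL,  /* broadcast, no destination */
                                 "/org/gtk/vfs/Metadata",   /* invented */
                                 "org.gtk.vfs.Metadata",    /* invented */
                                 "AttributeChanged",        /* invented */
                                 g_variant_new ("(sss)", path, key, value),
                                 NULL);
}

int
main (void)
{
  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, NULL);

  notify_metadata_changed (bus, "/home/user/file.txt",
                           "metadata::my-app-note", "reviewed");
  g_object_unref (bus);
  return 0;
}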


I guess miners are separate processes; does Tracker execute them as
necessary or are they supposed to be running all the time? For our case,
I can imagine a miner that does its job and exits.

Tracker (the store) doesn't start/pause/stop any miner by itself. The miners are started from the session login scripts, and they activate the store (via DBus). So far our miners stay alive all the time, but nothing prevents them from appearing/disappearing as needed. They can be activated by cron or DBus; the store doesn't care.

In the GVfs case, where the miner will be used very frequently, I wonder which is more efficient: keeping it alive or starting/stopping it on demand. At least in Maemo/MeeGo, starting a process is an expensive operation.
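
The on-demand variant could be as small as this (the well-known name and the timeout are made up; the point is DBus activation plus an idle-exit timer):

#include <gio/gio.h>

static gboolean
quit_when_idle (gpointer data)
{
  g_main_loop_quit ((GMainLoop *) data);
  return FALSE;  /* remove the timeout source */
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);

  /* Owning a well-known name (plus a DBus .service file, not shown)
   * lets the session bus start the miner on demand. The name is
   * invented for the example. */
  g_bus_own_name (G_BUS_TYPE_SESSION,
                  "org.example.GvfsMetadataMiner",
                  G_BUS_NAME_OWNER_FLAGS_NONE,
                  NULL, NULL, NULL, NULL, NULL);

  /* Exit after 30 seconds with no work; a real miner would reset
   * this timer whenever a new request arrives. */
  g_timeout_add_seconds (30, quit_when_idle, loop);

  g_main_loop_run (loop);
  return 0;
}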


> In an ideal case, gvfsd-metadata itself would be a "miner",
> translating metadata write events into Tracker metadata. If this is
> not possible, then the miner-fs must read the extended attributes
> (which shouldn't have a big impact on the code). I wonder how this is
> combined with inotify.

I'm now looking at the available miners which could serve as examples and
a starting point.

Take a look at libtracker-miner:
http://library.gnome.org/devel/libtracker-miner/unstable/

It has a common superclass for all miners that offers the control methods over DBus, so an applet/application can start/stop/monitor them (not sure this is relevant for a low-level miner like GVfs). There is also the crawling code and a few other useful things for writing a miner.
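
The skeleton of such a miner is mostly GObject boilerplate, something like this (I am writing the header and type names from memory of the unstable API, so double-check them against the docs above):

#include <libtracker-miner/tracker-miner.h>

typedef struct _GvfsMetadataMiner      GvfsMetadataMiner;
typedef struct _GvfsMetadataMinerClass GvfsMetadataMinerClass;

struct _GvfsMetadataMiner      { TrackerMiner parent_instance; };
struct _GvfsMetadataMinerClass { TrackerMinerClass parent_class; };

G_DEFINE_TYPE (GvfsMetadataMiner, gvfs_metadata_miner, TRACKER_TYPE_MINER)

static void
gvfs_metadata_miner_class_init (GvfsMetadataMinerClass *klass)
{
  /* A real miner would override the TrackerMinerClass vfuncs
   * (started/stopped/paused/resumed) here and push its SPARQL
   * updates from them. */
}

static void
gvfs_metadata_miner_init (GvfsMetadataMiner *miner)
{
}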

If we theoretically have gvfsd-metadata submit changes to Tracker
directly, how do we cope with the initial phase? I mean going through
the existing metadata and filling the Tracker database for the first
time. What is the preferred way?

Tracker does the initial crawling as it does now; no problem there. I still wonder what happens afterwards: do we need to keep monitoring the filesystem via inotify as we do now? Can we kill our filesystem miner because GVfs gives us all the relevant changes? Or do we need to receive data from both and handle the duplicated information?
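
For reference, what we do today is essentially GIO's inotify-backed monitoring, roughly like this (the path is invented):

#include <gio/gio.h>

static void
on_changed (GFileMonitor      *monitor,
            GFile             *file,
            GFile             *other_file,
            GFileMonitorEvent  event,
            gpointer           user_data)
{
  char *uri = g_file_get_uri (file);
  g_print ("change event %d on %s\n", event, uri);
  g_free (uri);
}

int
main (void)
{
  GFile *dir = g_file_new_for_path ("/home/user"); /* example path */
  GFileMonitor *monitor =
    g_file_monitor_directory (dir, G_FILE_MONITOR_NONE, NULL, NULL);

  g_signal_connect (monitor, "changed", G_CALLBACK (on_changed), NULL);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}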

Regards,

Ivan

