Re: [Tracker] Mining GVfs metadata



Hi,

On Mon, Nov 22, 2010 at 5:24 PM, Tomas Bzatek <tbzatek@redhat.com> wrote:


On Fri, 2010-11-19 at 19:05 +0200, Ivan Frade wrote:
Is there a predefined list of "metadata::" keys, or is it completely free
text? The first case would make our life much easier.

Unfortunately not; applications are free to set whatever metadata they
want. By the nature of GIO, values can be of type string, number or
pointer (the last not really useful).
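
For illustration, here is a minimal sketch of listing those free-form
keys with plain GIO (compile against gio-2.0; error handling trimmed):

#include <gio/gio.h>

int
main (int argc, char **argv)
{
  GFile *file;
  GFileInfo *info;
  GError *error = NULL;
  gchar **attrs;
  gint i;

  if (argc < 2)
    return 1;

  g_type_init ();  /* still needed with the GLib of this era */

  file = g_file_new_for_commandline_arg (argv[1]);
  info = g_file_query_info (file, "metadata::*",
                            G_FILE_QUERY_INFO_NONE, NULL, &error);
  if (info == NULL)
    {
      g_printerr ("%s\n", error->message);
      return 1;
    }

  /* Keys are free-form; print each one with its stringified value. */
  attrs = g_file_info_list_attributes (info, "metadata");
  for (i = 0; attrs[i] != NULL; i++)
    {
      gchar *value = g_file_info_get_attribute_as_string (info, attrs[i]);
      g_print ("%s = %s\n", attrs[i], value);
      g_free (value);
    }

  g_strfreev (attrs);
  g_object_unref (info);
  g_object_unref (file);
  return 0;
}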

That reminds me, I'm not sure which ontology the metadata should be
mapped to. Moreover, do we want to store all metadata (filtered from UI
stuff), or rather try to map it to existing ontologies? I can imagine
applications using Tracker just to quickly find files marked with
metadata in an app's internal format.


Tracker performs well with "defined" keys. It is also possible to store
arbitrary key=value pairs, but that has a performance cost.

I guess we will find (see the sketch after this list):
1. Metadata that maps to properties already in the ontology -> we should
use those properties
2. Metadata with no correspondence in the ontology BUT that makes sense
there -> we update the ontology
3. Metadata with no correspondence in the ontology AND that doesn't make
sense there:
  3.1 Because it is UI information -> not stored in Tracker at all
  3.2 Because it is app-specific information -> let's study the case
(arbitrary key=value pairs or no Tracker at all)
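
To make the routing between case 1 and case 3 concrete, a miner could
keep a small lookup table; the key names and ontology properties below
are hypothetical examples, not an agreed mapping:

/* Hypothetical sketch: routing a gvfs metadata key to one of the three
 * cases above.  The key names and ontology properties are examples
 * only. */
#include <glib.h>

typedef struct {
  const gchar *gvfs_key;      /* as listed by g_file_info_list_attributes() */
  const gchar *ontology_prop; /* existing property in the ontology (case 1) */
} MetadataMapping;

static const MetadataMapping known_mappings[] = {
  { "metadata::download-uri", "nie:url" },
  { "metadata::annotation",   "nao:description" },
};

static const gchar *
map_gvfs_key (const gchar *key)
{
  guint i;

  for (i = 0; i < G_N_ELEMENTS (known_mappings); i++)
    if (g_str_equal (known_mappings[i].gvfs_key, key))
      return known_mappings[i].ontology_prop;

  /* Unknown key: case 2 (extend the ontology) or case 3 (skip UI state,
   * or store it as an arbitrary key=value pair). */
  return NULL;
}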


For properties without a mapping you can also use a nao:Property
object... Definitely not the best solution though (it works, but defeats
the purpose of having a common ontology).
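
For reference, a sketch of what that nao:Property fallback could look
like, assuming Tracker's nao:hasProperty / nao:propertyName /
nao:propertyValue pattern and the libtracker-sparql API; the URN, key
and value are invented:

#include <libtracker-sparql/tracker-sparql.h>

int
main (void)
{
  GError *error = NULL;
  TrackerSparqlConnection *conn;

  /* Invented example data: an unmapped app-specific key=value pair
   * attached to a file resource. */
  const gchar *update =
    "INSERT { "
    "  _:p a nao:Property ; "
    "     nao:propertyName 'metadata::myapp-state' ; "
    "     nao:propertyValue 'draft' . "
    "  <urn:example:file> nao:hasProperty _:p "
    "}";

  g_type_init ();

  conn = tracker_sparql_connection_get (NULL, &error);
  if (conn != NULL)
    tracker_sparql_connection_update (conn, update, G_PRIORITY_DEFAULT,
                                      NULL, &error);
  if (error != NULL)
    g_printerr ("%s\n", error->message);

  return 0;
}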


gvfsd-metadata notifying Tracker of changes is definitely the way to
go. Could that daemon provide the data directly? It would save Tracker
the work of re-reading the attributes and calculating the difference.

It could, but imagine that something goes wrong on the other side or
Tracker is not running at all. Then we would have to track the changes
already sent and resend the missing ones. I find it safer to extract
data on demand, initiated from the Tracker side.


Notifying Tracker of changes directly saves a round trip that can take
a few milliseconds and uses I/O (re-reading the file from disk and
processing the metadata). If GVfs already knows the changes, there is
no need to encode them in a file, then read and decode them to put them
into Tracker.

I would prefer to check those "failure cases" carefully and see how
they can be handled assuming direct communication. Of course, the
dependency on Tracker can be made optional with some compilation flags.
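
A sketch of what that optional dependency could look like; HAVE_TRACKER
and notify_tracker() are hypothetical names, not existing code:

#include <glib.h>

#ifdef HAVE_TRACKER
/* Hypothetical helper that pushes one change over D-Bus to Tracker. */
extern void notify_tracker (const gchar *path,
                            const gchar *key,
                            const gchar *value);
#endif

static void
metadata_changed (const gchar *path, const gchar *key, const gchar *value)
{
#ifdef HAVE_TRACKER
  /* Push the change straight to Tracker; saves the re-read round trip. */
  notify_tracker (path, key, value);
#else
  /* Built without Tracker support: nothing to do here, Tracker has to
   * pick the change up on its own. */
  (void) path; (void) key; (void) value;
#endif
}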


I guess miners are separate processes; does Tracker execute them as
necessary, or are they supposed to be running all the time? For our
case, I can imagine a miner that does its job and exits.


Tracker (the store) doesn't start/pause/stop any miner by itself. The
miners are started in the session login scripts and they activate the
store via D-Bus. So far our miners are alive all the time, but nothing
prevents them from appearing/disappearing when needed. They can be
activated by cron or D-Bus; the store doesn't care.

In the GVfs case, where the miner will be used very frequently, I
wonder which is more efficient: keeping it alive or starting/stopping
it on demand. At least in Maemo/MeeGo, starting a process is an
expensive operation.
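
If starting processes is expensive, one middle ground is a
D-Bus-activated miner that exits after some idle time. A GLib sketch
(the 30-second timeout is arbitrary, D-Bus setup omitted):

#include <gio/gio.h>

static GMainLoop *loop;
static guint idle_id;

static gboolean
quit_cb (gpointer data)
{
  g_main_loop_quit (loop);  /* nothing happened for a while: go away */
  return FALSE;
}

/* Call this whenever a request or change notification comes in. */
static void
reset_idle_timeout (void)
{
  if (idle_id != 0)
    g_source_remove (idle_id);
  idle_id = g_timeout_add_seconds (30, quit_cb, NULL);
}

int
main (void)
{
  loop = g_main_loop_new (NULL, FALSE);
  reset_idle_timeout ();
  g_main_loop_run (loop);   /* D-Bus name ownership omitted for brevity */
  return 0;
}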


In an ideal case, gvfsd-metadata itself would be a "miner", translating
metadata write events into Tracker metadata. If this is not possible,
then the miner-fs must read the extended attributes (that shouldn't
have a big impact on the code). I wonder how this is combined with
inotify.

I'm now looking at available miners which could serve as examples and a
point to start from.


Take a look at libtracker-miner:
http://library.gnome.org/devel/libtracker-miner/unstable/

It has a common superclass for all miners, offering control methods
over D-Bus so that an applet/application can start/stop/monitor them
(not sure this is relevant for a low-level miner like GVfs). There is
also the crawling code and a few other useful things for writing a
miner.
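
Based on that documentation, a skeleton miner would subclass
TrackerMiner and override the vfuncs behind its started/stopped
signals. This is only a rough sketch; check the headers for the exact
signatures:

#include <libtracker-miner/tracker-miner.h>

typedef struct { TrackerMiner parent; } GvfsMiner;
typedef struct { TrackerMinerClass parent_class; } GvfsMinerClass;

G_DEFINE_TYPE (GvfsMiner, gvfs_miner, TRACKER_TYPE_MINER)

static void
gvfs_miner_started (TrackerMiner *miner)
{
  /* Walk the gvfs metadata store and push SPARQL updates here, then
   * exit or sit waiting for change notifications. */
}

static void
gvfs_miner_class_init (GvfsMinerClass *klass)
{
  TRACKER_MINER_CLASS (klass)->started = gvfs_miner_started;
}

static void
gvfs_miner_init (GvfsMiner *miner)
{
}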

If we theoretically make gvfsd-metadata submit changes to Tracker
directly, how do we cope with the initial phase? I mean going through
the existing metadata, filling the Tracker database for the first time.
What is the preferred way?


Tracker does the initial crawling as it does now; no problem there. I
still wonder what happens afterwards: do we need to monitor the
filesystem via inotify as we do now? Or can we kill our filesystem
miner because GVfs gives us all the relevant changes? Or do we need to
receive data from both and handle the duplicated information?
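
For comparison, this is roughly what the filesystem miner keeps doing
today with GIO file monitors (inotify underneath on Linux). Note that
gvfs metadata lives in its own store rather than in the files, so
watching the files themselves may never see those changes, which is
exactly the open question:

#include <gio/gio.h>

static void
changed_cb (GFileMonitor *monitor, GFile *file, GFile *other,
            GFileMonitorEvent event, gpointer user_data)
{
  if (event == G_FILE_MONITOR_EVENT_ATTRIBUTE_CHANGED)
    {
      gchar *path = g_file_get_path (file);
      g_print ("attributes changed: %s\n", path);  /* re-read metadata here */
      g_free (path);
    }
}

int
main (void)
{
  GFile *dir;
  GFileMonitor *monitor;
  GMainLoop *loop;

  g_type_init ();

  dir = g_file_new_for_path (g_get_home_dir ());
  monitor = g_file_monitor_directory (dir, G_FILE_MONITOR_NONE, NULL, NULL);
  loop = g_main_loop_new (NULL, FALSE);

  g_signal_connect (monitor, "changed", G_CALLBACK (changed_cb), NULL);
  g_main_loop_run (loop);
  return 0;
}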

Regards,

Ivan
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list