Re: [Tracker] wip/passive-extraction (and API cleanup?)



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Carlos Garnacho wrote on 9/01/2014 13:48:

Hey Carlos,

I've been talking about this branch on #tracker, but now that most work
is done there it is worth raising to the ML. In that branch there
are two extra objects in libtracker-miner:

* TrackerDecorator is a TrackerMiner that implements a passive
indexing pattern: instead of being expected to feed data directly
to tracker-store, it listens for GraphUpdated signals, so when an
item eligible for indexing is added/updated and is still missing a
nie:dataSource specific to the decorator, it is queued for
processing. On startup it also queries for all elements of the
eligible rdf:types that are missing that nie:dataSource, so all
elements are ensured to be indexed.
* TrackerDecoratorFS is a file-specific implementation of that
object, which basically adds volume monitoring, so indexing within
just-added volumes is resumed if it was interrupted previously, and
elements are removed from the queue if the volume is removed.

Liking these ideas!
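
For illustration, the passive pattern described above can be modelled
roughly as follows. This is only a toy sketch in Python, not the
libtracker-miner API: the Decorator class, its method names, the
dict-based "store" and the "metadata" key are all hypothetical, and
on_graph_updated() merely stands in for a GraphUpdated notification.

```python
from collections import deque


class Decorator:
    """Toy model of passive indexing; NOT the real TrackerDecorator API."""

    def __init__(self, store, data_source, rdf_types):
        # Hypothetical store layout: urn -> {"type": str, "sources": set}
        self.store = store
        self.data_source = data_source
        self.rdf_types = set(rdf_types)
        self.queue = deque()
        # Startup pass: queue every eligible element that still lacks
        # this decorator's data source.
        for urn in sorted(store):
            self._maybe_enqueue(urn)

    def _maybe_enqueue(self, urn):
        item = self.store.get(urn)
        if item is None or item["type"] not in self.rdf_types:
            return  # gone, or not an eligible rdf:type
        if self.data_source in item["sources"] or urn in self.queue:
            return  # already processed, or already waiting
        self.queue.append(urn)

    def on_graph_updated(self, urns):
        """Stand-in for a GraphUpdated notification from the store."""
        for urn in urns:
            self._maybe_enqueue(urn)

    def process_next(self, extract):
        """Process one queued element, then mark it with the data source."""
        while self.queue:
            urn = self.queue.popleft()
            item = self.store.get(urn)
            if item is None:
                continue  # element was removed while it sat in the queue
            item.setdefault("metadata", {}).update(extract(urn))
            item["sources"].add(self.data_source)
            return urn
        return None
```

In this model the data source marker is what makes the pattern
restartable: anything that was queued but not processed before a
shutdown is simply found again by the startup pass.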

In that branch, tracker-extract does use these features: it has
been turned into a full-blown standalone miner using
TrackerDecorator, while miner-fs stopped calling it. On one hand,
this leads to a greatly simplified indexing in tracker-miner-fs, as
the task is a lot less prone to failure now. On the other hand,
this brings in the 2-pass indexing that was being requested:
miner-fs promptly crawls and fetches GFile info, and
tracker-extract follows soon after, filling in extra information.


Nice.

Current caveats
===============

It is worth noting though that in the branch not much has been done
yet about handling extraction failures:

* extractor modules blocking or taking too much time
* crashes in extractor modules

Possible solutions involve adding cancellability of extract
tasks and/or having all extraction go into a subprocess that we can
watch, so the dbus service itself doesn't go away and doesn't
need to be restarted. The latter could also help with Philip's
idea to run extraction in containers. But about these changes...

Ok.
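
As a rough sketch of the subprocess idea, the watching side could
look something like the Python below. It is an illustration only and
not how tracker-extract is implemented: extract_in_subprocess(), its
status strings and the 10 second default are all assumptions, and the
child command here is just a Python one-liner standing in for a real
extractor.

```python
import subprocess
import sys


def extract_in_subprocess(argv, timeout=10.0):
    """Run one extraction in a child process and classify the outcome.

    Hypothetical helper. Returns (status, output), where status is
    "ok", "timeout" or "failed".
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a
        # blocked extractor cannot stall the caller indefinitely.
        return "timeout", ""
    if proc.returncode != 0:
        # Non-zero exit status; on POSIX a negative value means the
        # child was terminated by a signal (e.g. a crash).
        return "failed", proc.stderr
    return "ok", proc.stdout


if __name__ == "__main__":
    # Stand-in "extractor": a child that prints a single line.
    print(extract_in_subprocess([sys.executable, "-c", "print('title=foo')"]))
```

Because the parent only ever sees an exit status and some output, a
blocked or crashing module costs one child process and the service
itself stays up.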

Future plans?
=============

I'm very seriously proposing to make libtracker-extract private
altogether. The usefulness of having 3rd party extractors is
dubious: neither allowing them to reimplement extraction for a
popular mimetype nor implementing support for a mimetype we don't
know well enough is positive. It potentially affects tracker
stability and user perception, and it sidesteps the point that if a
mimetype has enough traction, it should be in the tracker tree. Its
API is also a mishmash of utility functions that have little to do
with the rest of Tracker, and is written in a not quite future-safe
way.

*nod*

Moreover, googling for "tracker_extract_get_metadata" (the function
that modules must implement), I just see 3 pages of references to
Tracker code, backtraces, and logs, and very few references to
external extractors. This API is 1/3 of the Tracker public API, yet
it's been mostly unused externally for the 3 years it's been around.

I agree that libtracker-extract in its current shape shouldn't be
public, although it should be available for implementers of
tracker-extract modules (which don't imo need to be possible as
plugins: with the libav support by Jolla we also see that integrators
prefer to develop such support in the Tracker project rather than out
of tree or as a plugin anyway - plus I don't like the idea of
integrators adding code to our processes, given that modern IPC
systems are really good).

I think that 'external' users of metadata extraction should all be
tunneled to use tracker-extract over IPC. An 'external' use-case that
I have in mind here is an MTP daemon seeking to collect additional
metadata through extraction of a temporary file, inserting it itself
before doing the rename() to the final file (which I think is a valid
use-case to be an external user, unless TrackerDecorator addresses
this use-case too -- implement the MTP daemon as a TrackerDecorator?).

So a public libtracker-extract API for 'external' users would be one
that ends up calling tracker-extract.

Especially if we ever decide to lock tracker-extract up in a container
(for example using systemd-nspawn, for security and other reasons).


So, I think Tracker should offer API to help integrate with
Tracker, and as such this API falls short. I propose to keep it in
private land, and encourage the use of TrackerDecorator, which is
also nice in the way that multiple sources add up information,
unlike extract modules which are individually responsible for
filling in every piece of information.

*nod*, this sounds good.

Actually, I'd like to think we can make 1.0 soon

Would this 1.0 include the TrackerDecorator stuff? Because I think any
such redesign needs a testing phase before we call it stable.

(we technically could ASAP, we've remained feature stable for quite
some time now) and make longer stability promises than we do
currently (having every gnome module depending on Tracker bump .pc
file versions every 6 months is a PITA). IMO the main milestone is
getting the API to a point where we can think of forward
compatibility, and doing this would help greatly.

Phew, long email,

Welcome to my world! :-)

Philip

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSzp7WAAoJEEP2NSGEz4aDPCgH/RRLfuta8tFvNl0E8Opuj/RA
OgwSub/R6q/J8aBWOVRZ/yRSCxxQrCQdslpDd/PnH7AACYJk0kWMDO9wQMIBHl+n
n8VgQzJ2WkT1capDycuQyJXfkIUjBaAYm5Rhd4voSBQm97vrUiZAkl9BC5M5sxtB
WLughxubmZepTf8zxC29Hre2MKqQuldTZ5KESdQ09tdVKfTeQYVPnF4HzHMezzDs
/vazNsA/gW6vbSa2uLKYdbsi4tPaz6uXLBhDmF7v8hxWMu98tJMTMZuWCQ4iIuaA
vDqsONO9KCSLFInVdbf1OPdrt5cgeDoyxz/QmPOC9c/uAZO8WsbvKWPwsBWbhLk=
=XmFN
-----END PGP SIGNATURE-----

