Re: [Tracker] Discarding broken metadata from miners



Hey Carlos,

Thanks for your reply!

On Fri, Feb 24, 2017 at 01:36:32PM +0100, Carlos Garnacho wrote:
Do the Tracker miners version the metadata that they insert into the
database? Or, is it possible to programmatically discard metadata
coming from a certain miner and force a reindex?

There's no versioning... For dropping a miner's data wholesale, I wish
we supported the DROP GRAPH syntax; all filesystem miners in Tracker
share the same TRACKER_OWN_GRAPH_URN define.

This however could be open coded as:
"DELETE WHERE { GRAPH <" TRACKER_OWN_GRAPH_URN "> { ?u a rdfs:Resource }}"

That should leave a clean slate for miners, though maybe a bit too clean :).
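
To make the open-coded variant concrete, here is a minimal sketch that builds the same query string at run time for an arbitrary graph. The function name build_purge_query and the example URN are made up for illustration; only the query text comes from the snippet above. It uses plain libc and does not touch any Tracker API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder only: real code would use Tracker's TRACKER_OWN_GRAPH_URN. */
#define EXAMPLE_GRAPH_URN "urn:example:miner-graph"

/* Writes the DELETE WHERE query for graph_urn into buf.
 * Returns 0 on success, -1 on bad input or if buf is too small. */
int
build_purge_query (const char *graph_urn, char *buf, size_t buf_len)
{
  int n;

  assert (graph_urn != NULL && buf != NULL);

  /* A '>' would terminate the <...> IRI early and corrupt the query. */
  if (strchr (graph_urn, '>') != NULL)
    return -1;

  n = snprintf (buf, buf_len,
                "DELETE WHERE { GRAPH <%s> { ?u a rdfs:Resource }}",
                graph_urn);

  /* snprintf reports the length it wanted; treat truncation as failure. */
  return (n < 0 || (size_t) n >= buf_len) ? -1 : 0;
}
```

The resulting string would still have to be handed to whatever SPARQL update call the miner already uses.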

One related issue is 'user-generated metadata'. For example, when a
user creates an album of photos or a collection of documents, it is
stored as an nfo:DataContainer. When purging broken metadata, we need
to preserve these somehow.

This is easy for the online miners because all the online albums and
collections are backed up by the service provider. The Tracker
metadata is just a cache. Even when we blow away the entire metadata
for an account, we are not permanently deleting the collections.

But it is a problem for locally created albums. Do we need a separate
backing store for such things?

Somewhat related to this ...

When an image is shared from gnome-photos (say, to Google),
nie:relatedTo and nie:links are used to connect the local and online
copies. We also embed the provider type (in this case 'google'), the
account name (say 'debarshi ray gmail com') and the ID of the online
copy inside the XMP metadata of the source.

The idea is:

(a) If we lose the Tracker DB, we can restore the nie:relatedTo and
nie:links from the XMP.

(b) If the local copy is copied to another computer, we can still
connect it to its online copy.

However, there is no code at the moment that actually re-establishes
this connection based on the XMP. Only the XMP gets embedded today.
I have no idea what the right way to deal with this would be.

Any advice?

In gnome-online-miners (those are the out-of-tree miners used by
gnome-documents/photos to index online accounts advertised by
gnome-online-accounts), we handle this by having each miner tag their
insertions with nie:version (grep for 'version' in
src/gom-miner.c). Whenever a bug that could have inserted broken
metadata is fixed, we bump the miner version. When the user installs
the updated miner, it will automatically purge the old metadata and
re-index.
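
The version-bump logic can be sketched roughly as follows. This is not the code from src/gom-miner.c; the names miner_needs_purge, build_version_query and EXAMPLE_MINER_VERSION are hypothetical, and which resource carries the nie:version is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constant: bumped by hand whenever a fix changes what the
 * miner inserts. */
#define EXAMPLE_MINER_VERSION 2

/* Returns true when the stored data predates the running miner.
 * A stored_version of -1 stands for "no nie:version found". */
bool
miner_needs_purge (int stored_version, int current_version)
{
  return stored_version < current_version;
}

/* Writes a query that fetches the nie:version attached to subject_urn.
 * Returns 0 on success, -1 on bad input or if buf is too small. */
int
build_version_query (const char *subject_urn, char *buf, size_t buf_len)
{
  int n;

  assert (subject_urn != NULL && buf != NULL);

  /* A '>' would terminate the <...> IRI early and corrupt the query. */
  if (strchr (subject_urn, '>') != NULL)
    return -1;

  n = snprintf (buf, buf_len,
                "SELECT ?v WHERE { <%s> nie:version ?v }", subject_urn);

  return (n < 0 || (size_t) n >= buf_len) ? -1 : 0;
}
```

On startup the miner would run the query, pass the result (or -1 if there is none) to miner_needs_purge, and purge and re-index only when it returns true.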

So, any suggestions? Thoughts?

For data maintenance, I suggest you look into inserting g-o-m data
into its own graph; version management is more open to discussion. The
approach you picked does seem to be the nepomuk-y way, although I'm not
sure how good an argument that is nowadays :).

Yes, the online miners already use separate graphs. There is one graph
per online account. However, the version is not encoded in the graph's
name. The graphs are named like 'gd:goa-account:account_1487767423_9',
where 'gd' is just a namespace prefix that originated from
gnome-documents, and 'account_1487767423_9' is the identifier from
~/.config/goa-1.0/accounts.conf.

Only the versioning is done via nie:version.

Maybe we should also encode the version in the graph's name?
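
As a rough sketch of what that could look like, the helper below appends a version suffix to the existing per-account name. The ":v<N>" suffix and the function name build_versioned_graph_name are purely illustrative; nothing like this exists in the miners today.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "gd:goa-account:<account_id>:v<version>" into buf.
 * The ":v<version>" suffix is a hypothetical naming scheme.
 * Returns 0 on success, -1 on bad input or if buf is too small. */
int
build_versioned_graph_name (const char *account_id, int version,
                            char *buf, size_t buf_len)
{
  int n;

  assert (account_id != NULL && buf != NULL);

  /* Reject negative versions and empty account identifiers. */
  if (version < 0 || strlen (account_id) == 0)
    return -1;

  n = snprintf (buf, buf_len, "gd:goa-account:%s:v%d", account_id, version);

  /* snprintf reports the length it wanted; treat truncation as failure. */
  return (n < 0 || (size_t) n >= buf_len) ? -1 : 0;
}
```

One consequence of such a scheme is that after a bump the old graph no longer matches the new name, so the miner would need to find and drop graphs for earlier versions explicitly.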

Cheers,
Rishi

