Re: [Tracker] Guessing metadata and retrieval from external resources

From: Martyn Russell <martyn lanedo com>
To: Age Bosma <agebosma gmail com>
Cc: tracker-list gnome org
Subject: Re: [Tracker] Guessing metadata and retrieval from external resources
Date: Mon, 10 Oct 2011 11:01:19 +0100

On 09/10/11 23:33, Age Bosma wrote:

Hi,

Hi,

As far as I understand, Tracker currently only sticks to
collecting meta-data which can be retrieved from the actual files.

Would it be an idea to extend this concept by supplementing the metadata
which could not be determined from a file with info from external
resources? And/or intelligent guessing for that matter?

E.g. we have a movie with no tags like title, director, etc. We do have
a file name though.
In a lot of cases the movie title can be extracted from it. This could
be added to the tracker metadata, followed by requesting the director of
a movie from an external resource like IMDB.
The same would go for the language of a file like a movie or subtitle,
where a language code is included in a file name.
A different approach can be taken with music. An audio fingerprint can be
determined, followed by using that to retrieve the additional meta-data
from MusicBrainz.
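
To make the file-name guessing above concrete, here is a minimal Python
sketch. It is not Tracker code: the function name, the token rules and the
small language-code set are all assumptions chosen for illustration, and a
real heuristic would need to cope with far messier names.

```python
import re
from pathlib import Path

# Illustrative subset of ISO 639-1 codes (assumption, not a Tracker list).
LANG_CODES = {"en", "nl", "de", "fr", "es"}


def guess_from_filename(filename):
    """Guess title, year and language from a media file name (heuristic only)."""
    stem = Path(filename).stem  # drops only the last extension
    tokens = [t for t in re.split(r"[._\-\s]+", stem) if t]
    guess = {"title": None, "year": None, "language": None}

    # A trailing token matching a known code is taken as the language.
    if len(tokens) > 1 and tokens[-1].lower() in LANG_CODES:
        guess["language"] = tokens.pop().lower()

    # The first four-digit token that looks like a year ends the title.
    for i, tok in enumerate(tokens):
        if i > 0 and re.fullmatch(r"(19|20)\d{2}", tok):
            guess["year"] = int(tok)
            tokens = tokens[:i]
            break

    if tokens:
        guess["title"] = " ".join(tokens)
    return guess
```

A guessed title like this would then be the search key for an external
lookup, which is why it should stay marked as a guess.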

The external and guessed meta-data should be stored separately from the
normal meta-data stored by Tracker, marked as an external
title/director/... tag.
It is then up to an application to decide what to use. I.e. normal title
present? Use it. No title present but an external title present? Use
that one if you like.
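
The fallback an application might apply can be sketched in a few lines of
Python. The "external:" key prefix and the function name are hypothetical;
nothing here reflects how Tracker or any ontology actually names such
properties.

```python
# Hypothetical convention: external or guessed values live under "external:<key>".
def pick_value(metadata, key):
    """Return (value, source), preferring file-derived data over external data."""
    value = metadata.get(key)
    if value:
        return value, "file"
    value = metadata.get("external:" + key)
    if value:
        return value, "external"
    return None, None
```

Returning the source alongside the value lets the application decide
whether it trusts external data enough to show it.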

Why would one want this? Often more info than can be extracted from
files is appreciated. It will prevent applications from having to
reinvent the wheel, deviating from Tracker as their meta-data source
because it does not have the information.
E.g. Rygel could start listing movies on a TV with the actual movie
title instead of using file names or list them by director even though
no tags were present. Banshee (if/when they start using Tracker) does
not have to maintain their own MusicBrainz query service because Tracker
already provides the information.


There are a number of issues here. What springs to mind is:

- Do we write back the data to the file itself (I would like to see that, but support there is limited right now by file type)?

- Guessing metadata based on filename, etc. is currently build-time optional. Part of me wonders if this should be in the tracker-preferences dialog somewhere so users can configure this more dynamically. Part of me thinks it's not useful though. Perhaps a silent configuration option that is not in the UI would be preferable.

Does functionality as described above fit within the goals/scope of tracker?


It certainly does.

Would there be any objections against going in this direction?


Not at all.

Does tracker allow extending functionality as described above?

Yes and no. You could write a miner as suggested, but I feel this is not the right approach. While the name "miner" makes sense, what we're doing here is more "post-processing" and we've considered having some daemon to go around cleaning up classes and information which can be derived from content inserted by miner-fs or applications. A couple of examples here are:

- You insert a contact for an email, you delete the email, the contact then stays around. Really shouldn't the contact be removed? It does depend on who uses it (the graph) but if it is just there for the email, it should be removed ideally. If some gnome-contacts or other application makes use of it using their graph to insert the data, we wouldn't clean it up.

- You want an album's total duration in time inserted and removed when albums appear or are deleted. Right now we have no way to do this.

- As you say, cleaning up the titles and other information from an external source like IMDB. I would love to see this by the way.

--

I guess you could write a miner to do this. It would listen to graph update signals to know when to find out about new music/videos and update the store.

You could also write this into tracker-extract/libtracker-extract and have some common functions to get this information. But you will quickly run into interesting conditions like: What do we do when you have no Internet connection? I suppose the Flickr and other miners have had to deal with this so we have infrastructure there for that.
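
One possible way to handle the no-connection case is to queue lookups and
retry them later. The Python sketch below is only an illustration under
that assumption: the class, its callables and the use of OSError as the
"network failed" signal are all hypothetical and say nothing about how the
existing miners actually deal with it.

```python
from collections import deque


class DeferredLookups:
    """Queue lookups while offline and run them once a connection is reported."""

    def __init__(self, fetch, is_online):
        self._fetch = fetch          # callable(item) -> dict, e.g. a web query
        self._is_online = is_online  # callable() -> bool
        self._pending = deque()
        self.results = {}

    def request(self, item):
        """Queue an item and try to process the queue straight away."""
        self._pending.append(item)
        self.flush()

    def flush(self):
        """Process pending items; stop at the first failure, keep the rest queued."""
        while self._pending and self._is_online():
            item = self._pending[0]
            try:
                self.results[item] = self._fetch(item)
            except OSError:
                break  # network error: retry on the next flush
            self._pending.popleft()

    @property
    def pending(self):
        return list(self._pending)
```

Calling flush() again when connectivity returns drains whatever was queued
in the meantime, so no file is silently skipped.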

Does the current shared-filemetadata-spec provide a way to store
information as external/additional?

No, not AFAICS. Actually the spec looks quite under-defined and ill-considered in places. I don't know how up to date it is or if it's even finished. It certainly doesn't mention how storage of said data should be handled AFAICS.


--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.

Follow-Ups:
- Re: [Tracker] Guessing metadata and retrieval from external resources
  - From: Jens Georg
- Re: [Tracker] Guessing metadata and retrieval from external resources
  - From: Age Bosma

References:
- [Tracker] Guessing metadata and retrieval from external resources
  - From: Age Bosma
