Re: [Tracker] Guessing metadata and retrieval from external resources



On 09/10/11 23:33, Age Bosma wrote:
Hi,

Hi,

As far as I understand, Tracker currently sticks to collecting only
meta-data which can be retrieved from the actual files.

Would it be an idea to extend this concept by supplementing the metadata
which could not be determined from a file with info from external
resources? And/or intelligent guessing for that matter?

E.g. we have a movie with no tags like title, director, etc. We do have
a file name though.
In a lot of cases the movie title can be extracted from it. This could
be added to the tracker metadata, followed by requesting the director of
a movie from an external resource like IMDB.
The same would go for the language of a file like a movie or subtitle,
where a language code is included in a file name.
A different approach can be taken with music. An audio fingerprint can be
determined, followed by using that to retrieve the additional meta-data
from MusicBrainz.
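
A minimal sketch of the file name guessing described above, in plain Python. It is not Tracker code: the function name, the returned keys and the small language-code set are assumptions made up for illustration, and real file names are far messier than this handles.

```python
import re
from pathlib import Path

# Hypothetical, deliberately tiny set of language codes for illustration.
KNOWN_LANGUAGE_CODES = {"en", "nl", "de", "fr", "es"}


def guess_from_filename(path):
    """Return a dict of guessed fields; a key is absent when nothing was found."""
    stem = Path(path).stem  # file name without its last extension
    tokens = [t for t in re.split(r"[._\-\s]+", stem) if t]
    guessed = {}

    # A trailing token matching a known code is taken as the language.
    if len(tokens) > 1 and tokens[-1].lower() in KNOWN_LANGUAGE_CODES:
        guessed["language"] = tokens.pop().lower()

    # A four-digit token (not the first one) in a plausible range is taken
    # as a release year; everything from that token onwards is dropped.
    for i, tok in enumerate(tokens):
        if i > 0 and re.fullmatch(r"(19|20)\d{2}", tok):
            guessed["year"] = int(tok)
            tokens = tokens[:i]
            break

    if tokens:
        guessed["title"] = " ".join(tokens)
    return guessed
```

The guessed title would then be the search term for an external lookup; that lookup is left out here because it depends on the service used.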

The external and guessed meta-data should be stored separately from the
normal meta-data stored by Tracker, marked as an external
title/director/... tag.
It is then up to an application to decide what to use. I.e. normal title
present? Use it. No title present but an external title present? Use
that one if you like.
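
The application-side choice could be sketched like this. The "external_" key prefix and the dict layout are assumptions for the example only; how Tracker would actually mark external values is exactly the open question.

```python
def pick_value(metadata, field):
    """Prefer the value extracted from the file; fall back to the external one.

    Returns a (value, source) pair, or (None, None) when neither is present.
    The "external_" prefix is a hypothetical naming convention.
    """
    extracted = metadata.get(field)
    if extracted:
        return extracted, "file"
    external = metadata.get("external_" + field)
    if external:
        return external, "external"
    return None, None
```

Returning the source alongside the value lets an application show or ignore guessed data as it sees fit.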

Why would one want this? Often more info than can be extracted from
files is appreciated. It will prevent applications from having to
reinvent the wheel, deviating from Tracker as their meta-data source
because it does not have the information.
E.g. Rygel could start listing movies on a TV with the actual movie
title instead of using file names or list them by director even though
no tags were present. Banshee (if/when they start using Tracker) does
not have to maintain their own MusicBrainz query service because Tracker
already provides the information.

There are a number of issues here. What springs to mind is:

- Do we write back the data to the file itself (I would like to see that, but support there is limited right now by file type)?

- Guessing metadata based on filename, etc. is currently a build-time option. Part of me wonders if this should be in the tracker-preferences dialog somewhere so users can configure this more dynamically. Part of me thinks it's not useful though. Perhaps a silent configuration option that is not exposed in the UI is preferable.

Does functionality as described above fit within the goals/scope of tracker?

It certainly does.

Would there be any objections against going in this direction?

Not at all.

Does tracker allow extending functionality as described above?

Yes and no. You could write a miner as suggested, but I feel this is not the right approach. While the name "miner" makes sense, what we're doing here is more "post-processing". We've considered having a daemon that goes around cleaning up classes and information which can be derived from content inserted by miner-fs or applications. A couple of examples here are:

- You insert a contact for an email, you delete the email, and the contact then stays around. Shouldn't the contact really be removed? It does depend on who uses it (the graph), but if it is there just for the email, it should ideally be removed. If gnome-contacts or some other application makes use of it, using its own graph to insert the data, we wouldn't clean it up.

- You want an album's total duration inserted when an album appears and removed when it is deleted. Right now we have no way to do this.

- As you say, cleaning up the titles and other information from an external source like IMDB. I would love to see this, by the way.

--

I guess you could write a miner to do this. It would listen to graph update signals to know when to find out about new music/videos and update the store.

You could also write this into tracker-extract/libtracker-extract and have some common functions to get this information. But you will quickly run into interesting conditions like: What do we do when you have no Internet connection? I suppose the Flickr and other miners have had to deal with this so we have infrastructure there for that.
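
One simple way to cope with the offline case is to keep unresolved items queued and retry later. The sketch below is generic Python and an assumption on my part; it is not how the Flickr or any other existing miner does it, and the class and method names are made up.

```python
from collections import deque


class LookupQueue:
    """Queue items for an external lookup and keep them until it succeeds."""

    def __init__(self, lookup):
        # lookup: callable taking an item and returning a dict of metadata;
        # it is assumed to raise OSError (e.g. ConnectionError) when offline.
        self._lookup = lookup
        self._pending = deque()
        self.results = {}

    def add(self, item):
        self._pending.append(item)

    def pending(self):
        return list(self._pending)

    def process(self):
        """Try pending items in order; stop at the first network failure.

        Returns True when the queue was drained, False when items remain.
        """
        while self._pending:
            item = self._pending[0]
            try:
                self.results[item] = self._lookup(item)
            except OSError:
                return False  # still offline; remaining items stay queued
            self._pending.popleft()
        return True
```

process() would be called again whenever connectivity returns, for example from a timer or a network-state notification.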

Does the current shared-filemetadata-spec provide a way to store
information as external/additional?

No, not AFAICS. Actually the spec looks quite under-defined and ill-considered in places. I don't know how up to date it is or if it's even finished. It certainly doesn't mention how storage of said data should be handled AFAICS.

--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.


