Re: [Tracker] Guessing metadata and retrieval from external resources



On 10-10-11 10:44, Ivan Frade wrote:
On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma <agebosma gmail com> wrote:
It is then up to an application to decide what to use. I.e. normal title
present? Use it. No title present but an external title present? Use
that one if you like.

That is nice in theory, but in practice it means the application must do
multiple queries just for the title. Also, the application needs to
know how many different sources of information are available.

 I would say that those "scraping miners" should override the values
of the properties they know. In some cases we could add new properties
to the ontologies and the application could use "tracker:coalesce" in
the query.


First of all, please don't call it a "scraper" ;-) Web scraping should
really be prevented.
Ideally only web services should be used to obtain additional data.
Compared to web services, websites change relatively often, as Daniel
O'Connor pointed out in the blog post he referred to. Having to alter a
website parser for each website change will become an endless task.
Then there's also the rights issue. While scraping will be hard to
detect, big resource websites do not allow it.
Using web services will prove to be more stable in the end.

I agree that multiple queries for one piece of info should be prevented.
From an application point of view you just want the title, no matter
where it came from.
Yet I do not feel completely comfortable with overwriting a property
without being able to determine its source (from file or elsewhere). The
info from an external source could have been determined wrongly, and I
can imagine an app wanting to indicate this to a user somehow.

Having multiple title properties is not an option, as discussed on IRC. There
is no way for Tracker to automatically fall back on an external property
and you don't want to start using "tracker:coalesce" in an app for each
possible property.
What about setting an attribute for a property? Is that an option?
I.e. just one title property, set with either info from a file or an
external resource, with a way to determine where it came from afterwards.
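
To make the attribute idea a bit more concrete, here is a rough Python
sketch. It is not Tracker code and does not use any Tracker API; the
names Source, TaggedValue and resolve_title are made up purely for
illustration. It keeps a single title and records where it came from, so
an app only has to look at one value and can still tell the user when
that value was guessed from an external resource.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    """Hypothetical provenance marker for a property value."""
    FILE = "file"          # extracted from the file's own metadata
    EXTERNAL = "external"  # obtained from an external web service


@dataclass(frozen=True)
class TaggedValue:
    """A single property value together with its origin."""
    value: str
    source: Source


def resolve_title(file_title: Optional[str],
                  external_title: Optional[str]) -> Optional[TaggedValue]:
    """Return one title, preferring the file's own, tagged with its origin."""
    if file_title:
        return TaggedValue(file_title, Source.FILE)
    if external_title:
        return TaggedValue(external_title, Source.EXTERNAL)
    return None  # neither source knows a title


# Example: the file has no title, so the external one is used and flagged.
title = resolve_title(None, "Big Buck Bunny")
if title is not None and title.source is Source.EXTERNAL:
    print(f"{title.value} (guessed from an external resource)")
```

The app still does a single lookup for the title; the source is extra
information it is free to ignore.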

 Note that probably these new miners would need some UI (to ask the
user what movie from a list is the one in their filesystem). This can
be tricky (no miner has specific UI so far).


I'd rather not go there. It would be very confusing for a user. You
create or put a file somewhere and all of a sudden a UI pops up out of
nowhere asking you what you've been doing. I can see a lot of people
getting (even more) paranoid ;-)
It can also become overkill, especially at initial indexing.
We would be better off just doing it all in the background. Accuracy can be
increased by using multiple resources for the same thing.
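
As an illustration of what "using multiple resources" could look like
without any UI, here is a small Python sketch. It only accepts a value
when enough sources agree on it. The function name consensus, the
min_votes threshold and the normalisation (trimming whitespace and
ignoring case) are my own assumptions, not an existing Tracker
mechanism.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def consensus(candidates: Iterable[Optional[str]],
              min_votes: int = 2) -> Optional[str]:
    """Return the answer at least `min_votes` sources agree on, else None."""
    votes: Counter = Counter()
    first_seen: Dict[str, str] = {}
    for candidate in candidates:
        if not candidate or not candidate.strip():
            continue  # this source had no answer
        # Compare case-insensitively and without surrounding whitespace.
        key = candidate.strip().casefold()
        votes[key] += 1
        # Remember the first spelling seen so it can be returned as-is.
        first_seen.setdefault(key, candidate.strip())
    if not votes:
        return None
    key, count = votes.most_common(1)[0]
    return first_seen[key] if count >= min_votes else None


# Two of the three hypothetical services agree, so their answer is accepted.
print(consensus(["The Matrix", "the matrix ", "Matrix"]))
```

When the sources disagree the function returns None, in which case the
miner would simply not set the property instead of asking the user.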

Yours,

Age (Forage)



