Re: [Rhythmbox-devel] Musicbrainz and RB

On Sunday, August 10, 2003, at 12:04  am, Colin Walters wrote:
>> has data on how often collisions
>> and things happen.
> Ouch, it looks like TRMs which map to multiple tracks are over 10%,
> that's definitely significant.

To be fair, many of these 'collisions' are actually the same track on
different albums, e.g. the single, the album, the remastered release
with bonus tracks and the greatest hits. This is unavoidable when
identifying songs by how they sound, and it is why ID3 tags and
filenames are used as extra hints.

Other oddities also play a part: albums whose hidden track comes after
90 tracks that each consist entirely of 10 seconds of silence, and
audiobooks that begin each chapter with the same jingle.

That said, this is a genuine problem to a certain degree, but there is
plenty of scope for improving the heuristics so that the computer can
work automatically with reasonable confidence by considering your
collection as a whole rather than track by track. For example, if you
can positively identify 9 out of 10 songs from a Nine Inch Nails album,
and a TRM then matches both the 10th track of that album and a Dolly
Parton song (and you have no other songs by her), you can give more
weight to the Nine Inch Nails option.
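
To make the idea concrete, here is a minimal sketch in Python of that
kind of collection-level weighting. It is only an illustration, not how
any existing tagger works: the function name pick_candidate, the tuple
layout and the album and track titles are all made up for the example.

```python
from collections import Counter


def pick_candidate(candidates, identified_albums):
    """Choose among several tracks that share one TRM.

    candidates: list of (artist, album, title) tuples matching the TRM.
    identified_albums: Counter mapping (artist, album) to the number of
    tracks from that album already identified unambiguously.
    Returns the preferred candidate, or None if there is no clear winner.
    """
    if not candidates:
        return None

    def score(candidate):
        artist, album, _title = candidate
        # Weight = how many sibling tracks from this album we already own.
        # A Counter returns 0 for albums we know nothing about.
        return identified_albums[(artist, album)]

    ranked = sorted(candidates, key=score, reverse=True)
    # On a tie, do not guess: leave the decision to the user.
    if len(ranked) > 1 and score(ranked[0]) == score(ranked[1]):
        return None
    return ranked[0]


# Hypothetical collection: 9 tracks of one album already identified.
identified = Counter({("Nine Inch Nails", "Example Album"): 9})
candidates = [
    ("Dolly Parton", "Other Album", "Some Song"),
    ("Nine Inch Nails", "Example Album", "Track 10"),
]
choice = pick_candidate(candidates, identified)
```

Returning None on a tie keeps the user in the loop whenever the
collection gives no hint either way.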

The following report shows the worst TRMs for collisions, but note that
of the 10% that collide, 8% match only two songs (which may actually be
the same song, as noted above):

>> So, in the windows tagger for musicbrainz, after the TRM is generated,
>> there is some comparison done with song length and then meta data to try
>> and determine the correct song. A confidence rating is determined in the
>> guess and reported to the user. The user then has the final decision on
>> the correctness of the guess.
> Hm, fun.  We will have to do something like that too I guess.

There is a beta-quality cross-platform library called TunePimp that
does most of this for you. The only interface currently available is a
clunky command-line proof of principle called tp_tagger, but don't let
that put you off: it still gives a good idea of the power that a good
GUI could provide.
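
For a rough idea of the kind of comparison described in the quoted text
(song length first, then metadata, producing a confidence rating), here
is a small Python sketch. This is not TunePimp's API or the Windows
tagger's actual algorithm: the function names, the dictionary keys, the
10 second tolerance and the 0.4/0.3/0.3 weights are all assumptions
picked purely for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confidence(local, candidate, max_length_diff=10.0):
    """Rate how well a candidate track matches a local file.

    Both arguments are dicts with 'title', 'artist' and 'seconds' keys
    (a hypothetical layout). The result lies between 0.0 and 1.0.
    """
    diff = abs(local["seconds"] - candidate["seconds"])
    # Length score falls linearly to 0 at max_length_diff seconds apart.
    length_score = max(0.0, 1.0 - diff / max_length_diff)
    title_score = text_similarity(local["title"], candidate["title"])
    artist_score = text_similarity(local["artist"], candidate["artist"])
    # Arbitrary illustrative weights that sum to 1.
    return 0.4 * length_score + 0.3 * title_score + 0.3 * artist_score


def rank(local, candidates):
    """Return (confidence, candidate) pairs, best guess first."""
    scored = [(confidence(local, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The ranked list and its ratings would then be shown to the user, who
still has the final say on whether the top guess is correct.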

The release announcement is here: 


