[Banshee-List] A Library Clean up Idea

Hi all,

I have been thinking for a while about cleaning up my music library
and it's given me an idea. I have tried using an MD5 tool which hashes
every file in my Music directory and then shows any duplicates. The
problem is that most of the duplicates have been introduced by merging
with various backups and other sources. What often happens is that a
track gets its metadata updated but the duplicate doesn't. When the
changes are written back to the file, the two files become slightly
different, so they don't show up as duplicates.
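
For reference, the whole-file approach amounts to something like the
rough Python sketch below. The function names are made up for
illustration and are not taken from any particular tool. Because the
hash covers every byte, a single changed tag is enough to put two
copies of the same track into different groups.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the entire file, tags included."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by whole-file MD5; return groups of 2+ paths."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[md5_of_file(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```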

What I was thinking of doing was writing some code to extract the
compressed audio data from the file and hash that, rather than the
whole file including the metadata, so that I can find duplicates.
Since it doesn't need to actually decompress the audio, it should be
quite straightforward and fairly quick. It does need to know about
the structure of each music file type, however, so it would be limited
to a few specific types.
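
A minimal sketch of that idea in Python might look like the following.
It assumes a hypothetical table of per-format "extractor" functions,
keyed by file extension, each of which takes the raw file bytes and
returns only the compressed audio. Files of any other type are simply
skipped. All the names here are invented for illustration, and reading
each file fully into memory is a simplification.

```python
import hashlib
import os
from collections import defaultdict


def audio_digest(path, extractors):
    """Return an MD5 of the audio payload, or None for unsupported types.

    `extractors` is a hypothetical mapping from a lowercase extension
    (for example ".mp3") to a function bytes -> bytes that strips the
    metadata and returns only the compressed audio.
    """
    extension = os.path.splitext(path)[1].lower()
    extract = extractors.get(extension)
    if extract is None:
        return None
    with open(path, "rb") as handle:
        payload = extract(handle.read())
    return hashlib.md5(payload).hexdigest()


def group_by_audio(paths, extractors):
    """Return groups of 2+ paths whose audio payloads hash identically."""
    groups = defaultdict(list)
    for path in paths:
        digest = audio_digest(path, extractors)
        if digest is not None:
            groups[digest].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```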

Since nearly all my music is MP3s, it would need to know about
ID3 tags and all the varieties of them. I know that for Vorbis you
would just need to skip the stream header packets, which are quite
straightforward. They all start with a set header and there are 3
header packets, so you can hash the data from the fourth Vorbis packet
onwards. AAC I am not so sure about. It may be possible to do this via
GStreamer; it's something that would be worth looking into. Otherwise
the code would just have to navigate the files directly. I think it's
fairly straightforward to write the code to actually do this.
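
To give a feel for how much code is involved, here is a rough Python
sketch of the two cases above. It is written under several
assumptions: for MP3 it only handles an ID3v2 tag at the start of the
file and an ID3v1 tag at the end (other tag varieties, such as APE or
Lyrics3, are not handled); for Vorbis it assumes a single logical
stream in the Ogg file and does not verify page checksums. The
function names are invented for this example, and both functions take
the whole file as bytes for simplicity.

```python
import hashlib


def strip_id3(data):
    """Return MP3 bytes without a leading ID3v2 tag or trailing ID3v1 tag."""
    start, end = 0, len(data)
    if end >= 10 and data[:3] == b"ID3":
        # The tag size is a 28-bit "synchsafe" integer: 7 bits per byte.
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # ID3v2.4 footer flag: 10 more bytes
            start += 10
        start = min(start, end)  # guard against a bogus size field
    # An ID3v1 tag is exactly 128 bytes at the end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def ogg_packets(data):
    """Yield the packets of an Ogg file, assuming one logical stream."""
    pos = 0
    packet = bytearray()
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost Ogg page sync at offset %d" % pos)
        segment_count = data[pos + 26]
        lacing = data[pos + 27:pos + 27 + segment_count]
        body = pos + 27 + segment_count
        for length in lacing:
            packet += data[body:body + length]
            body += length
            if length < 255:  # a lacing value under 255 ends the packet
                yield bytes(packet)
                packet = bytearray()
        pos = body


def mp3_audio_md5(data):
    """MD5 of an MP3 file's bytes with the ID3 tags stripped."""
    return hashlib.md5(strip_id3(data)).hexdigest()


def vorbis_audio_md5(data):
    """MD5 of every Vorbis packet after the three header packets."""
    digest = hashlib.md5()
    for index, packet in enumerate(ogg_packets(data)):
        if index < 3:
            # Header packets are a type byte followed by "vorbis".
            if packet[1:7] != b"vorbis":
                raise ValueError("packet %d is not a Vorbis header" % index)
            continue
        digest.update(packet)
    return digest.hexdigest()
```

Hashing the packet contents rather than the raw Ogg pages means the
page headers are left out of the hash as well, so only the audio
packets themselves are compared.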

The other option is to use some form of audio fingerprinting
technology. It would be much slower but would also pick up some
duplicates which are not from the same original, and since it uses the
decoded audio it would work with every music format Banshee supports.
I did a quick Google search and couldn't find an open source one that
could be used. The only possibility is MusicBrainz, but I am not sure
if the right bit of their system is open source or not, it's quite

What I am wondering is whether this would be something appropriate to
integrate into Banshee. It strikes me that it would be a fairly handy
feature to have, but I am not sure of the best way to integrate it,
particularly in terms of the GUI.

What does everyone think? Would this be useful, and is it worth
integrating into Banshee?

Charlie M
