Re: Dup detection



On Sat, Feb 28, 2009 at 07:39:05AM -0800, Bill Moseley wrote:
> Ubuntu 8.10 / f-spot 0.5.0.3
> 
> 
> I'm setting up f-spot for someone and imported their entire "My
> Documents" folder from their old drive.
> 
> The "Detect duplicates" item was checked on the import dialog, yet I
> ended up with many duplicates.
> 
> Here's a few examples in the f-spot database:
> 
> sqlite> select uri,md5_sum from photos where uri like '%visit17%';
> file:///home/dawson/Photos/2002/07/27/visit17.jpg|NxQax6OOx2UrTYXuegNDjA==
> file:///home/dawson/Photos/2002/07/27/visit17-1.jpg|Dpa4apwy/Wguf2VD+UTXog==
> file:///home/dawson/Photos/2002/07/27/visit17c.jpg|KRaGglvxVYNTbsj6IhkEFA==
> file:///home/dawson/Photos/2002/07/27/visit17-2.jpg|Bb9Lspcs+WWOt2HiQKf0Xw==
> 
> Yet the md5's of the photos match:
> 
> $ md5sum Photos/2002/07/27/visit17*.jpg
> 4b09e7c7cf223687a9d2727230c2c5a4  Photos/2002/07/27/visit17-1.jpg
> 4b09e7c7cf223687a9d2727230c2c5a4  Photos/2002/07/27/visit17-2.jpg
> 4b09e7c7cf223687a9d2727230c2c5a4  Photos/2002/07/27/visit17c.jpg
> 4b09e7c7cf223687a9d2727230c2c5a4  Photos/2002/07/27/visit17.jpg
> 
> Clearly, the md5's in the database are not just the file contents.
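
One note on comparing the two listings: the md5_sum column looks
base64-encoded (the code further down ends with Convert.ToBase64String),
while md5sum prints hex. Here is a quick sketch for putting a stored
value into md5sum's format. It is in Python only for convenience, and
db_md5_to_hex is a name I made up, not anything in f-spot:

```python
import base64


def db_md5_to_hex(md5_b64: str) -> str:
    """Decode a base64 digest (as seen in photos.md5_sum) to md5sum-style hex."""
    return base64.b64decode(md5_b64).hex()


# Values copied from the sqlite and md5sum output quoted above.
stored = "NxQax6OOx2UrTYXuegNDjA=="
from_md5sum = "4b09e7c7cf223687a9d2727230c2c5a4"

# Prints the stored digest as hex, then whether it equals the file's MD5.
print(db_md5_to_hex(stored), db_md5_to_hex(stored) == from_md5sum)
```

A False here is consistent with the stored value not being a hash of
the file contents.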

Hmm, isn't the code below what generates the MD5 stored in the
photos table?

I'm not sure I understand the code, but does it first create a
thumbnail and then calculate the MD5 of that?  Is the point of
generating the thumbnail first to strip any image metadata that might
differ between otherwise identical images?

Is there a compelling reason not to simply take the MD5 of the image
file itself?
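
To make the alternative concrete, here is a minimal sketch of hashing
the raw file bytes and encoding the digest the way the md5_sum column
appears to be encoded (base64). It is Python rather than C#,
file_md5_base64 is a hypothetical helper and not f-spot code, and it
deliberately ignores metadata, so two files that differ only in EXIF
data would hash differently:

```python
import base64
import hashlib


def file_md5_base64(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 of a file's raw bytes, base64-encoded."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large photos are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

Byte-identical files, like the four visit17 copies above, would all
get the same value this way.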

Still, the photos table above has different MD5s for the same image,
so if this really is the method used, it would seem that thumbnail
generation doesn't produce the same result for identical image
files.


    public static string GenerateMD5 (System.Uri uri)
    {
        try {
            if (md5_cache.ContainsKey (uri))
                return md5_cache [uri];

            using (Gdk.Pixbuf pixbuf = ThumbnailGenerator.Create (uri))
            {
                byte[] serialized = GdkUtils.Serialize (pixbuf);
                byte[] md5 = MD5Generator.ComputeHash (serialized);
                string md5_string = Convert.ToBase64String (md5);

                md5_cache.Add (uri, md5_string);
                return md5_string;
            }
        } catch (Exception e) {
            Log.DebugException (String.Format ("Failed to create MD5Sum for Uri: {0}\n", uri), e);
        }

        return string.Empty;
    }

-- 
Bill Moseley
moseley hank org
Sent from my iMutt


