Fw: Updated patch for Detecting Duplicates in F-Spot




Alvaro,

I think what you're doing is very important and useful. Can I suggest an idea I've been thinking about for a while that would extend it?  I'm not suggesting you code this personally :-)

What about also using the EXIF data, as well as the filename, date modified, and other "metadata" (implicit or explicit)?

If you have 2 images, you can easily compare the EXIF information common to both.  If the camera body serial number, the date/time of shooting, and the EXIF stored image name are all the same, you don't have to check the MD5 sum to know that the 2 are at least derived from the same image.  You can then just check file size.  If the sizes are the same, then you've got a match.
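
To make that concrete, here is a rough Python sketch of the check, not a patch against f-spot. It assumes the EXIF data has already been read into plain dictionaries; the field names (body_serial_number, date_time_original, image_name) and the function names are made up for illustration, and real tag names depend on the camera and on whatever EXIF reader is used. Equal file size is treated as "good enough" here, as in the idea above; it does not prove the bytes are identical.

```python
# Hypothetical field names; real EXIF tag names vary by camera and reader.
IDENTITY_FIELDS = ("body_serial_number", "date_time_original", "image_name")


def same_source(exif_a, exif_b):
    """Return True if every identity field is present in both and equal."""
    return all(
        exif_a.get(field) is not None and exif_a.get(field) == exif_b.get(field)
        for field in IDENTITY_FIELDS
    )


def classify(exif_a, size_a, exif_b, size_b):
    """Classify a pair of images as 'duplicate', 'derived' or 'unrelated'."""
    if not same_source(exif_a, exif_b):
        return "unrelated"
    # Same shot: equal size suggests a straight copy, otherwise an edited version.
    return "duplicate" if size_a == size_b else "derived"
```

A missing field counts as "no match", so two images with no EXIF data at all are never paired up by this check alone.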

When you apply for a passport in Australia, each piece of ID you show has "points" associated with it; when you reach the required number of points, you can get a passport.  A similar thing could be done for images: the user selects the metadata to use (including explicit metadata like info in f-spot's database and in the EXIF, as well as "implicit" metadata like filesize and filename) and the weighting applied to each factor, to decide whether one image "qualifies" as a copy (or derivative) of another image.
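
The points idea could look something like the sketch below. The weights, the threshold and the field names are all invented for the example; in practice the user would choose them. Each metadata field that is present in both images and equal contributes its weight, and the pair "qualifies" once the total reaches the threshold.

```python
# Example weights only; the user would pick which fields count and how much.
EXAMPLE_WEIGHTS = {
    "body_serial_number": 40,
    "date_time_original": 30,
    "image_name": 15,
    "file_size": 10,
    "file_name": 5,
}


def match_points(meta_a, meta_b, weights):
    """Sum the weights of all fields that are present in both and equal."""
    return sum(
        points
        for field, points in weights.items()
        if meta_a.get(field) is not None and meta_a.get(field) == meta_b.get(field)
    )


def qualifies(meta_a, meta_b, weights, threshold):
    """True if the pair scores enough points to count as copy or derivative."""
    return match_points(meta_a, meta_b, weights) >= threshold
```

With weights like these, a renamed copy still scores highly on serial number, shooting time and size, while two different shots from the same camera only pick up the serial number points.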

I've also been thinking that it'd be possible to implement an approach where images are recognised not just as being the same, but also as being derived from the same image.  This metadata is currently constructed when f-spot creates a new image, so there is an "original" and copies.

The long term aim would be to be able to use the classification of related images to allow the user to set a policy for their images.  For example, "I want the original kept in this directory, and a copy in this directory.  I'd also like a small version (640x480 or as close as you can get to this) in this directory, kept for 12 months only.  I'd also like 2 copies on off-line storage (CDs).  Then I'd like a medium size copy uploaded to Flickr."  Then, f-spot can use its knowledge about related images to make sure that all of the images have had a second copy made and have been shrunk down; if not, this can happen in the background.  And if you want to change your policy, for example to have 3 copies, or to store your photos in directories sorted by the predominant colour, f-spot automagically does this for you without you having to make the changes manually.  Duplicate photos would simply be deleted (according to your policy).
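
As a very rough sketch of how such a policy might be represented and checked: the Rule class, the role names and the location strings below are all hypothetical, and retention periods (the "12 months only" part) are left out. The policy is a list of rules, and for each group of related images the function works out which copies are still missing so they can be made in the background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str       # hypothetical role name, e.g. "original", "copy", "small"
    location: str   # hypothetical target, e.g. a directory or "flickr"
    count: int = 1  # how many copies the policy asks for


def pending_actions(policy, existing):
    """List (role, location, missing_count) for every rule not yet satisfied.

    `existing` holds one (role, location) pair per image already present
    in a group of related images.
    """
    actions = []
    for rule in policy:
        have = sum(1 for pair in existing if pair == (rule.role, rule.location))
        if have < rule.count:
            actions.append((rule.role, rule.location, rule.count - have))
    return actions
```

Changing the policy, say from 2 off-line copies to 3, then just means editing one rule and letting the same check report what is missing.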

I think there is a real advantage to losing the "filesystem" oriented approach to managing images, and moving to a metadata-based one.  f-spot could become not just an image viewer, but also a darn good image manager!

Cheers,

Indulis

