Re: Updated patch for Detecting Duplicates in F-Spot

On Fri, 2005-09-09 at 06:09 +0200, Mattias Holmlund wrote:
> On 9/9/05, Gabriel Burt <gabriel burt gmail com> wrote:
> > On Wed, 2005-09-07 at 20:56 +0200, Mattias Holmlund wrote:
> > > One problem is if exif-editing is implemented in f-spot. Take for
> > > example the Edit-Time function. Upon import, the user is given the
> > > opportunity to correct the timestamp of the image. If we decide to
> > > write the new timestamp to the exif-information, then both the
> > > Exif-information and the md5sum of the image is changed before it is
> > > stored in the archive.
> > 
> > I don't see that as being a problem - when the file changes (due to EXIF
> > changes, image rotation, etc) then it should just trigger a recompute of
> > the MD5.  Am I missing something?
> > 
> > If you're talking about catching duplicates when importing, then do the
> > duplicate-detection with the MD5 taken from before any import-related
> > changes take place. After you've seen it's not a dupe, you can change it
> > as needed (eg update the date/time or something) and recompute the MD5,
> > storing that one. That does mean two MD5 computations for each image,
> > but we're talking about rolling two operations (import and batch modify)
> > into one, so I think it's fair, and sane. 
> 
> I think it is perfectly ok to store two MD5 sums for each image, one
> computed when the image was first imported (original) and one computed
> on the current contents of the image. What I was pointing out was
> simply that storing a single MD5 sum calculated on the current
> image file is not enough if you want to take exif-editing into account
> and still detect duplicates. Storing both the original MD5 sum and
> the current one and comparing against both of them upon import probably
> solves the relevant part of the problem. This solution does not detect
> duplicates if the user tries to import two image files that were
> originally from the same picture, but he has changed the exif
> information for one of them in another application. But that case is
> probably not worth taking into account.
> 

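Roughly, the two-sum check on import that you describe would amount to
something like the following (a sketch only, and the names are
illustrative, not F-Spot's actual schema):

    def find_duplicate(incoming_md5, photos):
        """Return the id of an already-imported photo whose original or
        current MD5 matches incoming_md5, or None if there is no match.

        `photos` is an iterable of (photo_id, md5_original, md5_current)
        tuples; the field names are hypothetical.
        """
        for photo_id, md5_original, md5_current in photos:
            if incoming_md5 in (md5_original, md5_current):
                return photo_id
        return None
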
It would be perfectly possible to compute the hash on just the image
data (and not the metadata) for most of the formats F-Spot supports, and
do further analysis once a duplicate is detected. This doesn't really
change the utility of storing more than one hash, but it would make the
hash less prone to false negatives caused by metadata changes.
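For JPEG, for example, hashing just the image data could mean walking
the marker segments and skipping the APPn/COM blocks where EXIF and
other metadata live. A rough sketch (not production code, and it
ignores corner cases like fill bytes between markers):

    import hashlib

    def jpeg_image_data_md5(path):
        """MD5 of a JPEG with APPn/COM metadata segments left out.

        Walks the marker segments up to SOS, skips APP0-APP15 and COM
        (where EXIF, XMP, thumbnails and comments live), and hashes
        everything else, including the entropy-coded scan data.
        """
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            data = f.read()

        if data[:2] != b"\xff\xd8":
            raise ValueError("not a JPEG file")
        md5.update(data[:2])              # SOI marker
        pos = 2
        while pos + 4 <= len(data):
            if data[pos] != 0xFF:
                raise ValueError("corrupt marker stream")
            marker = data[pos + 1]
            if marker == 0xDA:            # SOS: scan data runs to EOI
                md5.update(data[pos:])
                break
            length = int.from_bytes(data[pos + 2:pos + 4], "big")
            segment = data[pos:pos + 2 + length]
            # APPn (0xE0-0xEF) and COM (0xFE) hold metadata; skip them.
            if not (0xE0 <= marker <= 0xEF or marker == 0xFE):
                md5.update(segment)
            pos += 2 + length
        return md5.hexdigest()

A simpler variant would hash only the scan data from the SOS marker
onward, at the cost of not noticing changes to the quantization and
Huffman tables.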

--Larry
