Re: Updated patch for Detecting Duplicates in F-Spot
- From: Larry Ewing <lewing novell com>
- To: mattias holmlund gmail com
- Cc: f-spot-list gnome org
- Subject: Re: Updated patch for Detecting Duplicates in F-Spot
- Date: Fri, 16 Sep 2005 21:09:40 -0500
On Fri, 2005-09-09 at 06:09 +0200, Mattias Holmlund wrote:
> On 9/9/05, Gabriel Burt <gabriel burt gmail com> wrote:
> > On Wed, 2005-09-07 at 20:56 +0200, Mattias Holmlund wrote:
> > > One problem is if exif-editing is implemented in f-spot. Take for
> > > example the Edit-Time function. Upon import, the user is given the
> > > opportunity to correct the timestamp of the image. If we decide to
> > > write the new timestamp to the exif-information, then both the
> > > Exif-information and the md5sum of the image are changed before it is
> > > stored in the archive.
> >
> > I don't see that as being a problem - when the file changes (due to EXIF
> > changes, image rotation, etc) then it should just trigger a recompute of
> > the MD5. Am I missing something?
> >
> > If you're talking about catching duplicates when importing, then do the
> > duplicate-detection with the MD5 taken from before any import-related
> > changes take place. After you've seen it's not a dupe, you can change it
> > as needed (eg update the date/time or something) and recompute the MD5,
> > storing that one. That does mean two MD5 computations for each image,
> > but we're talking about rolling two operations (import and batch modify)
> > into one, so I think it's fair, and sane.
>
> I think it is perfectly ok to store two MD5 sums for each image, one
> computed when the image was first imported (original) and one computed
> on the current contents of the image. What I was pointing out was
> simply that storing a single MD5 sum calculated on the current
> image file is not enough if you want to take exif-editing into account
> and still detect duplicates. Storing both the original MD5 sum and
> the current and comparing against both of them upon import probably
> solves the relevant part of the problem. This solution does not detect
> duplicates if the user tries to import two image files that were
> originally from the same picture, but he has changed the exif
> information for one of them in another application. But that case is
> probably not worth taking into account.
>
It would be perfectly possible to compute the hash on just the image
data (and not the metadata) for most of the formats f-spot supports, and
do further analysis once a duplicate is detected. This doesn't really
change the utility of storing more than one hash, but it would make
the hash less prone to failing on metadata changes.
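
A rough sketch of the idea for JPEG, in Python for brevity rather than
as f-spot code. The function name jpeg_image_hash is made up for this
example. It assumes a well-formed file and treats the APPn segments
(where Exif lives) and COM segments as the metadata to skip, hashing
everything else. Other formats would need their own walker, and some
APPn segments (Adobe's APP14, for instance) can affect decoding, so
this is a simplification, not a finished design.

```python
import hashlib
import struct


def jpeg_image_hash(data: bytes) -> str:
    """MD5 over a JPEG byte stream, skipping APPn and COM segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.md5()
    pos = 2  # first segment starts right after SOI
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the rest of the file is compressed image
            # data, so hash it all and stop walking segments.
            digest.update(data[pos:])
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            # Quantization tables, Huffman tables, frame header, etc.
            digest.update(data[pos:end])
        pos = end
    return digest.hexdigest()
```

Two copies of a picture that differ only in their Exif block would get
the same value from this, while a plain MD5 of the whole file would
differ, which is the point at which the further analysis would kick in.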
--Larry