Re: Updated patch for Detecting Duplicates in F-Spot



On 9/6/05, Alvaro del Castillo <acs openshine com> wrote:
> Hi!
> 
> On Tue, 2005-09-06 at 07:54 +0900, Bengt Thuree wrote:
> > On Mon, 2005-09-05, 08:07 pm, Alvaro del Castillo wrote:
> > > Hi!
> > >
> > > On Mon, 2005-09-05 at 11:02 +0800, Indulis Bernsteins wrote:
> >
> > >> What about also using the EXIF data, as well as the filename, date
> > >> modified, and other "metadata" (implicit or explicit).
> > >>
> > >
> > > Using the EXIF data has the problem that some images may not have any
> > > EXIF data at all.
> >
> > If we use a point system, then if the image does not have any MetaTag
> > information at all, the points associated with this information are 0,
> > and the only options are the file data and the MD5 data.
> >
> 
> Yes.
> 
> > >
> > >> If you have 2 images, you can easily look at the EXIF information
> > >> common between the 2.  If the camera body serial number is the same,
> > >> and the date/time of shooting is the same, and the EXIF stored image
> > >> name is the same, you don't have to check the MD5 sum to know that the
> > >> 2 are at least derived from the same image.  You can then just check
> > >> file size.  If they are the same, then you've got a match.
> > >>
> > >
> > > Currently F-Spot code follows this approach:
> > >
> > > http://cvs.gnome.org/viewcvs/f-spot/src/PhotoStore.cs?rev=1.71
> > >
> > > But I think that the MD5 signature is very useful because you can be
> > > sure the two files are really the same without error.
> >
> > I don't think he said we should stop doing the MD5 stuff, but rather
> > have a point system where MD5 is part of the points. But just looking at
> > the EXIF data is quicker...
> >
> 
> Sure, it is quicker to look at the EXIF data if you have not already
> invested the time to compute the MD5 of the file. The EXIF data could be
> very useful in other searches, for example date searches, but for
> finding duplicates, I find MD5 a more robust solution.
> 
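
To make the point system discussed above a bit more concrete, here is a
rough sketch in Python (not F-Spot code). The field names and the weights
are made up purely for illustration; nothing like this exists in F-Spot
as far as this thread shows. A field that is missing on either image
simply contributes 0 points, which covers the "no EXIF at all" case.

```python
from typing import Mapping, Optional

# Hypothetical weights, chosen only for illustration. An MD5 match
# dominates because it means the files are byte-for-byte identical.
WEIGHTS = {
    "md5": 100,
    "exif_datetime": 20,
    "exif_serial": 20,
    "file_size": 10,
    "file_name": 5,
}


def duplicate_score(a: Mapping[str, Optional[str]],
                    b: Mapping[str, Optional[str]]) -> int:
    """Sum the points for every field that is present and equal in both."""
    score = 0
    for key, points in WEIGHTS.items():
        va, vb = a.get(key), b.get(key)
        # A field missing from either image scores nothing.
        if va is not None and va == vb:
            score += points
    return score
```

Where to put the threshold for calling two images duplicates is a
separate question that the sketch does not try to answer.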

One problem arises if EXIF editing is implemented in F-Spot. Take, for
example, the Edit-Time function. Upon import, the user is given the
opportunity to correct the timestamp of the image. If we decide to
write the new timestamp to the EXIF information, then both the
EXIF information and the md5sum of the image are changed before it is
stored in the archive.

There are several possible solutions for this:

1. Don't update the exif-information in the file. Store the updated
info in the database and/or in separate .xmp-files instead.
2. Calculate the md5sum of the image before any changes are made and
store that in the database.
3. Calculate the md5sum of the image-data only and disregard the
exif-information in the calculation.

and I am sure you can think of even more possible solutions. But this
is something we should think about.
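
To illustrate option 3, here is a minimal sketch in Python (not F-Spot
code) that computes an MD5 over a JPEG while skipping the APP1 segments,
which is where EXIF (and XMP) data lives. The function name
md5_without_exif is made up for this example, and the parser is
deliberately simple: it assumes a well-formed JPEG and does not handle
fill bytes or other formats.

```python
import hashlib
import struct


def md5_without_exif(jpeg_bytes: bytes) -> str:
    """MD5 of a JPEG stream with all APP1 (EXIF/XMP) segments left out."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.md5()
    digest.update(jpeg_bytes[:2])
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image data follows, so hash
            # everything from here to the end of the file.
            digest.update(jpeg_bytes[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE1:  # skip APP1, hash every other segment
            digest.update(jpeg_bytes[pos:end])
        pos = end
    raise ValueError("corrupt JPEG: no SOS marker found")
```

With something like this, two copies of a photo that differ only in
their EXIF timestamp would get the same checksum, while a plain md5sum
of the two files would differ.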

/Mattias


