Re: Adding "Find Duplicates" feature to F-Spot



Hi Steve!

On Mon, 20-06-2005 at 15:54 -0400, Steve Rosen wrote:
> One possible suggestion: Create MD5 values in a separate thread at low
> priority during normal use of F-Spot.  MD5s would then be created in
> the background without interrupting the user's work.  It wouldn't slow
> down import at all.  It would only slow finding duplicates if MD5s had
> not been created for all the photos selected for duplicate scanning.
> 

When should the thread be started?

> This feature could be optional, turned off by default, but turned on
> subsequently if the user selects the Find Duplicates menu item or
> selects to find duplicates during photo import.
> 

I suppose you are talking about an internal feature not visible to the
end user. I find it a bit complex, and it could be difficult to maintain
in the future if we add this kind of thing: background threads that may
or may not be started depending on options.
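
To make the concern concrete, a low-priority background hasher along the
lines Steve describes might look roughly like the sketch below. This is
only an illustration, not F-Spot code: the BackgroundHasher class and its
Start and TryGetHash methods are made-up names, and the in-memory
dictionary is an assumed stand-in for wherever the hashes would really be
kept.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading;

// Hypothetical helper, not part of F-Spot: hashes files on a low-priority
// background thread and keeps the results in a dictionary keyed by path.
public class BackgroundHasher
{
    private readonly Dictionary<string, string> hashes =
        new Dictionary<string, string>();
    private readonly object sync = new object();

    // Starts the worker and returns it so the caller can Join() if needed.
    public Thread Start(IEnumerable<string> paths)
    {
        // Copy the list so later changes by the caller do not affect the worker.
        var pending = new List<string>(paths);
        var worker = new Thread(() =>
        {
            foreach (string path in pending)
            {
                try
                {
                    string hash = ComputeMd5(path);
                    lock (sync) { hashes[path] = hash; }
                }
                catch (IOException)
                {
                    // Skip unreadable files; a real version would report them.
                }
            }
        });
        worker.IsBackground = true;               // does not keep the app alive on exit
        worker.Priority = ThreadPriority.Lowest;  // yield to UI and import work
        worker.Start();
        return worker;
    }

    // Returns true and the hash if the worker has already processed this path.
    public bool TryGetHash(string path, out string hash)
    {
        lock (sync) { return hashes.TryGetValue(path, out hash); }
    }

    // Hashes the file contents and returns the digest as lowercase hex.
    private static string ComputeMd5(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] digest = md5.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Even in this small form you need a lock around the shared table, a
decision about when to call Start, and some handling for photos whose
hash is not ready yet when the user asks for duplicates.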

I think I am going to implement the duplicate feature as I said in

http://mail.gnome.org/archives/f-spot-list/2005-June/msg00031.html

and we can play later with other options.

Thanks Steve.

Cheers

-- Alvaro
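
P.S: The plan from the linked message (hash only when Find Duplicates
runs, keep the hashes, then group photos by hash) could be sketched
roughly as follows. This is an assumption-laden illustration and not the
actual implementation: DuplicateFinder and Find are hypothetical names,
and the dictionary passed in as the cache is a stand-in for the MD5
field in the photo database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch, not F-Spot code.
public static class DuplicateFinder
{
    // Returns groups of paths whose file contents have the same MD5.
    // 'cache' maps path -> hex MD5 and is filled in as a side effect, so a
    // later run only hashes photos that are not in it yet.
    public static List<List<string>> Find(IEnumerable<string> paths,
                                          IDictionary<string, string> cache)
    {
        var byHash = new Dictionary<string, List<string>>();
        foreach (string path in paths)
        {
            string hash;
            if (!cache.TryGetValue(path, out hash))
            {
                hash = ComputeMd5(path);   // only done for photos never hashed before
                cache[path] = hash;
            }

            List<string> group;
            if (!byHash.TryGetValue(hash, out group))
            {
                group = new List<string>();
                byHash[hash] = group;
            }
            group.Add(path);
        }
        // Only hashes shared by more than one photo are duplicates.
        return byHash.Values.Where(g => g.Count > 1).ToList();
    }

    // Hashes the file contents and returns the digest as lowercase hex.
    private static string ComputeMd5(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] digest = md5.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Each returned group could then get the Duplicate tag, with all but one
photo per group preselected. Note that the sketch never invalidates a
cached hash, so a photo edited on disk after hashing would keep its old
value.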

> Steve
> 
> 
> Alvaro del Castillo wrote: 
> > Hi!
> > ...
> > 
> >   
> > > > Yes, this could be a good way but all the users will suffer the md5
> > > > generation for all the photos. I think that only users that use the
> > > > Duplicate feature should spend time with the MD5 generation for the
> > > > photos if we don't find any uses for the MD5 that could justify that all
> > > > users suffer this loading time.
> > > >       
> > > I have to agree with this opinion. This bothered me too. But I think
> > > it would be a good idea to store the created md5 hashes. I saw this
> > > feature in gthumb where it was a bit slow. In this situation we have
> > > the opportunity to store the created hashes in sql for further use. So
> > > perhaps when you run a duplicate searching it would be a good idea to
> > > store the hashes as a side effect. The next search would be a fast
> > > generation for just the new images and an sql query.
> > >     
> > 
> > Yes, I think this is the best idea: create the MD5 only when using the
> > duplicate feature. Storing the MD5 in the database could also be a
> > good idea, yes! When you load the photo data from the database, the MD5
> > could be loaded too if it exists, so you don't need to recreate it
> > later.
> > 
> >   
> > > And at last one more thing. With your original idea you can't alert
> > > the user not to import the same image twice.
> > > 
> > >     
> > 
> > No, if you lose the MD5 you can't. So you are thinking about showing
> > the user a dialog when she tries to import photos that are already in
> > the albums, no? The user then can say "Don't import any repeated photo
> > or import all the repeated photos". This could be a nice feature. We
> > annoy the user with a question but I think she will like to be informed
> > about it :)
> > 
> > 
> > So to fix some points:
> > 
> > 
> > 1. If the user doesn't use the Duplicate feature, she won't suffer any
> > time spent creating MD5s. The only extra time will be loading the data
> > from the MD5 database field. I think this time should be minimal because
> > you will load the MD5 data field with lots of other fields.
> > 
> > 
> > 2. If the user selects the Duplicate feature then:
> > 
> > 2.1 If she has selected a group of photos, the duplicate code will work
> > on this selection. A "Duplicate" tag will be created if it doesn't exist
> > and all the duplicate photos will be marked with this Duplicate tag.
> > The Duplicate tag checkbox will be selected, so only the duplicate
> > photos will appear in the main window and the user can work with them.
> > Probably she will delete one or more of the copies if they exist. Maybe
> > we can preselect for the user all the photos except one original per
> > duplicate group.
> > 
> > 2.2 If she doesn't select any photos, we will work with all the photos.
> > 
> > In 2.1 and 2.2 we may need to show a progress dialog.
> > 
> > 
> > How does it sound?
> > 
> > Cheers
> > 
> > 
> >   
> > > Hubidubi
> > > 
> > >     
> > > > I think the MD5 for photos could be cached in a hash table. This is what
> > > > I do in the current implementation.
> > > > 
> > > > Some numbers: computing the MD5 of the photo files
> > > > 
> > > > acs amigo:~/fotos/airport extreme$ ls -l
> > > > total 1360
> > > > -rwxr--r--  1 acs root 364713 2005-01-05 14:26 dsc00045.jpg
> > > > -rwxr--r--  1 acs root 330323 2005-01-05 14:26 dsc00046.jpg
> > > > -rwxr--r--  1 acs root 324022 2005-01-05 14:26 dsc00047.jpg
> > > > -rwxr--r--  1 acs root 344558 2005-01-05 14:27 dsc00048.jpg
> > > > 
> > > > and measuring the MD5 computation with DateTime.Now.Ticks (I am sure
> > > > it isn't the most accurate way to do it) on my computer (Dell X300
> > > > with 256 MB RAM and Pentium(R) M processor 1200MHz):
> > > > 
> > > > First time:
> > > > MD5 compute: 00:00:00.0769270
> > > > MD5 compute: 00:00:00.0290020
> > > > MD5 compute: 00:00:00.0200700
> > > > MD5 compute: 00:00:00.0204300
> > > > 
> > > > Second time:
> > > > MD5 compute: 00:00:00.0199370
> > > > MD5 compute: 00:00:00.0174230
> > > > MD5 compute: 00:00:00.0176300
> > > > MD5 compute: 00:00:00.0184470
> > > > 
> > > > Third time:
> > > > MD5 compute: 00:00:00.0219800
> > > > MD5 compute: 00:00:00.0203260
> > > > MD5 compute: 00:00:00.0194000
> > > > MD5 compute: 00:00:00.0199240
> > > > 
> > > > Fourth time:
> > > > MD5 compute: 00:00:00.0284410
> > > > MD5 compute: 00:00:00.0254680
> > > > MD5 compute: 00:00:00.0252140
> > > > MD5 compute: 00:00:00.0277300
> > > > 
> > > > So with not very big photos (1024x768) we see around 30ms per
> > > > photo. If you have for example 6000 photos you spend 180 seconds (3
> > > > minutes). A really bad first experience for the user. Currently, these
> > > > 3 minutes are spread over the minutes you spend in the importing
> > > > process, which is actually a bit slow.
> > > > 
> > > > Cheers
> > > > 
> > > > -- Alvaro
> > > > 
> > > > P.S: To compute the MD5 I use the code
> > > > 
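> > > > // needs: using System.IO; using System.Security.Cryptography;
> > > > //        using System.Text;
> > > > byte[] md5;
> > > > // "using" closes the file even if ComputeHash throws
> > > > using (FileStream fs = new FileStream(photo.Path, FileMode.Open,
> > > >                                       FileAccess.Read)) {
> > > >         MD5 md5ServiceProvider = new MD5CryptoServiceProvider();
> > > >         md5 = md5ServiceProvider.ComputeHash(fs);
> > > > }
> > > > 
> > > > // format the 16 hash bytes as a lowercase hex string
> > > > StringBuilder hash = new StringBuilder();
> > > > for (int pos = 0; pos < md5.Length; pos++) {
> > > >         hash.Append(md5[pos].ToString("x2"));
> > > > }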
> > > > 
> > > > taken from Mono bugzilla.
> > > > 
> > > > 
> > > >       
> > >     
> > 
> > _______________________________________________
> > F-spot-list mailing list
> > F-spot-list gnome org
> > http://mail.gnome.org/mailman/listinfo/f-spot-list
> > 
> >   
> 
> -- 
> Steve
