Re: Adding "Find Duplicates" feature to F-Spot
- From: Alvaro del Castillo <acs openshine com>
- To: Steve Rosen <steve sjrosen mailshell com>
- Cc: F-Spot <f-spot-list gnome org>
- Subject: Re: Adding "Find Duplicates" feature to F-Spot
- Date: Wed, 22 Jun 2005 05:10:37 +0200
Hi Steve!
On Mon, 20-06-2005 at 15:54 -0400, Steve Rosen wrote:
> One possible suggestion: Create MD5 values in a separate thread at low
> priority during normal use of F-Spot. MD5s would then be created in
> the background without interrupting the user's work. It wouldn't slow
> down import at all. It would only slow finding duplicates if MD5s had
> not been created for all the photos selected for duplicate scanning.
>
When should the thread be started?
> This feature could be optional, turned off by default, but turned on
> subsequently if the user selects the Find Duplicates menu item or
> selects to find duplicates during photo import.
>
I suppose you are talking about an internal feature not visible to the
end user. I find it a bit complex, and it could be difficult to maintain
in the future if we add this kind of thing: background threads that may
or may not be started depending on options.
I think I am going to implement the duplicate feature as I said in
http://mail.gnome.org/archives/f-spot-list/2005-June/msg00031.html
and we can play later with other options.
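As a rough sketch of that plan (hash only when the duplicate search runs,
keep the result, then group photos by hash), something like the following
could work. The DuplicateFinder class and its method names are hypothetical,
not existing F-Spot code, and the in-memory dictionary is only a stand-in
for the MD5 database field discussed below.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of F-Spot: hashes files on demand,
// caches the result, and groups paths that share the same MD5.
public static class DuplicateFinder
{
    // Assumption: stands in for the MD5 field stored in the database.
    static readonly Dictionary<string, string> cache =
        new Dictionary<string, string>();

    // Returns the MD5 of the file as a lowercase hex string,
    // computing it only if it is not already cached for this path.
    public static string Md5Hex(string path)
    {
        string cached;
        if (cache.TryGetValue(path, out cached))
            return cached;

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(fs);
            StringBuilder hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                hex.Append(b.ToString("x2"));
            cache[path] = hex.ToString();
            return cache[path];
        }
    }

    // Returns one list per group of files with identical content.
    // Files without a duplicate are left out.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        Dictionary<string, List<string>> byHash =
            new Dictionary<string, List<string>>();
        foreach (string path in paths)
        {
            string hash = Md5Hex(path);
            List<string> group;
            if (!byHash.TryGetValue(hash, out group))
            {
                group = new List<string>();
                byHash[hash] = group;
            }
            group.Add(path);
        }

        List<List<string>> duplicates = new List<List<string>>();
        foreach (List<string> group in byHash.Values)
            if (group.Count > 1)
                duplicates.Add(group);
        return duplicates;
    }
}
```

Each returned group could then be marked with the Duplicate tag. Note that
the cache is keyed by path only, so it assumes a file does not change after
it has been hashed.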
Thanks Steve.
Cheers
-- Alvaro
> Steve
>
>
> Alvaro del Castillo wrote:
> > Hi!
> > ...
> >
> >
> > > > Yes, this could be a good way but all the users will suffer the md5
> > > > generation for all the photos. I think that only users that use the
> > > > Duplicate feature should spend time with the MD5 generation for the
> > > > photos if we don't find any uses for the MD5 that could justify that all
> > > > users suffer this loading time.
> > > >
> > > I have to agree with this opinion. This bothered me too. But I think
> > > it would be a good idea to store the created md5 hashes. I saw this
> > > feature in gthumb where it was a bit slow. In this situation we have
> > > the opportunity to store the created hashes in sql for further use. So
> > > perhaps when you run a duplicate searching it would be a good idea to
> > > store the hashes as a side effect. The next search would be a fast
> > > generation for just the new images and an sql query.
> > >
> >
> > Yes, I think this is the best idea. To create MD5 when using the
> > duplicate feature. And to store the MD5 in the database could be also a
> > good idea, yes! When you load the photo data from the database, the MD5
> > could be loaded also if it exists and later, you don't need to recreate
> > it.
> >
> >
> > > And at last one more thing. With your original idea you can't alert
> > > the user not to import the same image twice.
> > >
> > >
> >
> > No, if you lose the MD5 you can't. So you are thinking about showing
> > the user a dialog when she tries to import photos that are already in
> > the albums, no? The user then can say "Don't import any repeated photo
> > or import all the repeated photos". This could be a nice feature. We
> > annoy the user with a question but I think she will like to be informed
> > about it :)
> >
> >
> > So to fix some points:
> >
> >
> > 1. If the user doesn't use the Duplicate feature, she won't suffer any
> > time spent creating MD5s. The only extra time will be loading the data
> > from the MD5 database field. I think this time should be minimal because
> > you will load the MD5 data field with lots of other fields.
> >
> >
> > 2. If the user selects the Duplicate feature then:
> >
> > 2.1 If she has selected a group of photos, the duplicate code will work
> > on this selection. A "Duplicate" tag will be created if it doesn't exist
> > and all the duplicate photos will be marked with this Duplicate tag.
> > The Duplicate tag checkbox will be selected so that the main window
> > shows only the duplicate photos and the user can work with them.
> > Probably she will delete one or more of the copies, if they exist. Maybe
> > we can preselect for the user all the photos except one original per
> > duplicate group.
> >
> > 2.2 If she doesn't select any photos, we will work with all the photos.
> >
> > In 2.1 and 2.2 we may need to show a progress dialog.
> >
> >
> > How does it sound?
> >
> > Cheers
> >
> >
> >
> > > Hubidubi
> > >
> > >
> > > > I think the MD5 for photos could be cached in a hash table. This is what
> > > > I do in the current implementation.
> > > >
> > > > Some numbers: computing the MD5 files for the photos
> > > >
> > > > acs amigo:~/fotos/airport extreme$ ls -l
> > > > total 1360
> > > > -rwxr--r-- 1 acs root 364713 2005-01-05 14:26 dsc00045.jpg
> > > > -rwxr--r-- 1 acs root 330323 2005-01-05 14:26 dsc00046.jpg
> > > > -rwxr--r-- 1 acs root 324022 2005-01-05 14:26 dsc00047.jpg
> > > > -rwxr--r-- 1 acs root 344558 2005-01-05 14:27 dsc00048.jpg
> > > >
> > > > and measuring the MD5 computing with DateTime.Now.Ticks (I am sure it
> > > > isn't the most accurate way to do it) on my computer (Dell X300 with 256
> > > > MB RAM and Pentium(R) M processor 1200MHz):
> > > >
> > > > First time:
> > > > MD5 compute: 00:00:00.0769270
> > > > MD5 compute: 00:00:00.0290020
> > > > MD5 compute: 00:00:00.0200700
> > > > MD5 compute: 00:00:00.0204300
> > > >
> > > > Second time:
> > > > MD5 compute: 00:00:00.0199370
> > > > MD5 compute: 00:00:00.0174230
> > > > MD5 compute: 00:00:00.0176300
> > > > MD5 compute: 00:00:00.0184470
> > > >
> > > > Third time:
> > > > MD5 compute: 00:00:00.0219800
> > > > MD5 compute: 00:00:00.0203260
> > > > MD5 compute: 00:00:00.0194000
> > > > MD5 compute: 00:00:00.0199240
> > > >
> > > > Fourth time:
> > > > MD5 compute: 00:00:00.0284410
> > > > MD5 compute: 00:00:00.0254680
> > > > MD5 compute: 00:00:00.0252140
> > > > MD5 compute: 00:00:00.0277300
> > > >
> > > > So with not very big photos (1024x768) we see around 30ms per
> > > > photo. If you have, for example, 6000 photos you spend 180 seconds (3
> > > > minutes). A really bad first experience for the user. Currently, these 3
> > > > minutes are spread over the time you spend in the import process,
> > > > which is itself a bit slow.
> > > >
> > > > Cheers
> > > >
> > > > -- Alvaro
> > > >
> > > > P.S: To compute the MD5 I use the code
> > > >
> > > > // Hash the file contents; 'using' closes the stream afterwards.
> > > > byte[] md5;
> > > > using (FileStream fs = new FileStream(photo.Path, FileMode.Open,
> > > >                                       FileAccess.Read)) {
> > > >     MD5 md5ServiceProvider = new MD5CryptoServiceProvider();
> > > >     md5 = md5ServiceProvider.ComputeHash(fs);
> > > > }
> > > >
> > > > // Format the digest as a lowercase hex string.
> > > > StringBuilder hash = new StringBuilder();
> > > > for (int pos = 0; pos < md5.Length; pos++) {
> > > >     hash.Append(md5[pos].ToString("x2"));
> > > > }
> > > >
> > > > taken from Mono bugzilla.
> > > >
> > > >
> > > >
> > >
> >
> > _______________________________________________
> > F-spot-list mailing list
> > F-spot-list gnome org
> > http://mail.gnome.org/mailman/listinfo/f-spot-list
> >
> >
>
> --
> Steve