Re: Duplicates patch new version - Check box in importing



Hi!

El mié, 19-10-2005 a las 13:32 -0500, Larry Ewing escribió: 
> Alvaro, As promised here is a review of the duplicates patch. It is a
> very useful patch and does a lot of things very well but I have a few
> issues still.
> 

I suppose it! It is the first time we try to integrate code so I can
understand it perfectly.

> I'm going to go into a lot of detail in this review because I haven't
> done enough public review on this list and I want everyone to know what
> what I'd like to see.  In particular I'm going to nitpick on formatting
> a bit, not because you did anything wrong (I hadn't explicitly stated
> any of this before so how could you follow it), but because I want
> everyone to get a feel for what I'd like to see. Please forgive me for
> using you as a public example I'm extremely greatful for the work you've
> done.

Don't worry Larry, this is exactly what I was waiting for to happen in
order we can integrate code upstream. I have tried to follow in someway
the coding style but there isn't a formatting style until now so I
haven't the rules to follow. Now it is time to follow them in order to
have clean code :)

I plan to follow your comments and also check that for the future, we
have a clear path for other developers to follow the style.

I am reading right now the HACKING file in the CVS:

http://cvs.gnome.org/viewcvs/f-spot/HACKING?rev=1.1&view=markup

So the first step is to be sure we have a open issue in bugzilla about
the duplicates feature.

http://bugs.gnome.org/buglist.cgi?product=f-spot&&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED

Reading the bugs I think that this one could be the right bug to include
work and details about the patch:

http://bugs.gnome.org/show_bug.cgi?id=169646

So I will finally attach the patch to this bugzilla entry.

Now about the codying style:

<<Beagle attempts to follow the Mono coding conventions.  The following
description of those conventions was shamelessly stolen from Beagle's
HACKING file.>>

s/Beagle/F-Spot

So I will follow your comments Larry and also check that we have them in
the HACKING Codying Style section.

The first step is to download a clean F-Spot version from CVS and apply
the patch I have sent to the list ...

acs amigo:~/devel/gnome/f-spot$ patch -p0 < /tmp/DuplicatesPatch
patching file src/ImportCommand.cs
Hunk #1 succeeded at 372 (offset 7 lines).
Hunk #2 succeeded at 389 (offset 7 lines).
Hunk #3 succeeded at 406 (offset 7 lines).
Hunk #4 succeeded at 419 (offset 7 lines).
Hunk #5 succeeded at 453 (offset 7 lines).
Hunk #6 succeeded at 466 (offset 7 lines).
Hunk #7 succeeded at 533 (offset 7 lines).
Hunk #8 succeeded at 553 (offset 7 lines).
Hunk #9 succeeded at 690 (offset 7 lines).
Hunk #10 succeeded at 723 (offset 7 lines).
Hunk #11 succeeded at 867 (offset 7 lines).
patching file src/MainWindow.cs
Hunk #3 succeeded at 1690 (offset -1 lines).
patching file src/Makefile.am
patching file src/PhotoStore.cs
patching file src/TagStore.cs
patching file src/f-spot.glade

Great! It works so we can continue using it without modifications and
follow your comments Larry with warranties. 

> 
> On Tue, 2005-10-04 at 06:30 +0200, Alvaro del Castillo wrote:
> > I attach a modified patch with the problem solved. Apply it against the
> > current CVS.
> 
> 
> > Index: src/ImportCommand.cs
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/ImportCommand.cs,v
> > retrieving revision 1.39
> > diff -u -b -B -p -r1.39 ImportCommand.cs
> > --- src/ImportCommand.cs        22 Aug 2005 10:47:34 -0000      1.39
> > +++ src/ImportCommand.cs        4 Oct 2005 04:27:22 -0000
> > @@ -362,6 +365,7 @@ public class ImportCommand : FSpot.Glade
> >         [Glade.Widget] Gtk.ScrolledWindow photo_scrolled;
> >         [Glade.Widget] Gtk.CheckButton attach_check;
> >         [Glade.Widget] Gtk.CheckButton recurse_check;
> > +    [Glade.Widget] Gtk.CheckButton duplicate_check;
> >         [Glade.Widget] Gtk.Button ok_button;
> >         [Glade.Widget] Gtk.Image tag_image;
> >         [Glade.Widget] Gtk.Label tag_label;
> > @@ -378,6 +382,11 @@ public class ImportCommand : FSpot.Glade
> >         bool copy;
> >  
> >         int total;
> > +    
> > +    private int          duplicates;
> > +    private long         total_md5;
> > +    private long         total_import;
> > +
> >         PhotoStore store;
> 
> Here we see a problem with the whitespace formatting.  Use tabs and if
> you must treat tabs as spaces they are 8 spaces wide not four.
> 

I will use 8 space tabs. I am working with Emacs so I will include in my
".emacs" file the configuration proposed in the HACKING file.

Done! Now only 1 white space between the type and the variable name and
all the lines using 8 space tabs.

> >  
> >         FSpot.Delay step;
> > @@ -390,6 +399,8 @@ public class ImportCommand : FSpot.Glade
> >         {
> >                 main_window = mw;
> >                 step = new FSpot.Delay (10, new GLib.IdleHandler (Step));
> > +        total_md5 = 0;
> > +        total_import = 0;
> >         }
> >  
> >         private void HandleDialogResponse (object obj, ResponseArgs args)
> > @@ -401,12 +412,16 @@ public class ImportCommand : FSpot.Glade
> >                 }
> >         }
> >  
> > -       private void UpdateProgressBar (int count, int total)
> > +       private void UpdateProgressBar (int count, int total, int duplicates)
> >         {
> >                 if (progress_bar == null)
> >                         return;
> > -
> > +        if (duplicates > 0) {
> > +            progress_bar.Text = String.Format ("Importing {0} of {1}, {2} duplicate/s", 
> > +                                               count-duplicates, total-duplicates, duplicates);
> > +        } else {
> >                 progress_bar.Text = String.Format ("Importing {0} of {1}", count, total);
> > +        }
> >                 progress_bar.Fraction = (double) count / System.Math.Max (total, 1);
> >         }
> >  
> > @@ -431,6 +446,10 @@ public class ImportCommand : FSpot.Glade
> >                 Pixbuf thumbnail;
> >                 int count;
> >                 bool ongoing = true;
> > +        TimeSpan timeTakenMD5 = new TimeSpan();;
> > +
> > +        //Console.WriteLine ("Start importing the photo ...");
> > +        long startTime = DateTime.Now.Ticks;
> >  
> >                 if (importer == null)
> >                         return false;
> > @@ -440,14 +459,48 @@ public class ImportCommand : FSpot.Glade
> >                 if (thumbnail == null) {
> >                         Console.WriteLine ("Could not import file");
> >                 } else {
> > -                       //icon_scrolled.Visible = true;
> > +            long startTimeMD5 = DateTime.Now.Ticks;
> > +            photo.MD5Sum = FSpot.DuplicatesFinder.PhotoMD5 (photo);            
> > +            store.Commit (photo);
> > +            long endTimeMD5 = DateTime.Now.Ticks;
> > +            total_md5 += endTimeMD5 - startTimeMD5;
> > +            timeTakenMD5 = new TimeSpan (endTimeMD5 - startTimeMD5);
> > +            //Console.WriteLine ("Total time for MD5 {0}", timeTakenMD5.ToString());
> > +
> > +            if (FSpot.DuplicatesFinder.IsDuplicate (photo)) {
> > +                photo.AddTag (MainWindow.Toplevel.Database.Tags.Duplicate);
> > +                store.Commit (photo);
> > +            }
> > +            
> > +            if (FSpot.DuplicatesFinder.IsDuplicate (photo)) {
> > +                //Console.WriteLine ("Photo: {0} duplicated\n", photo.Name);
> > +                duplicates++;
> > +                if (duplicate_check.Active) {                    
> >                         collection.Add (photo);
> > +                } else {
> > +                    //Console.WriteLine ("Removing duplicate ...");
> > +                    store.Remove (photo);
> > +                }
> 
> Speed on import is important because it is one of the first things
> people will deal with in f-spot and is potentially very long running.
> With that in mind I have a couple of questions.  Why are you calling
> IsDuplicate twice here, it is potentially quite slow and could trivially
> be called only once, unless I'm missing something important.  I'll have
> more to say about IsDuplicate later.
> 

It is really strange how I finally end up with this code here. It is
cleary a bug not to include the first code if condition for
FSpot.DuplicatesFinder.IsDuplicate in the second block. I have done it
right now.

I have tried to check carefully the additional time we use to compute
the MD5. As you said, the first impression of the user from F-Spot will
be importing photos. If it is a very slow process, we start creating a
bad feeling in the user, a very bad thing.

> In the interest of speed it might also make sense to calculate the
> md5sum on the file before adding it to the store so that we could avoid
> two writes to the db.  Also there is no point in adding the the
> duplicate tag or the md5 if we aren't saving the photo so we can avoid
> two more writes in that case
> 

Sure, we can save some db writes following this way.


> something like:
> 
> 		ongoing = importer.Step (out photo, out thumbnail, out count);
> 		
> 		if (thumbnail == null) {
> 			Console.WriteLine ("Could not import file");
> 		} else {
> 			photo.MD5Sum = FSpot.DuplicatesFinder.PhotoMD5 (photo);			
> 
> 			if (FSpot.DuplicatesFinder.IsDuplicate (photo)) {
> 				duplicates++;
> 				if (duplicate_check.Active) {                    
> 					photo.AddTag (MainWindow.Toplevel.Database.Tags.Duplicate);
> 					store.Commit (photo);
> 					collection.Add (photo);

I have done this change to the code so we avoid the:

photo.AddTag (MainWindow.Toplevel.Database.Tags.Duplicate);
store.Commit (photo);

if the duplicates checkbox isn't active.



> 				} else {
> 					Console.WriteLine ("Removing duplicate ...");
> 					store.Remove (photo);
> 				}
> 			} else {                
> 				Console.WriteLine ("Importing Photo: {0}", photo.Name);
> 				collection.Add (photo);
> 			}
> 
> 			//grid.AddThumbnail (thumbnail);
> 			if (duplicate_check.Active)
> 				UpdateProgressBar (count, total, 0);
> 			else
> 				UpdateProgressBar (count, total, duplicates);
> 
> 			thumbnail.Dispose ();
> 		}
> 
> Seems like a good idea if we aren't going to explicitly allow you to add
> and remove duplicates in the import dialog.  If we do allow that then
> tagging the photos as duplicates is good but the logic will also need to
> change.


Currently you can check or uncheck the import duplicates checkbox. But
once you do it, the import process restart from zero, like it is done
with include subdirectories checkbox.

> 
> 
> > +            } else {                
> > +                //Console.WriteLine ("Importing Photo: {0}", photo.Name);
> > +                collection.Add (photo);
> > +            }
> >                 
> >                         //grid.AddThumbnail (thumbnail);
> > -                       UpdateProgressBar (count, total);
> > +            if (duplicate_check.Active) {
> > +                UpdateProgressBar (count, total, 0);
> > +            } else {
> > +                UpdateProgressBar (count, total, duplicates);
> > +            }
> >                         thumbnail.Dispose ();
> >                 }
> >  
> > +        long endTime = DateTime.Now.Ticks;
> > +        total_import += endTime - startTime;
> > +        TimeSpan timeTaken = new TimeSpan(endTime - startTime);
> > +        //Console.WriteLine ("Total time importing the photo {0} ({1} MD5)\n", 
> > +        //                   timeTaken.ToString(), timeTakenMD5.ToString ());        
> > +
> >                 if (ongoing && totaol > 0)
> >                         return true;
> >                 else {
> > @@ -473,8 +526,9 @@ public class ImportCommand : FSpot.Glade
> >                 this.importer = imp;
> >                 AllowFinish = false;
> >  
> > +        duplicates = 0;
> >                 total = importer.Prepare ();
> > -               UpdateProgressBar (0, total);
> > +               UpdateProgressBar (0, total, 0);
> >                 
> >                 collection.Clear ();
> >                 collection.Capacity = total;
> > @@ -492,10 +546,20 @@ public class ImportCommand : FSpot.Glade
> >                         }
> >                 }
> >  
> > +        TimeSpan timeTaken = new TimeSpan(total_import);
> > +        TimeSpan timeTakenMD5 = new TimeSpan(total_md5);
> > +        //Console.WriteLine ("Total time importing the photos {0} ({1} MD5)\n", 
> > +        //                   timeTaken.ToString(), timeTakenMD5.ToString ()); 
> > +
> >                 FSpot.ThumbnailGenerator.Default.PopBlock ();
> >                 
> > -               if (importer != null)
> > +               if (importer != null) {
> > +                       try { 
> >                         importer.Finish ();
> > +            } catch (System.Exception e) {
> > +                Console.WriteLine ("Exception finishing import ..." + e);
> > +            }
> > +        }
> >                 
> >                 importer = null;
> >  
> > @@ -619,6 +683,15 @@ public class ImportCommand : FSpot.Glade
> >                 this.Start ();
> >         }
> >  
> > +    
> > +    private void HandleDuplicateToggled (object sender, System.EventArgs args)
> > +       {
> > +               this.Cancel ();
> > +               while (Application.EventsPending ())
> > +                       Application.RunIteration ();
> > +               this.Start ();
> > +       }
> > +
> >         public int ImportFromFile (PhotoStore store, string path)
> >         {
> >                 this.store = store;
> > @@ -643,6 +716,7 @@ public class ImportCommand : FSpot.Glade
> >                 AllowFinish = false;
> >                 
> >                 recurse_check.Toggled += HandleRecurseToggled;
> > +        duplicate_check.Toggled += HandleRecurseToggled;
> >  
> >                 tag_option_menu.Menu = tagmenu;
> >                 SourceMenu menu = new SourceMenu ();
> > @@ -786,7 +860,7 @@ public class ImportCommand : FSpot.Glade
> >         {
> >                 Db db = new Db (db_path, true);
> >  
> > -               ImportCommand command = new ImportCommand ();
> > +               ImportCommand command = new ImportCommand (main_window);
> >  
> >                 command.ImportFromPath (db.Photos, directory_path, true);
> >

I have corrected the coding style in all the file using the correct tab
size. I have commented out all the code to compute MD5 performance.
So it seems we have finished reviewing ImportCommand.cs. 


>   
> > Index: src/MainWindow.cs
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/MainWindow.cs,v
> > retrieving revision 1.218
> > diff -u -b -B -p -r1.218 MainWindow.cs
> > --- src/MainWindow.cs   2 Oct 2005 07:13:38 -0000       1.218
> > +++ src/MainWindow.cs   4 Oct 2005 04:27:22 -0000
> > @@ -13,10 +13,11 @@ using System.Text.RegularExpressions;
> >  using LibGPhoto2;
> >  
> >  public class MainWindow {
> > -        public static MainWindow Toplevel;
> > +    public static MainWindow Toplevel = null;
> >  
> >         Db db;
> >  
> > +       
> >         TagSelectionWidget tag_selection_widget;
> >         [Glade.Widget] Gtk.Window main_window;
> >         [Glade.Widget] Gtk.VBox left_vbox;
> > @@ -68,6 +69,8 @@ public class MainWindow {
> >         [Glade.Widget] MenuItem remove_tag;
> >         [Glade.Widget] MenuItem find_tag;
> >         
> > +    [Glade.Widget] MenuItem find_duplicates;
> > +       
> >         [Glade.Widget] Scale zoom_scale;
> >  
> >         [Glade.Widget] VPaned info_vpaned;
> > @@ -1686,6 +1689,28 @@ public class MainWindow {
> >         void HandleClearDateRange (object sender, EventArgs args) {
> >                 query.Range = null;
> >         }
> > +
> > +       void HandleFindDuplicates (object sender, EventArgs args) {
> > +        find_duplicates.Sensitive = false;
> > +        FSpot.DuplicatesFinder finder = new FSpot.DuplicatesFinder (db, query);
> > +        finder.CreateDuplicateTag ();
> > +        tag_selection_widget.Update ();
> > +        if (!PhotoSelectionActive()) {
> > +            icon_view.SelectAllCells ();
> > +        }
> > +        System.Console.WriteLine ("Looking for duplicates ...");
> > +        finder.SearchFinished += new FSpot.DuplicatesFinderEnd (HandleEndDuplicates);
> > +        finder.startFind (SelectedIds());
> > +    }
> > +
> 
> I don't like selecting all cells here, the other methods that do similar
> things leave the selection alone.  Copy the query array or pass null and
> handle null specially in the find method.

Ok, I have done it thinking in the way the user will expect things to
happen. But yes, trying to suppose things for the user could be a bad
thing and if in other places of F-Spot there is a clear convention we
should follow it here also.

public void startFind (int [] selected_ids) {
            this.selected_ids = selected_ids;
    if (selected_ids.Length == 0) {
    return;
    }

> 
> > +    void HandleEndDuplicates (Boolean success) {
> > +        if (success) {
> > +            tag_selection_widget.Select (db.Tags.Duplicate);
> > +            UpdateQuery ();
> > +            SetViewMode (ModeType.IconView);
> > +        }
> > +        find_duplicates.Sensitive = true;
> > +    }
> >  
> >         // Version Id updates.
> >

I have also corrected the coding style here

>   
> > Index: src/Makefile.am
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/Makefile.am,v
> > retrieving revision 1.43
> > diff -u -b -B -p -r1.43 Makefile.am
> > --- src/Makefile.am     29 Sep 2005 10:37:09 -0000      1.43
> > +++ src/Makefile.am     4 Oct 2005 04:27:22 -0000
> > @@ -16,6 +16,7 @@ F_SPOT_CSDISTFILES =                          \
> >         $(srcdir)/Delay.cs                      \
> >         $(srcdir)/DirectoryAdaptor.cs           \
> >         $(srcdir)/DirectoryCollection.cs        \
> > +       $(srcdir)/DuplicatesFinder.cs   \
> >         $(srcdir)/Exif.cs                       \
> >         $(srcdir)/ExifUtils.cs                  \
> >         $(srcdir)/FlickrExport.cs               \
> > Index: src/PhotoStore.cs
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/PhotoStore.cs,v
> > retrieving revision 1.71
> > diff -u -b -B -p -r1.71 PhotoStore.cs
> > --- src/PhotoStore.cs   1 Sep 2005 10:01:52 -0000       1.71
> > +++ src/PhotoStore.cs   4 Oct 2005 04:27:22 -0000
> > @@ -28,6 +28,7 @@ public class Photo : DbItem, IComparable
> >                 return Compare (this, photo);
> >         }
> >         
> > +    // FIXME: With md5sum field this could be easy
> >         public static int Compare (Photo photo1, Photo photo2)
> >         {
> >                 int result = photo1.Id.CompareTo (photo2.Id);
> > @@ -137,6 +138,16 @@ public class Photo : DbItem, IComparable
> >                 }
> >         }
> >  
> > +    private string md5sum;
> > +    public string MD5Sum {
> > +               get {
> > +                       return md5sum;
> > +               }
> > +               set {
> > +                       md5sum = value;
> > +               }
> > +       }
> > +
> >         // Version management
> >         public const int OriginalVersionId = 1;
> >         private uint highest_version_id;
> > @@ -393,6 +404,7 @@ public class Photo : DbItem, IComparable
> >                 this.name = name;
> >  
> >                 description = "";
> > +        md5sum      = "";
> >  
> >                 // Note that the original version is never stored in the photo_versions table in the
> >                 // database.
> > @@ -493,7 +505,8 @@ public class PhotoStore : DbStore {
> >                         "       directory_path     STRING NOT NULL,                " +
> >                         "       name               STRING NOT NULL,                " +
> >                         "       description        TEXT NOT NULL,                  " +
> > -                       "       default_version_id INTEGER NOT NULL                " +
> > +                       "       default_version_id INTEGER NOT NULL,                 " +
> > +            "       md5sum             STRING NOT NULL                  " +
> >                         ")";
> >  
> >                 command.ExecuteNonQuery ();
> > @@ -538,8 +551,8 @@ public class PhotoStore : DbStore {
> >                 command.Connection = Connection;
> >  
> >                 command.CommandText = String.Format ("INSERT INTO photos (time, " +
> > -                                                    "directory_path, name, description, default_version_id) " +
> > -                                                    "       VALUES ({0}, '{1}', '{2}', '{3}', {4})                                       ",
> > +                                                    "directory_path, name, description, default_version_id, md5sum) " +
> > +                                                    "       VALUES ({0}, '{1}', '{2}', '{3}', {4}, '')                                       ",
> >                                                      unix_time,
> >                                                      SqlString (System.IO.Path.GetDirectoryName (path)),
> >                                                      SqlString (System.IO.Path.GetFileName (path)),
> > @@ -728,7 +741,8 @@ public class PhotoStore : DbStore {
> >                                                      "       directory_path,                       " +
> >                                                      "       name,                                 " +
> >                                                      "       description,                          " +
> > -                                                    "       default_version_id                    " +
> > +                                                    "       default_version_id,                   " +
> > +                             "       md5sum                                " +
> >                                                      "     FROM photos                             " +
> >                                                      "     WHERE id = {0}                          ",
> >                                                      id);
> > @@ -742,6 +756,7 @@ public class PhotoStore : DbStore {
> >  
> >                         photo.Description = reader[3].ToString ();
> >                         photo.DefaultVersionId = Convert.ToUInt32 (reader[4]);
> > +            photo.MD5Sum = reader[5].ToString ();
> >                         AddToCache (photo);
> >                 }
> >  
> > @@ -895,11 +910,13 @@ public class PhotoStore : DbStore {
> >  
> >                 SqliteCommand command = new SqliteCommand ();
> >                 command.Connection = Connection;
> > -               command.CommandText = String.Format ("UPDATE photos SET description = '{0}',     " +
> > -                                                    "                  default_version_id = {1} " +
> > -                                                    "              WHERE id = {2}",
> > +               command.CommandText = String.Format ("UPDATE photos SET description = '{0}'," +
> > +                                                    "                default_version_id = {1}," +
> > +                             "                md5sum = '{2}'"+                
> > +                                                    "                WHERE id = {3}",
> >                                                      SqlString (photo.Description),
> >                                                      photo.DefaultVersionId,
> > +                             SqlString (photo.MD5Sum),  
> >                                                      photo.Id);
> >                 command.ExecuteNonQuery ();
> >                 command.Dispose ();
> > @@ -913,6 +930,7 @@ public class PhotoStore : DbStore {
> >                 command.Dispose ();
> >  
> >                 foreach (Tag tag in photo.Tags) {
> > +            
> >                         command = new SqliteCommand ();
> >                         command.Connection = Connection;
> >                         command.CommandText = String.Format ("INSERT INTO photo_tags (photo_id, tag_id) " +
> > @@ -1004,6 +1022,7 @@ public class PhotoStore : DbStore {
> >                                 
> >                                 photo.Description = reader[4].ToString ();
> >                                 photo.DefaultVersionId = Convert.ToUInt32 (reader[5]);           
> > +                photo.MD5Sum = reader[6].ToString ();
> >                                 
> >                                 version_list.Add (photo);
> >                         }
> > @@ -1043,7 +1062,8 @@ public class PhotoStore : DbStore {
> >                                                      "       photos.directory_path,              " +
> >                                                      "       photos.name,                        " +
> >                                                      "       photos.description,                 " +
> > -                                                    "       photos.default_version_id           " +
> > +                                                    "       photos.default_version_id,          " +
> > +                             "       photos.md5sum                       " +
> >                                                      "     FROM photos                           " +
> >                                                      "     WHERE directory_path = \"{0}\"", dir.FullName);
> >  
> > @@ -1084,7 +1104,8 @@ public class PhotoStore : DbStore {
> >                                       "       photos.directory_path,              " +
> >                                       "       photos.name,                        " +
> >                                       "       photos.description,                 " +
> > -                                     "       photos.default_version_id           " +
> > +                                     "       photos.default_version_id,          " +
> > +                      "       photos.md5sum                       " +
> >                                       "     FROM photos                      ");
> >                 
> >                 if (range != null) {
> 
> At this point we really need a way to properly migrate the db for users
> there are real people that have real data in f-spot now.  So we need to
> detect and change the table at load time, which probably means walking
> all the photos and adding a sum.  I mind less if the upgrade takes some
> time than if the f-spot crashes with the old db format.
> 

Ok, I haven't done it because the README file about <<database is still
changing and the code doesn't handle upgrading existing tables when the
schema changes>>

But I feel the same as you here. We should migrate the all database
format to the new format and from now, change the README file if we plan
to support the current database scheme so users could start investing
time in filling their photos database.

I see here a clear stopper for the patch until this upgrade code is
done, no? I will do it but don't know exactly the time frame.


> Other than that nothing much to mention here other than the obvious
> formatting issues.
> 

Yes, I have modified the formatting issues.

> > 
> > Index: src/TagStore.cs
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/TagStore.cs,v
> > retrieving revision 1.19
> > diff -u -b -B -p -r1.19 TagStore.cs
> > --- src/TagStore.cs     27 Sep 2005 03:15:05 -0000      1.19
> > +++ src/TagStore.cs     4 Oct 2005 04:27:22 -0000
> > @@ -257,6 +257,16 @@ public class TagStore : DbStore {
> >                 }
> >         }
> >  
> > +    private Tag duplicate;
> > +       public Tag Duplicate {
> > +               get {
> > +                       return duplicate;
> > +               }
> > +        set {
> > +            duplicate = value;
> > +        }
> > +       }
> > +
> >         // In this store we keep all the items (i.e. the tags) in memory at all times.  This is
> >         // mostly to simplify handling of the parent relationship between tags, but it also makes it
> >         // a little bit faster.  We achieve this by passing "true" as the cache_is_immortal to our
> > @@ -290,6 +300,9 @@ public class TagStore : DbStore {
> >                         
> >                         if (tag.Name == "Hidden")
> >                                 hidden = tag;
> > +
> > +            if (tag.Name == "Duplicate")
> > +                duplicate = tag;            
> >                 }
> >  
> >                 reader.Close ();
> 
> I'm not excited to have another "special" tag like hidden, but in this
> case it doesn't really make things much worse than the hidden tag.  I
> need to figure out a clean way to fix this.

Yes, I have the same problem thinking about this issue. But probably
solving the hidden issue will be the same than solving the duplicate
issue so for now, I finally decided to do it that way.

> 
> > @@ -354,6 +367,12 @@ public class TagStore : DbStore {
> >                 this.hidden = hidden_tag;
> >                 Commit (hidden_tag);
> >  
> > +        Tag duplicate_tag = CreateTag (RootCategory, "Duplicate");
> > +               duplicate_tag.StockIconName = "f-spot-hidden.png";
> > +               duplicate_tag.SortPriority = -11;
> > +               this.duplicate = duplicate_tag;
> > +               Commit (duplicate_tag);
> > +
> >                 Tag people_category = CreateCategory (RootCategory, "People");
> >                 people_category.StockIconName = "f-spot-people.png";
> >                 people_category.SortPriority = -8;
> > @@ -455,6 +474,11 @@ public class TagStore : DbStore {
> >                     category.Children.Length > 0)
> >                         throw new InvalidTagOperationException (category, "Cannot remove category that contains children");
> >  
> > +        // FIXME: Hack!
> > +        if (string.Compare (((Tag) item).Name, "Duplicate") == 0) {
> > +            duplicate = null;
> > +        }
> > +        
> >                 RemoveFromCache (item);
> >                 
> >                 ((Tag)item).Category = null;
> > Index: src/f-spot.glade
> > ===================================================================
> > RCS file: /cvs/gnome/f-spot/src/f-spot.glade,v
> 
> 
> The only comment I have on the glade changes are that I think having the
> duplicate check inline with the other check is probably a little
> visually confusing.

I have tried to reuse as much F-Spot GUI desing as possible but maybe I
have missed some GNOME HIG details in some point. About the visual
confusing with the new option maybe we can add more space between
options or align the new option to the right. Other option is to use
some kind of separator like a vertical bar, but I hate to do this kind
of things because the visual cluttering they introduce.

Some comment from GUI designers?

> 
> On to the included file

DuplicatesFinder.cs. The first I think I have done is to reformat all
the file to use the 8 white spaces tab.

> 
> 
> > using System;
> > using System.Collections;
> > using System.IO;
> > using System.Security.Cryptography;
> > using System.Text;
> > 
> > namespace FSpot {
> >     public delegate void DuplicatesFinderEnd (Boolean success);
> > 
> >     public class DuplicatesFinder {
> >         private System.Threading.Thread    duplicates_thread = null;
> >         private FSpot.ThreadProgressDialog progress_dialog_duplicates;
> >         private ArrayList                  duplicates = null;
> >         private Boolean                    end_duplicates = false;
> >         private uint                       duplicates_timer = 0;
> >         private System.Threading.Thread    md5_thread = null;
> >         private FSpot.ThreadProgressDialog progress_dialog_md5;
> >         private Boolean                    end_computeMD5 = false;
> >         private uint                       computeMD5_timer = 0;
> > 
> >         // Shared data variables we need to work with
> >         private Db                         db;
> >         private int[]                      selected_ids;
> >         private FSpot.PhotoQuery           query;
> >     
> >         // To inform that the Finder has finished
> >         public event DuplicatesFinderEnd   SearchFinished;
> >     
> 
> Did I already mention I don't like to have the variable names aligned?
> All to often adding a variable means needless whitespace changes when
> you do this, and I don't see much benefit.

I will change it. I think the code is more readable and I have inherited
it from my work in Planner. But I change it and I will use the one white
space between the type and the variable name.

Just an example, try to read:

private System.Threading.Thread    duplicates_thread = null;
private FSpot.ThreadProgressDialog progress_dialog_duplicates;
private ArrayList                  duplicates = null;
private Boolean                    end_duplicates = false;
private uint                       duplicates_timer = 0;
private System.Threading.Thread    md5_thread = null;
private FSpot.ThreadProgressDialog progress_dialog_md5;
private Boolean                    end_computeMD5 = false;
private uint                       computeMD5_timer = 0;

and now

private System.Threading.Thread duplicates_thread = null;
private FSpot.ThreadProgressDialog progress_dialog_duplicates;
private ArrayList duplicates = null;
private Boolean end_duplicates = false;
private uint duplicates_timer = 0;
private System.Threading.Thread md5_thread = null;
private FSpot.ThreadProgressDialog progress_dialog_md5;
private Boolean end_computeMD5 = false;
private uint computeMD5_timer = 0;

> 
> >         public DuplicatesFinder (Db                 db, 
> >                                  FSpot.PhotoQuery   query)
> > 
> >         {
> >             this.db = db;
> >             this.query = query;
> >         }
> 
> 
> The previous comment goes double for method parameters.  It is ok to
> wrap them, but don't align them.
> 

Ok, done!

> > 
> >         public void CreateDuplicateTag () {
> >             Tag tag_duplicate = db.Tags.Duplicate;
> >             if (tag_duplicate == null) {
> >                 Console.WriteLine ("Creating the Duplicate tag ...");
> >                 tag_duplicate = db.Tags.CreateTag (null, "Duplicate");
> >                 db.Tags.Duplicate = tag_duplicate;
> >                 tag_duplicate.StockIconName = "f-spot-hidden.png";
> >                 tag_duplicate.SortPriority = -11;
> >                 db.Tags.Commit (tag_duplicate);
> >             } else {
> >                 Console.WriteLine ("The duplicate tag already exists ...");
> >             }
> >         }
> > 
> >         public void startFind (int [] selected_ids) {
> >             this.selected_ids = selected_ids;
> >             md5_thread = 
> >                 new System.Threading.Thread (new System.Threading.ThreadStart (this.ComputeMD5));
> >             md5_thread.Name = Mono.Posix.Catalog.GetString ("Creating image unique identifiers");
> >             progress_dialog_md5 = new 
> >                 FSpot.ThreadProgressDialog (md5_thread, selected_ids.Length);
> >             progress_dialog_md5.Start();
> >             StartComputeMD5Timer ();
> >         }
> > 
> 
> I'd like all method names to be PascalCase not camelCase.
> 

Sure, this is a coding style bug.


> >         private bool HandleDuplicatesTimer ()
> >         {
> >             Console.WriteLine ("Duplicates Timer ...");
> >             if (end_duplicates || !duplicates_thread.IsAlive) {
> >                 if (db.Tags.Duplicate == null) {
> >                     CreateDuplicateTag ();
> >                 }
> >                 end_duplicates = false;
> >                 duplicates_timer = 0;
> >                 foreach (int num in duplicates) {
> >                     Console.WriteLine ("Tagging duplicate photo {0}", num);
> >                     query.Photos[num].AddTag (db.Tags.Duplicate);
> >                     query.Commit (num);
> >                 }
> >                 if (SearchFinished != null) {
> >                     SearchFinished (true);
> >                 }
> >                 return false;
> >             }
> >             return true;
> >         }
> > 
> >         private void StartDuplicatesTimer ()
> >         {
> >             if (duplicates_timer == 0)
> >                 duplicates_timer = 
> >                     GLib.Timeout.Add (100, new GLib.TimeoutHandler (HandleDuplicatesTimer));
> >         }
> > 
> >         private bool HandleComputeMD5Timer ()
> >         {
> >             Console.WriteLine ("MD5 Timer ...");
> >             if (md5_thread.IsAlive) {
> >                 Console.WriteLine ("The MD5 thread is alive ...");
> >             }
> >             if (end_computeMD5 || !md5_thread.IsAlive) {
> >                 if (progress_dialog_md5 != null) {
> >                     progress_dialog_md5.Destroy ();
> >                 }
> >                 computeMD5_timer = 0;
> >                 if (end_computeMD5) {
> >                     end_computeMD5 = false;
> >                     /* duplicates_thread = new System.Threading.Thread 
> >                         (new System.Threading.ThreadStart (this.FindDuplicates));
> >                     duplicates_thread.Name = Mono.Posix.Catalog.GetString ("Finding Duplicates");
> >                     progress_dialog_duplicates = 
> >                     new FSpot.ThreadProgressDialog (duplicates_thread, selected_ids.Length);
> >                     progress_dialog_duplicates.Start (); */
> >                     StartDuplicatesTimer (); 
> >                     lock (this) {
> >                         end_duplicates = true;
> >                     }
> >                 } else {
> >                     if (SearchFinished != null) {
> >                         SearchFinished (false);
> >                     }
> >                 }
> >                 return false;
> >             }
> >             return true;
> >         }
> > 
> >         private void StartComputeMD5Timer ()
> >         {
> >             if (computeMD5_timer == 0)
> >                 computeMD5_timer = 
> >                     GLib.Timeout.Add (1000, new GLib.TimeoutHandler (HandleComputeMD5Timer));
> >         }
> > 
> > 
> >         public static string PhotoMD5 (Photo photo) {
> >             FileStream fs = new FileStream(photo.Path, FileMode.Open, FileAccess.Read);
> >             MD5 md5ServiceProvider = new MD5CryptoServiceProvider();
> >             byte[] md5 = md5ServiceProvider.ComputeHash(fs);
> >             
> >             StringBuilder hash = new StringBuilder();
> >             for (int pos = 0; pos < md5.Length; pos++) {
> >                 hash.Append(md5[pos].ToString("X2").ToLower());
> >             }
> >             return hash.ToString ();
> >         }
> > 
> 
> Is there any advantage to using a string rather than the byte array?  It
> uses 4 times as much memory per hash, and since the strings would
> typically be unique you don't gain anything.  We could still store them
> in the db as a string.


Ok, you yo are thinking in storing in memory the hash value and when it
is time to save it to the database, store them as a string, no? I think
we have then the same problem when we load the photo data from the
database because the hash will be a string. We can then convert it to
the byte[] structure.  I understand that if we have thousands of photos
in memory this issue could be important.

So this is going to be the second TODO (the fist is the add code to
upgrade the database schema).


> 
> Also Use a using block around around the FileStream and probably use
> System.IO.File.OpenRead.

Done! Yes it is a good idea to force the Dispose call in order to close
the File as soon as possible.

> 
> >         /* FIXME: We should use a cache of all photos MD5? */
> >         public static bool IsDuplicate (Photo photo) {
> >             Photo[] photos;
> >             photos = MainWindow.Toplevel.Database.Photos.Query (null, null);
> >             if (photos == null) {
> >                 Console.WriteLine ("No photos in duplicates ...");
> >                 return false;
> >             }
> >             if (photo.MD5Sum == null) {
> >                 photo.MD5Sum = PhotoMD5 (photo);
> >             }
> >             bool  duplicate = false;
> >             foreach (Photo p in photos) {
> >                 if (p.MD5Sum == null) {
> >                     p.MD5Sum = PhotoMD5 (p);
> >                 }
> >                 //Console.WriteLine ("Looking: {0},{1} {2},{3}", 
> >                 //                   p.Name, p.MD5Sum, photo.Name, photo.MD5Sum);
> >                 if (p.MD5Sum.CompareTo(photo.MD5Sum) == 0 &&
> >                     p.Id != photo.Id) {
> >                     //Console.WriteLine ("DUPLICATES");
> >                     duplicate = true;
> >                     break;
> >                 }
> >             }            
> >             return duplicate;
> >         } 
> > 
> 
> Ok, this looks like something we definitely want to avoid and is also
> part of the reason why I wanted to reduce the two calls to IsDuplicate
> to one.  First we query for all the photos in the database then we walk
> over all the items.  The database is potentially large (40,000 photos in
> Nat's case) and this walks in first in the query then in the in the
> foreach.  You are storing the hash in the db why don't you just add a
> query for the hash?

Yes, good point. So a SQL query to the data in the database trying to
find a duplicate could avoid us this loop. 

To implemente the query to the database I have added a new public method
to PhotoStore in order to do the query to the database. In
DuplicatesFinder we don't work with the database directly and it is very
natural to include this method in PhotoStore.

public bool IsDuplicate (Photo photo) {
		bool duplicate = false;

		if (photo.MD5Sum == null) {
			photo.MD5Sum = FSpot.DuplicatesFinder.PhotoMD5 (photo);
		}

		SqliteCommand command = new SqliteCommand ();
		command.Connection = Connection;
		command.CommandText = String.Format ("SELECT id FROM photos WHERE
md5sum = '{0}'", photo.MD5Sum);
		SqliteDataReader reader = command.ExecuteReader ();
		if (reader.Read ()) {
			duplicate = true;
		} 
		command.Dispose ();
		return duplicate;
	}



> 
> >         // Executed in a different thread
> >         private void ComputeMD5 () {
> >             // A really simple implementation: Build hash table with MD5 of images
> >             // and we later use it to find duplicates. If the MD5 is in the hash table
> >             // we don't add it. We have only the "original" in this table.
> >             Hashtable photos_md5 = new Hashtable ();
> >             duplicates = new ArrayList ();
> > 
> >             int counter = 1;
> >             foreach (int num in selected_ids) {
> >                 Photo photo = query.Photos [num];
> >                 progress_dialog_md5.Message = System.String.Format 
> >                     (Mono.Posix.Catalog.GetString ("Creating unique identifier for {0}"), photo.Name);
> >                 progress_dialog_md5.Fraction = counter / (double) selected_ids.Length;
> >                 progress_dialog_md5.ProgressText = System.String.Format (Mono.Posix.Catalog.GetString ("{0} of {1}"), counter, selected_ids.Length);
> >                 if (string.Compare(photo.MD5Sum, "") != 0) {
> >                     // It is a duplicate. We should mark it.
> >                     if (photos_md5[photo.MD5Sum] != null) {
> >                         duplicates.Add (num);
> >                     } else {
> >                         photos_md5.Add (photo.MD5Sum, num);
> >                     }
> >                     counter++;
> >                     continue;
> >                 }
> >                 // Computing time is measured for testing purposes
> >                 try {
> >                     long startTime = DateTime.Now.Ticks;
> >                     FileStream fs = new FileStream(photo.Path, FileMode.Open, FileAccess.Read);
> >                     MD5 md5ServiceProvider = new MD5CryptoServiceProvider();
> >                     byte[] md5 = md5ServiceProvider.ComputeHash(fs);
> > 
> >                     StringBuilder hash = new StringBuilder();
> >                     for (int pos = 0; pos < md5.Length; pos++) {
> >                         hash.Append(md5[pos].ToString("X2").ToLower());
> >                     }
> >                     long endTime = DateTime.Now.Ticks;
> >                     TimeSpan timeTaken = new TimeSpan(endTime - startTime);
> >                     Console.WriteLine("MD5 compute: {0}", timeTaken.ToString());
> >                     photo.MD5Sum = hash.ToString ();
> >                     db.Photos.Commit (photo);
> >                     counter++;                    
> >                     // It is a duplicate. We should mark it.
> >                     if (photos_md5[photo.MD5Sum] != null) {
> >                         duplicates.Add (num);
> >                     } else {
> >                         photos_md5.Add (photo.MD5Sum, num);
> >                     }
> >                 } catch (Exception ex) {
> >                     Console.WriteLine ("Problems creating MD5: {0} ", photo.Path);
> >                     Console.WriteLine (ex);
> >                 }
> >             }
> >             progress_dialog_md5.Message = Mono.Posix.Catalog.GetString ("Done Creating Unique Identifiers for Photos");
> >             progress_dialog_md5.Fraction = 1.0;
> >             progress_dialog_md5.ProgressText = Mono.Posix.Catalog.GetString ("All unique identifiers created.");
> >             progress_dialog_md5.ButtonLabel = Gtk.Stock.Ok;
> >             lock (this) {
> >                 end_computeMD5 = true;
> >             }
> >         }
> >     }
> 
> This contains another implementation of the PhotoMD5 method, use the one
> you already have above (once it is fixed).  Also this might be a
> candidate for using the query as well, at least in the nothing is
> selected case.

Yes, this is the first implementation I have done when we aren't
creating the MD5Sum in import. I think this should be cleaned out if the
MD5Sum is computed in importing so all this code is removed right now in
my CVS.

> 
> Now that I'm here at the end I'm not sure what the point of the
> duplicate tag is.  Is it there for a general purpose or are you planning
> to just use it in the import dialog.  If it is the later I think we can
> use a different trick.  If it is the former, what are your plans?  Also
> I notice the duplicate detection doesn't take versions into account.
> You probably need to store a sum on each version as well and check them
> in all the tests (ouch that adds more complications).  At least this
> would be a good time to add the directory path to the version table
> which is badly needed for other things.

Ok, we are right now in an important and historical point about the
patch :)

The first time I started working in the patch my target was to add
duplicate support for already imported photos. So with this in mind, I
created the "Find Duplicates" entry and all the dialogs that informed
the user about the MD5 creation for photos. I found that the best way to
show the user the duplicates she has is to use the actual search system
in F-Spot: using tags. So it is very natural to find the duplicates
searching for the tag Duplicate.

Talking about the duplicates feature in the list it was concluded that
the really needed feature in the first time is to detect duplicates in
import so I implemented it in the importing phase. If we compute the MD5
in importing then all the dialogs showing the MD5 computation later in
the Find Duplicates menu entry aren't needed and the DuplicatesFinder.cs
class is more and more light and I feel that we could remove it. I bet
that the final patch should be:

- Support for duplicates in importing
- If importing duplicates, tag the photos
- Use the duplicate tag to find the duplicates

I think that sometimes people will import duplicates (for example a
database created before the duplicate feature was added or a mistake
importing) and we have to give them a way to clean them after. The tag
duplicate could be the solution. We can remove the "Find duplicates"
menu entry maybe. I understand that another tag makes F-Spot more
complex and the user will need more mental effort to use the tool. If
you think is better not to us include this support, it is very easy to
remove the tagging for duplicates this time and include the patch
without this support at all.

>  
> 
> Anyway, the patch is awesome and solves a real problem and would be
> great to get in,  but there is a bit of refactoring and cleanup that
> still needs to be done.  I'm really sorry I haven't given it a proper
> review before now.  I hope this at least clarifies why I didn't think it
> was quite ready.
> 

Sure! It is normal the first time a new coder want to contribute to a
project that the project maintainer gives some rules to follow. It is
great that now I can follow them and start working closer for the near
future.


> Thank you so much for all the hard work and time it must have taken to
> get things this far, I'd love to solve the remaining issues as fast as
> possible and move on to the other patches you have waiting.
> 

Sure, I think we can resolver all the pending issues. The remaining
issues are:

1. We should include the upgrade database scheme code in order the patch
is included? If any case I will work on that but sooner or later
depending the answer.

2. The MD5Sum should be stored in memory as an array bit and converted
when reading as a string from the database and writing as a string to
the database. Right?

3. IsDuplicate right now is moved to PhotoStore.cs and queries the
database to find it the to be imported photo is a duplicate. Right?

4. We want duplicate photo detection after importing? If no I will clean
all the code about it. If yes, should we continue using the tag
duplicate? If maybe, I will clean the code in order to include the patch
upstream for importing and we should continue talking about it.


Ok, I think that's all folks! Now time to make some decissions and I
will send an updated patch to the list.

Cheers

> --Larry
> 
> 
> 
> _______________________________________________
> F-spot-list mailing list
> F-spot-list gnome org
> http://mail.gnome.org/mailman/listinfo/f-spot-list
> 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]