Duplicate detection in f-spot



Hi all,

Today I committed duplicate detection to f-spot.  It has been a
long-standing issue [1] that can hopefully be resolved before the next
release.

How does it work?
------------------
Basically F-Spot detects duplicates by comparing md5 sums of the image
data.  When a new image is imported into f-spot, it checks whether an
image with the same md5 sum already exists in the photo database.  If
that is the case (and if you requested not to import duplicates), then
f-spot will skip the image, only importing the ones that are not yet in
your image library.

Because the md5 sum of an image is stored in the database, duplicate
detection is quite fast and not affected too much by the size of your
database or the number of photos you're importing.
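
To make the import check described above a bit more concrete, here is a
rough sketch in Python.  It is not the actual f-spot code (f-spot is
written in C#); the table name `photos`, the column `md5_sum`, and the
helper functions are all made up for illustration, and it hashes the
whole file rather than whatever f-spot treats as "image data".

```python
import hashlib
import sqlite3


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_library(db_path=":memory:"):
    """Open a hypothetical photo database with an indexed md5 column."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL, md5_sum TEXT)"
    )
    # The index keeps the lookup cheap no matter how large the library is.
    conn.execute("CREATE INDEX IF NOT EXISTS photos_md5 ON photos (md5_sum)")
    return conn


def import_photos(conn, paths, skip_duplicates=True):
    """Import files, skipping any whose md5 sum is already in the database."""
    imported, skipped = [], []
    for path in paths:
        md5_sum = md5_of_file(path)
        row = conn.execute(
            "SELECT 1 FROM photos WHERE md5_sum = ? LIMIT 1", (md5_sum,)
        ).fetchone()
        if row is not None and skip_duplicates:
            skipped.append(path)
            continue
        conn.execute(
            "INSERT INTO photos (path, md5_sum) VALUES (?, ?)", (path, md5_sum)
        )
        imported.append(path)
    conn.commit()
    return imported, skipped
```

Each imported file costs one hash computation plus one indexed lookup,
which is why the size of the library matters so little.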

Updating
---------
Because the md5 value is stored in the database, existing f-spot
databases need a schema update.  We kept in mind that some people
have very large databases, and we didn't want to force them to wait
through a long upgrade.  Therefore, the first time you launch f-spot
with the duplicates patch, it will change the db schema and create md5
sum calculation jobs.  These will gradually calculate md5 sums for all
images already in f-spot, in the background, without disturbing the
user.  Because of this, it may take a while before duplicate detection
is fully operational.
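
The upgrade approach can be sketched roughly as follows, again in
Python rather than f-spot's own C#.  The `photos` and `jobs` tables, the
`md5_sum` column, and both function names are assumptions made for this
example and do not reflect the real f-spot schema or job scheduler.

```python
import hashlib
import sqlite3


def upgrade_schema(conn):
    """Add the md5 column if needed and queue one job per photo lacking a sum."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(photos)")]
    if "md5_sum" not in columns:
        conn.execute("ALTER TABLE photos ADD COLUMN md5_sum TEXT")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, photo_id INTEGER NOT NULL)"
    )
    # Only queue photos that have no sum yet and no pending job.
    conn.execute(
        "INSERT INTO jobs (photo_id) "
        "SELECT id FROM photos "
        "WHERE md5_sum IS NULL AND id NOT IN (SELECT photo_id FROM jobs)"
    )
    conn.commit()


def run_one_job(conn):
    """Process the oldest queued job; return False once the queue is empty."""
    job = conn.execute(
        "SELECT jobs.id, jobs.photo_id, photos.path "
        "FROM jobs LEFT JOIN photos ON photos.id = jobs.photo_id "
        "ORDER BY jobs.id LIMIT 1"
    ).fetchone()
    if job is None:
        return False
    job_id, photo_id, path = job
    md5_sum = None
    if path is not None:
        try:
            with open(path, "rb") as handle:
                md5_sum = hashlib.md5(handle.read()).hexdigest()
        except OSError:
            md5_sum = None  # unreadable file: leave the sum empty
    conn.execute("UPDATE photos SET md5_sum = ? WHERE id = ?", (md5_sum, photo_id))
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    conn.commit()
    return True
```

The schema change itself is quick; the expensive hashing is split into
small jobs so that an idle handler or background thread can call
something like `run_one_job` repeatedly without blocking the user.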

Remarks
--------
Running the latest version of f-spot from svn will thus change your
database schema.  We tested it on large databases [2], where the update
happened without too much hassle, but be aware that it might take some
time.

If you want to test it before you run it against your full db, you can
always use the -b switch of course.

Best Regards,
Thomas

[1] http://bugzilla.gnome.org/show_bug.cgi?id=169646
[2] http://bugzilla.gnome.org/show_bug.cgi?id=169646#c70


