[Shotwell] Better Duplicate Detection



Hey Jens, List,

Here's a patch for a more reliable duplicate detection.

Currently, a photo is reported as a duplicate if any of the provided
criteria matches. The only criteria relevant here are the hash of the
full picture data and the thumbnail hash.

Let me give an example.
Given two pics in the db with different pic-data but the same thumb-data
(due to lossy compression of files with the same size or, god forbid, an
md5 collision):
pic1 pic-hash-x thumb-hash-y
pic2 pic-hash-z thumb-hash-y
A query for duplicates of pic-hash-x, thumb-hash-y would previously
have returned both pic1 and pic2 because they share the thumb-hash. With the
patch, when the full pic-hash is available, it will always be used
instead and the query will only return pic1.

This is still not perfect: if there is a different pic3 without an
associated pic-hash value (NULL in the DB), the lesser evil is to still
compare against its thumb-hash, so pic3 would still be matched:
pic3 NULL thumb-hash-y
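
The matching rule described above can be sketched roughly as follows.
This is only an illustration in Python with sqlite3, not the code from
the patch (Shotwell itself is written in Vala). The table name `photos`
and the columns `name`, `full_md5` and `thumb_md5` are hypothetical, and
the behaviour when the query itself has no full hash is an assumption.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema holding the three example rows from this mail.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (name TEXT, full_md5 TEXT, thumb_md5 TEXT)")
    conn.executemany(
        "INSERT INTO photos VALUES (?, ?, ?)",
        [
            ("pic1", "x", "y"),
            ("pic2", "z", "y"),
            ("pic3", None, "y"),  # no full hash stored (NULL in DB)
        ],
    )
    return conn


def find_duplicates_old(conn, full_hash, thumb_hash):
    # Old behaviour as described: any matching hash counts as a duplicate.
    rows = conn.execute(
        "SELECT name FROM photos "
        "WHERE full_md5 = ? OR thumb_md5 = ? ORDER BY name",
        (full_hash, thumb_hash),
    )
    return [r[0] for r in rows]


def find_duplicates_new(conn, full_hash, thumb_hash):
    if full_hash is None:
        # Assumption: without a full hash to query for, only the
        # thumbnail hash can be compared.
        rows = conn.execute(
            "SELECT name FROM photos WHERE thumb_md5 = ? ORDER BY name",
            (thumb_hash,),
        )
    else:
        # Prefer the full hash; fall back to the thumbnail hash only
        # for rows that have no full hash stored.
        rows = conn.execute(
            "SELECT name FROM photos "
            "WHERE full_md5 = ? OR (full_md5 IS NULL AND thumb_md5 = ?) "
            "ORDER BY name",
            (full_hash, thumb_hash),
        )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_duplicates_old(conn, "x", "y"))  # ['pic1', 'pic2', 'pic3']
    print(find_duplicates_new(conn, "x", "y"))  # ['pic1', 'pic3']
```

With the example rows, the old rule reports all three pics, while the
new rule drops pic2 and keeps pic3 only because its full hash is
missing.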

Patch:
https://github.com/abrauchli/shotwell/commit/f9222a979385a40988730713bc7a9caf96b2cef4

I'm not quite sure whether it also affects RAW file duplicates that
were discussed lately as I'm not familiar with the workflow for
RAW-type files.

Let me know if that's mergeable.

Best,
Andreas

