Re: De-duplicate database

From: Konstantin Maslov <conma yandex ru>
To: Michael Schmarck <michael schmarck habmalnefrage de>
Cc: f-spot-list gnome org
Subject: Re: De-duplicate database
Date: Tue, 19 Feb 2008 22:27:05 +0300

Maybe something like this:
select id from (select id, count(id) as cnt, uri from photos group by
uri) where cnt>1;
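
A single statement can also list every ID except the lowest one per uri,
which is the same list the loop with "sed 1d" in the quoted message
produces. This is only a minimal sketch: it assumes the sqlite3
command-line tool is installed and that the lowest id per uri is the
entry to keep. The function name list_dupe_ids is made up for
illustration.

```shell
#!/bin/sh
# list_dupe_ids DB: print the ids of all rows in "photos" whose uri is
# already present with a lower id (hypothetical helper name).
# Assumption: the lowest id per uri is the entry that should be kept.
list_dupe_ids() {
    sqlite3 "$1" '
        SELECT id FROM photos
        WHERE id NOT IN (SELECT MIN(id) FROM photos GROUP BY uri)
        ORDER BY id;'
}
```

It could be called as: list_dupe_ids ~/.gnome2/f-spot/photos.db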

On Tue, 19/02/2008 at 08:11 +0100, Michael Schmarck wrote:
> Hello.
> 
> I'm using f-spot to manage my photos. To do so, I copy the pictures
> from my camera to some folder on my hard disk. Then I use f-spot to
> import the pictures. I *always* uncheck the "copy files to Photos
> folder" option, i.e. they remain in a directory structure of my
> liking.
> 
> Suppose I've got the images in ~/Desktop/My Pictures/Travel and
> ~/Desktop/My Pictures/Fun. I'd now import the Travel and Fun folders.
> Later on, I take some new pictures and copy them to Travel and/or
> Fun again. I'd now re-import the Travel and/or Fun folders.
> 
> What happens now is that f-spot also re-imports all the pictures it
> already knows. This is a known bug in f-spot -> http://bugzilla.gnome.org/169646
> 
> Does anyone know of a way to (more or less...) easily de-dupe the
> f-spot database?
> 
> Right now, I do it in a somewhat "awkward" way, like this:
> 
> sqlite3 ~/.gnome2/f-spot/photos.db 'select uri from photos order by uri' | \
>   uniq -d | while read -r uri; do
>     printf "select id from photos where uri = '%s';" "$uri" | \
>      sqlite3 ~/.gnome2/f-spot/photos.db | sort -n | sed 1d
>   done
> 
> This reports all the "uris" from the photos table that exist multiple
> times (sqlite3 ... photos.db ... | uniq -d). Next, it queries the
> database again to determine the IDs of the dupes (that's what the
> while loop does). From what's returned, I chop off the 1st line
> (sed 1d), on the assumption that only the 1st/oldest entry should be
> kept in the database. The output of all of that is a list of IDs that
> could be deleted/dropped from photos and photo_tags and photo_versions.
> 
> As I said, that's a bit "awkward" :)
> 
> Does anyone know of a better way? Maybe a more clever SQL statement,
> so that it's not necessary to query the database that often?
> 
> Thanks a lot!
> 
> --
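
Dropping the duplicate IDs from photos, photo_tags and photo_versions
could then be done in one transaction. The following is only a sketch
under assumptions: it assumes that photo_tags and photo_versions refer to
a photo through a column named photo_id, that the lowest id per uri is
the entry to keep, and that the sqlite3 command-line tool is installed.
The function name drop_dupes is made up for illustration. Try it on a
copy of photos.db first.

```shell
#!/bin/sh
# drop_dupes DB: delete every photo whose uri already exists with a lower
# id, together with its rows in photo_tags and photo_versions
# (hypothetical helper name).
# Assumptions: both tables reference photos through a photo_id column,
# and the lowest id per uri is the entry that should be kept.
drop_dupes() {
    sqlite3 "$1" '
        BEGIN;
        -- Collect the ids to drop once, so all three deletes agree.
        CREATE TEMP TABLE dupes AS
            SELECT id FROM photos
            WHERE id NOT IN (SELECT MIN(id) FROM photos GROUP BY uri);
        DELETE FROM photo_tags     WHERE photo_id IN (SELECT id FROM dupes);
        DELETE FROM photo_versions WHERE photo_id IN (SELECT id FROM dupes);
        DELETE FROM photos         WHERE id       IN (SELECT id FROM dupes);
        COMMIT;'
}
```

Note that tags attached only to a dropped entry are discarded by this
sketch, not moved to the entry that is kept.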

