Re: De-duplicate database
- From: Konstantin Maslov <conma yandex ru>
- To: Michael Schmarck <michael schmarck habmalnefrage de>
- Cc: f-spot-list gnome org
- Subject: Re: De-duplicate database
- Date: Tue, 19 Feb 2008 22:27:05 +0300
Maybe something like this:
select id from (select id, count(id) as cnt, uri from photos group by
uri) where cnt>1;
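
Note that the query above returns only one id per duplicated uri, and SQLite does not specify which row of the group that id comes from. A variant that lists every id except the lowest one per uri (the same list the shell loop in the quoted message produces) might look like the sketch below. It assumes that the lowest id is the oldest entry. Only the photos table with its id and uri columns is taken from this thread; the toy rows and URIs are made up, and the query is wrapped in Python's sqlite3 module purely so it can run without touching a real photos.db.

```python
import sqlite3

# Toy stand-in for photos.db: only the two columns mentioned in this thread.
# The rows and URIs below are invented for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, uri TEXT)")
conn.executemany(
    "INSERT INTO photos (id, uri) VALUES (?, ?)",
    [
        (1, "file:///Travel/a.jpg"),
        (2, "file:///Travel/b.jpg"),
        (3, "file:///Travel/a.jpg"),  # duplicate of id 1
        (4, "file:///Fun/c.jpg"),
        (5, "file:///Travel/a.jpg"),  # duplicate of id 1
        (6, "file:///Fun/c.jpg"),     # duplicate of id 4
    ],
)

# Keep the lowest id per uri (assumed to be the oldest entry);
# every other id is a candidate for deletion.
DUPLICATE_IDS_SQL = """
    SELECT id FROM photos
    WHERE id NOT IN (SELECT MIN(id) FROM photos GROUP BY uri)
    ORDER BY id
"""

duplicate_ids = [row[0] for row in conn.execute(DUPLICATE_IDS_SQL)]
print(duplicate_ids)  # prints [3, 5, 6]
```

The SELECT statement on its own can also be passed straight to the sqlite3 command-line tool. It is probably wise to work on a copy of photos.db before deleting anything based on its output.
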
On Tue, 19/02/2008 at 08:11 +0100, Michael Schmarck wrote:
> Hello.
>
> I'm using f-spot to manage my photos. To do so, I copy the pictures
> from my camera to some folder on my hard disk. Then I use f-spot to
> import the pictures. I *always* uncheck the "copy files to Photos
> folder" option, i.e. they remain in a directory structure of my
> liking.
>
> Suppose I've got the images in ~/Desktop/My Pictures/Travel and
> ~/Desktop/My Pictures/Fun. I'd now import the Travel and Fun folders.
> Later on, I take some new pictures and copy them to Travel and/or
> Fun again. I'd now re-import the Travel and/or Fun folders.
>
> What happens now is that f-spot also re-imports all the already
> known pictures. This is a known bug in f-spot -> http://bugzilla.gnome.org/169646
>
> Does anyone know of a way to (more or less...) easily de-dupe the
> f-spot database?
>
> Right now, I do it in a somewhat "awkward" way, like this:
>
> sqlite3 ~/.gnome2/f-spot/photos.db 'select uri from photos order by uri' | \
>   uniq -d | while read -r uri; do
>     printf 'select id from photos where uri = "%s";' "$uri" | \
>       sqlite3 ~/.gnome2/f-spot/photos.db | sort -n | sed 1d
>   done
>
> This reports all the URIs in the photos table that occur more than
> once (sqlite3 ... photos.db ... | uniq -d). Next, it queries the
> database again to determine the IDs of the dupes (that's what the
> while loop does). From what's returned, I chop off the 1st line
> (sed 1d), on the assumption that only the 1st/oldest entry should be
> kept in the database. The output of all of that is a list of IDs that
> could be deleted/dropped from photos, photo_tags and photo_versions.
>
> As I said, that's a bit "awkward" :)
>
> Does anyone know of a better way? Maybe a more clever SQL statement,
> so that it's not necessary to query the database that often?
>
> Thanks a lot!
>
> --