Re: [Evolution] Duplicates removal



On Wed, 2003-07-23 at 18:10, Jeffrey Stedfast wrote:
On Wed, 2003-07-23 at 08:56, Denis O. Mikhalkin wrote:
Hi,

is there a way to set up Evolution to remove duplicate messages in a
folder?

no

 If not, can I propose such a feature?

it's been proposed multiple times, the problem is how to do it without
risking losing some of the user's mail.

- Message-Id headers are not guaranteed to be unique and so we can't
base duplicate message deletion on that

- We can't just compare headers either as mailing-list software will
often munge some headers and maybe even add a few of its own.

- We can't just compare the message body either, as mailing lists often
add their own footer. Might be possible to auto-detect these footers
when they are added to a text/plain body part, however I'm not sure it'd
be so easy if the message was a multipart. Some mailing-list software
actually correctly adds the signature as its own MIME part, some just
append the signature no matter what (so in the case of a multipart - the
signature becomes the multipart's postface)

could probably be done, but it would be a painfully slow operation
(comparing large bodies of text is just gonna be slow...)

or maybe we could strip the mailing-list signature and md5sum it and
cache that value...? even so, it seems like a lot of effort. when do we
remove the md5sum from the cache? never? when the message it belongs to
gets deleted? expunged? if never, that cache will get massive quickly
for some of us... if you remove md5sums when their message gets deleted,
this makes the backend significantly more complex :-(
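
To make the strip-and-md5sum idea concrete, here is a minimal Python sketch. It is not Evolution code. The footer rule (a line consisting only of underscores, as some list software emits) and the names strip_list_footer, body_digest and DigestCache are assumptions made for illustration. The cache also keeps a uid-to-digest map so that an entry can be dropped when its message is expunged, which is the extra bookkeeping mentioned above.

```python
import hashlib
import re

# Assumption: the list footer starts at a line made only of underscores.
# Other mailing-list software will need different rules.
FOOTER_START = re.compile(r"^_{10,}\s*$")


def strip_list_footer(body):
    """Drop everything from the last underscore-only separator line onward."""
    lines = body.splitlines()
    for i in range(len(lines) - 1, -1, -1):
        if FOOTER_START.match(lines[i]):
            lines = lines[:i]
            break
    # Normalise trailing whitespace so both copies hash identically.
    return "\n".join(line.rstrip() for line in lines).rstrip()


def body_digest(body):
    """MD5 of the footer-stripped body (an identity key, not a security measure)."""
    return hashlib.md5(strip_list_footer(body).encode("utf-8")).hexdigest()


class DigestCache:
    """Hypothetical cache: digest -> uids, plus uid -> digest for cleanup."""

    def __init__(self):
        self._uids_by_digest = {}
        self._digest_by_uid = {}

    def add(self, uid, body):
        """Record a message; return True if an equal body was already cached."""
        digest = body_digest(body)
        uids = self._uids_by_digest.setdefault(digest, set())
        seen_before = bool(uids)
        uids.add(uid)
        self._digest_by_uid[uid] = digest
        return seen_before

    def expunge(self, uid):
        """Forget a message; drop the digest once no message refers to it."""
        digest = self._digest_by_uid.pop(uid, None)
        if digest is None:
            return
        uids = self._uids_by_digest[digest]
        uids.discard(uid)
        if not uids:
            del self._uids_by_digest[digest]

    def __len__(self):
        return len(self._uids_by_digest)
```

Hashing once per message avoids comparing large bodies pairwise, but the sketch only handles a plain-text body; multipart messages with the footer attached as a separate part or postface would need their own handling.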

the other issue that needs to be dealt with is that if you see
"downloading 200 messages" and all 200 are the exact same mail that you
already have downloaded, wouldn't you be a little frightened that you
lost mail if after the download completes, you have no new mail? most
people would probably freak out and think evo 'ate' their mail.

I suppose the easiest way to solve this would be to make it a menu item
to run manually...

Thoughts:
- Surely it shouldn't be automatic; it should be manual, with a shortcut.

- The algorithm should work in such a way that it never deletes messages
that are actually different. If it misses some duplicates, that is OK. I
think it could be based on headers alone, with no body comparison
needed. If a header has been changed, fine, we miss some duplicates.

- There is already a successful implementation used by a large number of
users: TheBat, which has a "Kill Duplicates" action on folders. Maybe you
can just "import" the algorithm from it? It is described in detail in
its help system. I used it for a long time and was satisfied with its
success/miss ratio.
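
A minimal Python sketch of the conservative, headers-only approach suggested above. This is not TheBat's algorithm, which is not described in this thread. The choice of headers in KEY_HEADERS and the names header_key and find_duplicates are assumptions for illustration. A message is only reported as a duplicate when every key header is present and identical to an earlier message, so a rewritten header leads to a missed duplicate rather than a wrong deletion.

```python
from email import message_from_string

# Assumption: these four headers together identify a message well enough.
KEY_HEADERS = ("Message-ID", "Date", "From", "Subject")


def header_key(raw):
    """Return a tuple of the key headers, or None if any of them is missing."""
    msg = message_from_string(raw)
    values = []
    for name in KEY_HEADERS:
        value = msg.get(name)
        if value is None:
            return None  # not enough information: never treat as a duplicate
        # Collapse folding and runs of whitespace before comparing.
        values.append(" ".join(str(value).split()))
    return tuple(values)


def find_duplicates(messages):
    """Take (uid, raw_text) pairs; return the uids of later identical copies."""
    first_seen = {}
    duplicates = []
    for uid, raw in messages:
        key = header_key(raw)
        if key is None:
            continue
        if key in first_seen:
            duplicates.append(uid)
        else:
            first_seen[key] = uid
    return duplicates
```

The first copy of each message is always kept. A copy whose Subject was changed by a mailing list (for example with an added list prefix) will not match, which is the acceptable kind of miss.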

Denis




