Re: JWZ threading question



Hi Jack!

On 12/22/2010 02:46:16 PM Wed, Jack wrote:
Good afternoon,

I generally select JWZ threading under the view menu, but I recently noticed a case where it does something odd.  (It is showing a message indented under a message sent months later.)  I won't necessarily call it wrong, but it groups messages that are unrelated except for having the same subject.  Basically, I have several messages from some months ago, which threaded fine, all with the subject "Re: [Kmymoney2-developer] Feedback" (note I had deleted the original, so they ALL have the identical subject, including the "Re:").

Today, I got a new message, unrelated to those, with subject "[Kmymoney2-developer] Feedback", which got threaded as the head of the old set of messages.  A reply to that message is correctly shown at the end of the thread.  I've attached a screenshot, but will try to show the state below, with just subject and date.

Subject		Date
Topic		Dec 11
  Re: Topic	Sep 1
    Re: Topic	Sep 2
      Re: Topic	Sep 3
  Re: Topic	Dec 12

The first and last message really belong together, and the middle three belong together, but they really shouldn't ALL be together.  Simple threading does what I want - but there have been other cases where JWZ is better, for reasons stated by the original author of the algorithm.

My question:  is there any chance there is a problem in Balsa's implementation of the JWZ threading algorithm?  (I have a copy of the original "pseudo code" description, but I have not yet read it carefully, nor have I yet looked at the Balsa code.)  If not, I'm tempted to send this case to the original author to see if he agrees it's odd, or why he thinks it's really the right thing to do.

Am I just being too picky about this, or could this be considered a (small) problem?

You're not being picky--it's a problem!

As I recall it, this behavior is part of JWZ's algorithm.  It first builds message trees based on "References:" headers, and then merges them based on "Subject:" headers.  I often send out messages with generic subjects like "Today's meeting", or reply to similar messages, and it all gets threaded incorrectly.
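
As a rough illustration, here is a simplified Python sketch of those two phases applied to Jack's example. It is not Balsa's code and not the full JWZ algorithm (no empty containers, no pruning); the function names, the message dicts and the ids such as 'sep1' are all made up for the example.

```python
import re

def base_subject(subject):
    # Strip any leading "Re:" prefixes (hypothetical helper).
    return re.sub(r'^(\s*re\s*:\s*)+', '', subject, flags=re.IGNORECASE).strip()

def thread_by_references(messages):
    """Phase 1: parent = nearest entry in References that is still in the mailbox."""
    present = {m['id'] for m in messages}
    parent = {}
    for m in messages:
        # 'refs' is ordered oldest first, so search it from the end.
        parent[m['id']] = next(
            (r for r in reversed(m['refs']) if r in present), None)
    return parent

def merge_by_subject(messages, parent):
    """Phase 2 (simplified): hang a 'Re:' root under a non-'Re:' root
    that has the same base subject."""
    roots = [m for m in messages if parent[m['id']] is None]
    heads = {}
    for m in roots:
        if base_subject(m['subject']) == m['subject'].strip():
            heads.setdefault(base_subject(m['subject']), m['id'])
    merged = dict(parent)
    for m in roots:
        head = heads.get(base_subject(m['subject']))
        if head is not None and head != m['id']:
            merged[m['id']] = head
    return merged

# Jack's mailbox: 'orig' (the first message of the September thread) was deleted.
messages = [
    {'id': 'sep1',  'subject': 'Re: Topic', 'refs': ['orig']},
    {'id': 'sep2',  'subject': 'Re: Topic', 'refs': ['orig', 'sep1']},
    {'id': 'sep3',  'subject': 'Re: Topic', 'refs': ['orig', 'sep1', 'sep2']},
    {'id': 'dec11', 'subject': 'Topic',     'refs': []},
    {'id': 'dec12', 'subject': 'Re: Topic', 'refs': ['dec11']},
]

parent = thread_by_references(messages)      # two separate threads
merged = merge_by_subject(messages, parent)  # September thread now under 'dec11'
```

After the first phase 'sep1' and 'dec11' are both thread roots; the subject phase then makes 'sep1' a child of 'dec11', which is the display Jack describes. Stopping after the first phase leaves the September messages in their own thread.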

We could easily implement a stricter version that omits the "subject merge" phase.  I've hesitated, because having both "simple" and "JWZ" threading in addition to "flat index" is already confusing and opaque.  As far as I know, there's no short term for "thread by references".  Adding to the confusion, the IMAP THREAD extension (RFC 5256) defines a "REFERENCES" threading algorithm, which is in fact JWZ, subject merge and all, *not* "thread by references".

Perhaps we should hijack "simple", redefining it to mean "thread by references".  Currently it looks at only the "In-Reply-To:" header, and breaks up a thread when a message belonging to the thread is not in the mailbox.  Threading by references allows the algorithm to see past a missing parent, and create the closest approximation to the real thread that's possible, based on the available messages.  I can't think of a situation where "simple threading" gives a more useful threading than "thread by references".
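
To make the difference concrete, here is a minimal Python sketch of the two parent lookups. It is only an illustration under assumed data structures, not Balsa's implementation; the function names and the message ids 'a', 'b', 'c' are hypothetical.

```python
def simple_parent(msg, present):
    # "Simple" threading as described above: only In-Reply-To is consulted,
    # so a missing parent breaks the thread.
    irt = msg.get('in_reply_to')
    return irt if irt in present else None

def references_parent(msg, present):
    # "Thread by references": walk References from newest to oldest and
    # attach to the nearest ancestor that is still in the mailbox.
    for ref in reversed(msg.get('refs', [])):
        if ref in present:
            return ref
    return None

# Thread a -> b -> c, where b has been deleted from the mailbox.
a = {'id': 'a', 'in_reply_to': None, 'refs': []}
c = {'id': 'c', 'in_reply_to': 'b', 'refs': ['a', 'b']}
present = {'a', 'c'}

simple_result = simple_parent(c, present)          # None: c becomes a new root
references_result = references_parent(c, present)  # 'a': c attaches to its grandparent
```

With the parent missing, the In-Reply-To lookup leaves 'c' as an orphan root, while the References lookup still places it under 'a'.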

Comments?

Peter



