editing Subject: lines



Motivation: I have a mailing list I follow, where I collect all of the emails about bugs. Most of these are sent to the list from the relevant Bugzilla, but some are just messages to the list. Because they don't all come from the same system, no threading system can properly group them all. I've managed to make a copy of the maildir folder and use sed to always put [Bug 67897954] before anything else in the subject, such as Re:, Fw:, or [listname], but there are some messages sed doesn't touch. The biggest bunch of these are where the subject has been encoded, for example:

Subject:
=?UTF-8?B?W2tteW1vbmV5NF0gW0J1ZyAzMDY2OTJdIExhIGZlbsOqdHJlIGVzdCBwbHVz?=
 =?UTF-8?B?IGxhcmdlIHF1ZSBsJ8OpY3Jhbg==?=

Enough googling has now given me both Perl and Python routines to decode these, and I suppose I can use Perl instead of sed to do all the editing. However, I'm also open to other suggestions on how to approach this. I can easily identify the specific files with this issue. Some of the Subjects start on the same line as the header name, and some wrap as in the above example. I'm not certain, but I think most of the UTF-8-encoded subjects don't actually contain any non-ASCII characters, although a few certainly do. I suppose I could rewrite those lines so that only the characters which need it are encoded, instead of the whole line, and then my original regex-replacement approach would work.
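For reference, here is a minimal Python sketch of the decode-then-edit idea, using only the standard library's email.header module. The function names decode_subject and bug_tag_first are made up for illustration, and the "[Bug NNN]" pattern is an assumption based on the example subject above. It only handles the Subject value as a string; it does not read or rewrite maildir files.

```python
import re
from email.header import decode_header, make_header

# The folded, encoded Subject value from the example message above.
RAW_SUBJECT = (
    "=?UTF-8?B?W2tteW1vbmV5NF0gW0J1ZyAzMDY2OTJdIExhIGZlbsOqdHJlIGVzdCBwbHVz?=\n"
    " =?UTF-8?B?IGxhcmdlIHF1ZSBsJ8OpY3Jhbg==?="
)

# Assumed tag format: "[Bug " followed by digits and "]".
BUG_TAG = re.compile(r"\[Bug \d+\]")


def decode_subject(raw):
    """Decode RFC 2047 encoded-words; unencoded subjects pass through."""
    return str(make_header(decode_header(raw)))


def bug_tag_first(subject):
    """Move the first [Bug NNN] tag to the front of the subject, if present."""
    match = BUG_TAG.search(subject)
    if match is None:
        return subject
    # Cut the tag out, then collapse the whitespace left behind.
    rest = subject[:match.start()] + subject[match.end():]
    rest = " ".join(rest.split())
    return f"{match.group(0)} {rest}"


if __name__ == "__main__":
    decoded = decode_subject(RAW_SUBJECT)
    print(decoded)
    print(bug_tag_first(decoded))
```

To write the result back into a message, the new subject would have to be re-encoded when it contains non-ASCII characters; email.header.Header(new_subject, "utf-8").encode() is one way to do that, though I have not tried it against a real maildir.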

Note that I do NOT expect to do this within Balsa, although I certainly wouldn't mind. I also think I've discovered a potential bug in the JWZ threading algorithm, but I do not believe it is relevant to this issue. I'll post about that some other time, although I have a vague memory that I did that a long time ago. (The problem is inherent to the algorithm, not specific to Balsa's implementation of it. The quick version is that a message with subject "Subject" will always thread higher than any "Re: Subject", even if the former has a much later date than the supposed reply. This often shows up in emails about, for example, "Bug in latest version.")

Thanks for any ideas.

Jack

