Re: editing Subject: lines



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jack,

On 09/18/2017 10:48:14 AM Mon, Jack wrote:
Motivation:  I have a mailing list I follow, where I collect all of the emails about bugs.  Most of these are 
sent to the list by the relevant Bugzilla, but some are just messages to the list.  Because they don't all 
come from the same system, no threading system can properly group them all.  I've managed to make a copy of 
the maildir folder and use sed to always put [Bug 67897954] before anything else in the Subject line, such as 
Re:, Fw:, or [listname], but there are some messages sed doesn't touch.  The biggest group of these are 
messages whose Subject has been encoded, for example:

Subject:
 =?UTF-8?B?W2tteW1vbmV5NF0gW0J1ZyAzMDY2OTJdIExhIGZlbsOqdHJlIGVzdCBwbHVz?=
 =?UTF-8?B?IGxhcmdlIHF1ZSBsJ8OpY3Jhbg==?=
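
For reference, Python's standard library can decode a header like this without third-party code. The sketch below uses `email.header.decode_header` and `make_header`. It is not necessarily one of the routines found by searching, and the helper name `decode_subject` is hypothetical. The decoded text also shows that the list tag comes before the bug tag, which is the ordering the sed script is meant to change.

```python
from email.header import decode_header, make_header

# The folded Subject value from the example above, continuation line included.
RAW_SUBJECT = (
    "=?UTF-8?B?W2tteW1vbmV5NF0gW0J1ZyAzMDY2OTJdIExhIGZlbsOqdHJlIGVzdCBwbHVz?=\n"
    " =?UTF-8?B?IGxhcmdlIHF1ZSBsJ8OpY3Jhbg==?="
)


def decode_subject(value: str) -> str:
    """Hypothetical helper: turn RFC 2047 encoded words into plain Unicode text."""
    # decode_header() splits the value into (bytes, charset) chunks and drops
    # the whitespace between adjacent encoded words; make_header() rebuilds
    # them into a Header object whose str() is the decoded text.
    return str(make_header(decode_header(value)))


print(decode_subject(RAW_SUBJECT))
# prints: [kmymoney4] [Bug 306692] La fenêtre est plus large que l'écran
```
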

Enough googling has now given me both Perl and Python routines to decode these, and I suppose I can use Perl 
instead of sed to do all the editing.  However, I'm also open to other suggestions on how to approach this.  
I can easily identify the specific files with this issue.  Some of the encoded Subjects start on the same line 
as "Subject:", and some wrap as in the above example.  I'm not certain, but I think most of the UTF-8-encoded 
subjects don't actually contain any non-ASCII characters, although a few certainly do.  I suppose I could 
rewrite all those lines to UTF-8-encode only the characters that need it instead of the whole line, and then 
my original regex-replacement approach would work.
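
Once a subject has been decoded, the reordering the sed script already does becomes an ordinary regex substitution on plain text. A minimal Python sketch, assuming the tag always has the form `[Bug N]` as in the example; `move_bug_tag_first` is a hypothetical name:

```python
import re

# Assumed tag format, e.g. "[Bug 306692]", plus any whitespace after it.
BUG_TAG = re.compile(r"\[Bug (\d+)\]\s*")


def move_bug_tag_first(subject: str) -> str:
    """Hypothetical helper: move the first [Bug N] tag to the front of a decoded subject.

    Subjects without a tag are returned unchanged.
    """
    match = BUG_TAG.search(subject)
    if match is None:
        return subject
    # Cut the tag out of its current position and tidy the leftover text.
    rest = (subject[:match.start()] + subject[match.end():]).strip()
    return f"[Bug {match.group(1)}] {rest}"


print(move_bug_tag_first("Re: [kmymoney4] [Bug 306692] La fenêtre est plus large que l'écran"))
# prints: [Bug 306692] Re: [kmymoney4] La fenêtre est plus large que l'écran
```

Because the function works on already-decoded text, it behaves the same whether the original header was plain ASCII or made of encoded words.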

The simplest approach seems to be to decode the content of each encoded word, convert it to UTF-8 if it isn't 
already, and then use whatever tool is most convenient to grep out the bug number or whatever else you need. 
You shouldn't need to RFC-2047-encode anything. Could that work?
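
A minimal sketch of this approach in Python, assuming each maildir file is read as raw bytes. With `email.policy.default`, the standard-library parser returns header values with encoded words already decoded and folded lines unfolded, so a plain regular expression can pull out the bug number. The `bug_number` helper and the `[Bug N]` pattern are assumptions for illustration.

```python
import email
import re
from email import policy
from typing import Optional

# Assumed tag format, taken from the example subject.
BUG_NUMBER = re.compile(r"\[Bug (\d+)\]")


def bug_number(raw_message: bytes) -> Optional[int]:
    """Hypothetical helper: return the bug number in a message's Subject, or None."""
    # policy.default yields header values as str with RFC 2047 encoded words
    # decoded, so encoded and plain-ASCII subjects are handled the same way.
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    subject = str(msg.get("Subject", ""))
    match = BUG_NUMBER.search(subject)
    return int(match.group(1)) if match else None


# A message whose Subject is the folded, encoded example from earlier.
sample = (
    b"Subject:\n"
    b" =?UTF-8?B?W2tteW1vbmV5NF0gW0J1ZyAzMDY2OTJdIExhIGZlbsOqdHJlIGVzdCBwbHVz?=\n"
    b" =?UTF-8?B?IGxhcmdlIHF1ZSBsJ8OpY3Jhbg==?=\n"
    b"\n"
    b"body\n"
)
print(bug_number(sample))  # prints: 306692
```

Each file in the maildir copy could be passed in with `open(path, "rb").read()`; the result can then drive whatever grouping or renaming is convenient, without re-encoding anything.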

Peter
-----BEGIN PGP SIGNATURE-----

iF0EARECAB0WIQS030wPRfNNA5alz3MfX9S1uSp09QUCWcAvNgAKCRAfX9S1uSp0
9VAeAJ98Krpy6g2/LloRI1R29dYi0v8M3QCfUOP6lV9amo1hEcv58oV13u1wRKU=
=oI/C
-----END PGP SIGNATURE-----

