Re: [gmime-devel] Decoding UTF-8 headers
- From: Jeffrey Stedfast <fejj gnome org>
- To: Andrew Victor <avictor za gmail com>
- Cc: gmime-devel-list gnome org
- Subject: Re: [gmime-devel] Decoding UTF-8 headers
- Date: Sat, 16 Jul 2011 17:00:34 -0400
On 07/16/2011 04:01 PM, Andrew Victor wrote:
[snip]
> Ok.
> Is it not also an issue that 1 byte of a 2-byte UTF-8 "character"
> (=C3=AD) is in one encoded-word, and the other byte in the next
> encoded-word (next line)?
Ah, yes, I missed that. That's also "illegal": RFC 2047 requires each
encoded-word to contain a whole number of characters. In this particular
case, however, it's the period that broke things. The fact that those words
are marked as being UTF-8 is probably the only reason it works if you
enable the WORKAROUND flag, since there is this optimization in the decoder:
    /* slight optimization? */
    if (!g_ascii_strcasecmp (charset, "UTF-8")) {
        p = (char *) decoded;
        len = declen;

        //while (!g_utf8_validate (p, len, (const char **) &p)) {
        //    len = declen - (p - (char *) decoded);
        //    *p = '?';
        //}

        return g_strndup ((char *) decoded, declen);
    }
I believe that if I were to uncomment that while-loop, it would insert
question-marks into the decoded text - one for each of those split bytes.
Jeff
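
For illustration only, here is a minimal sketch in plain C of what the commented-out loop above would do to a split character. It is not GMime code: utf8_valid_prefix() is a hypothetical, simplified stand-in for g_utf8_validate() (it checks only lead/continuation byte structure, not overlong forms or surrogates), and sanitize_utf8() is a hypothetical name for the replace-with-'?' loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for g_utf8_validate(): returns how many leading
 * bytes of buf form structurally valid UTF-8. Simplified on purpose:
 * overlong encodings and surrogates are not rejected. */
static size_t utf8_valid_prefix(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;

        if (c < 0x80)
            need = 0;                    /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;                    /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            need = 2;                    /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            need = 3;                    /* 4-byte sequence */
        else
            return i;                    /* stray continuation or bad lead byte */

        if (need > len - i - 1)
            return i;                    /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;                /* expected a continuation byte */
        }

        i += need + 1;
    }

    return i;
}

/* Same idea as the commented-out while-loop: overwrite each offending
 * byte with '?' until the whole buffer validates. */
static void sanitize_utf8(unsigned char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        off += utf8_valid_prefix(buf + off, len - off);
        if (off < len)
            buf[off++] = '?';
    }
}
```

Under these assumptions, if one encoded-word decodes to the bytes "Mar\xC3" and the next to "\xAD" followed by "a", sanitizing each word on its own turns them into "Mar?" and "?a", one question mark per split byte. Concatenating the raw bytes first gives "Mar\xC3\xAD" followed by "a", which validates and is left untouched.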