Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)

From: Jeffrey Stedfast <fejj ximian com>
To: Not Zed <notzed ximian com>
Cc: evolution-patches ximian com
Subject: Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)
Date: Fri, 25 Jul 2003 14:37:33 -0400

ok, new patch (one that fixes both bugs)

now for an explanation on how name strings were being
double-utf-8-encoded:

header_decode_word() converts all rfc822 word strings into UTF-8 before
returning them.

header_decode_mailbox() calls header_decode_word() to grab 'word' by
'word' to construct the 'name' portion of the amil address. Each 'word'
that it gets, it passes to header_decode_string() so that if the word is
an rfc2047 encoded word, it will get decoded and converted to UTF-8
before being appended to the GString *name variable. However,
header_decode_string *also* converts raw 8bit strings into UTF-8, this
is how the name strings were double-utf-8-encoded.

Since we don't need header_decode_word() to convert to UTF-8 for any
place other than in header_decode_mailbox() for that one specific bug
(42170 - which is a bug about how addr-spec strings are not sanitised to
UTF-8, thus allowing spammers to send us invalid 8bit addr-specs that
cause pango to crash), I think a better solution for bug #42170 would be
to wait until we've finished parsing out the addr-spec string and
putting it all together into GString *addr and then checking that it is
valid UTF-8... if it is not, then convert it to UTF-8.

I think this is the cleaner solution...

Jeff

On Fri, 2003-07-25 at 13:22, Jeffrey Stedfast wrote:
> Not Zed wrote:
> 
> >I don't agree with this patch.  The problem is the header is invalid
> >anyway since it's got 8 bit data in it.
> >
> 
> yea, but so was the other header :-)
> 
> >
> >Having pango abort and display nothing on bad text is a bigger problem
> >than a badly displayed, bad text.
> >
> 
> I'll agree that we should not feed non-UTF-8 to etable so that we don't
> get an abort() in pango, but that doesn't mean that we have to sacrifice
> being able to handle raw iso-8859-1 headers (tho I agree they are
> invalid and I normally wouldn't feel bad about it not working, but the
> original patch was wrong...).
> 
> >
> >I guess something is wrong with the other patch, maybe something is
> >re-encoding utf8 twice or something, or his locale default is utf8, in
> >which case it *was* doing the right thing ...  But something needs to be
> >there to address 42170.
> >
> 
> header_decode_word() was converting word tokens into UTF-8 and the code
> that was using header_decode_word() expected word-tokens. This code was
> later converting it to UTF-8 again with the header_decode_string() which
> is where it *should* be converted to UTF-8...
> 
> since lewing's original bug was just that local-part tokens of
> addr-spec's would be left in a non-UTF-8 state and thus break pango... I
> suggest that we just convert the addr->str to UTF-8 when we're done
> constructing it *or* we just replace 8bit chars with '?' in the
> addr-spec string.
> 
> so perhaps something like this:
> 
> if (!g_utf8_validate (addr->str, addr->len)) {
>     unsigned char *ptr;
> 
>     ptr = addr->str;
>     while (*ptr) {
>        if (*ptr >= 128)
>           *ptr = '?';
>        ptr++;
>     }
> }
> 
> 
> the only reason I suggest '?' (or '_' or 'x' or something) rather than
> trying to convert to UTF-8 is that addr-specs don't allow anything but
> us-ascii anyway, so the address is already invalid (100% guarentee that
> it is spam). Now, someone might argue that since hostnames in the future
> will be UTF-8, well... we've already got that covered - the token is
> *not* in UTF-8 and so again it is still invalid.
> 
> if people really want, we could try and convert to utf-8. I don't really
> care much one way or the other...just that I don't feel it is worth it.
> 
> Jeff
> 
> >
> >
> >On Fri, 2003-07-25 at 06:40, Jeffrey Stedfast wrote:
> >  
> >
> >>I think a better approach to bug #42170 might be to check the parsed
> >>addr->str when we're done and do charset conversions there instead? or
> >>maybe just replace those invalid 8bit chars with a '?' or something?
> >>it's 100% invalid... those can't be real email addresses, so no sense
> >>trying to "make it work".
> >>
> >>Jeff
> >>    
> >>
> >
> >  
> >
> 
> 
> 
> 
> _______________________________________________
> Evolution-patches mailing list
> Evolution-patches lists ximian com
> http://lists.ximian.com/mailman/listinfo/evolution-patches
-- 
Jeffrey Stedfast
Evolution Hacker - Ximian, Inc.
fejj ximian com  - www.ximian.com

? 46331.patch
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/evolution/camel/ChangeLog,v
retrieving revision 1.1836.2.5
diff -u -r1.1836.2.5 ChangeLog
--- ChangeLog	17 Jul 2003 06:22:25 -0000	1.1836.2.5
+++ ChangeLog	25 Jul 2003 18:15:47 -0000
@@ -1,3 +1,14 @@
+2003-07-25  Jeffrey Stedfast  <fejj ximian com>
+
+	* camel-mime-utils.c (header_decode_word): Revert NotZed's fix for
+	bug #42170 - this causes even more problems than it solves. See
+	bug #46331 for info. Basically, each address header would be
+	converted to UTF-8 twice which means no raw 8bit address header
+	would render correctly.
+	(header_decode_mailbox): Perform a sanity check on the resultant
+	addr->str to make sure that it is valid UTF-8, if not convert it
+	to UTF-8. Fixes bug #42170.
+
 2003-07-17  Timo Sirainen <tss iki fi>
 
 	** See bug #42573
Index: camel-mime-utils.c
===================================================================
RCS file: /cvs/gnome/evolution/camel/camel-mime-utils.c,v
retrieving revision 1.183.4.2
diff -u -r1.183.4.2 camel-mime-utils.c
--- camel-mime-utils.c	2 Jul 2003 16:46:33 -0000	1.183.4.2
+++ camel-mime-utils.c	25 Jul 2003 18:15:49 -0000
@@ -1777,36 +1777,18 @@
 }
 
 static char *
-header_decode_word(const char **in, const char *charset)
+header_decode_word(const char **in)
 {
-	char *out;
+	const char *inptr = *in;
 
-	header_decode_lwsp(in);
-	if (**in == '"')
-		out = header_decode_quoted_string(in);
-	else
-		out = header_decode_atom(in);
-
-	/* FIXME: temporary workaround for non-ascii-data problem, see bug #42710 */
-	if (out) {
-		char *p;
-
-		for (p=out;*p;p++) {
-			if ((*p) & 0x80) {
-				GString *newstr = g_string_new("");
-
-				if (charset == NULL || !append_8bit(newstr, out, strlen(out), charset))
-					append_latin1(newstr, out, strlen(out));
-
-				g_free(out);
-				out = newstr->str;
-				g_string_free(newstr, FALSE);
-				break;
-			}
-		}
+	header_decode_lwsp(&inptr);
+	if (*inptr == '"') {
+		*in = inptr;
+		return header_decode_quoted_string(in);
+	} else {
+		*in = inptr;
+		return header_decode_atom(in);
 	}
-
-	return out;
 }
 
 static char *
@@ -2300,7 +2282,7 @@
 	header_decode_lwsp(&inptr);
 
 	/* addr-spec */
-	word = header_decode_word(&inptr, NULL);
+	word = header_decode_word(&inptr);
 	if (word) {
 		addr = g_string_append(addr, word);
 		header_decode_lwsp(&inptr);
@@ -2308,7 +2290,7 @@
 		while (*inptr == '.' && word) {
 			inptr++;
 			addr = g_string_append_c(addr, '.');
-			word = header_decode_word(&inptr, NULL);
+			word = header_decode_word(&inptr);
 			if (word) {
 				addr = g_string_append(addr, word);
 				header_decode_lwsp(&inptr);
@@ -2369,7 +2351,7 @@
 	addr = g_string_new("");
 
 	/* for each address */
-	pre = header_decode_word(&inptr, charset);
+	pre = header_decode_word(&inptr);
 	header_decode_lwsp(&inptr);
 	if (!(*inptr == '.' || *inptr == '@' || *inptr==',' || *inptr=='\0')) {
 		/* ',' and '\0' required incase it is a simple address, no @ domain part (buggy writer) */
@@ -2383,7 +2365,7 @@
 			last = pre;
 			g_free(text);
 
-			pre = header_decode_word(&inptr, charset);
+			pre = header_decode_word(&inptr);
 			if (pre) {
 				size_t l = strlen (last);
 				size_t p = strlen (pre);
@@ -2401,7 +2383,7 @@
 				while (!pre && *inptr && *inptr != '<') {
 					w(g_warning("Working around stupid mailer bug #5: unescaped characters in names"));
 					name = g_string_append_c(name, *inptr++);
-					pre = header_decode_word(&inptr, charset);
+					pre = header_decode_word(&inptr);
 				}
 			}
 			g_free(last);
@@ -2428,7 +2410,7 @@
 					w(g_warning("broken route-address, missing ':': %s", *in));
 				}
 			}
-			pre = header_decode_word(&inptr, charset);
+			pre = header_decode_word(&inptr);
 			header_decode_lwsp(&inptr);
 		} else {
 			w(g_warning("broken address? %s", *in));
@@ -2445,7 +2427,7 @@
 	while (*inptr == '.' && pre) {
 		inptr++;
 		g_free(pre);
-		pre = header_decode_word(&inptr, charset);
+		pre = header_decode_word(&inptr);
 		addr = g_string_append_c(addr, '.');
 		if (pre)
 			addr = g_string_append(addr, pre);
@@ -2547,6 +2529,23 @@
 	*in = inptr;
 	
 	if (addr->len > 0) {
+		if (!g_utf8_validate (addr->str, addr->len, NULL)) {
+			/* workaround for invalid addr-specs containing 8bit chars (see bug #42170 for details) */
+			const char *locale_charset;
+			GString *out;
+			
+			locale_charset = e_iconv_locale_charset ();
+			
+			out = g_string_new ("");
+			
+			if ((charset == NULL || !append_8bit (out, addr->str, addr->len, charset))
+			    && (locale_charset == NULL || !append_8bit (out, addr->str, addr->len, locale_charset)))
+				append_latin1 (out, addr->str, addr->len);
+			
+			g_string_free (addr, TRUE);
+			addr = out;
+		}
+		
 		address = header_address_new_name(name ? name->str : "", addr->str);
 	}
 
@@ -2568,7 +2567,7 @@
 
 	/* pre-scan, trying to work out format, discard results */
 	header_decode_lwsp(&inptr);
-	while ( (pre = header_decode_word(&inptr, charset)) ) {
+	while ( (pre = header_decode_word(&inptr)) ) {
 		group = g_string_append(group, pre);
 		group = g_string_append(group, " ");
 		g_free(pre);
@@ -2683,7 +2682,7 @@
 	}
 	
 	/* Eudora has been known to use <.@> as a content-id */
-	if (!(buf = header_decode_word (&inptr, NULL)) && !strchr (".@", *inptr))
+	if (!(buf = header_decode_word (&inptr)) && !strchr (".@", *inptr))
 		return NULL;
 	
 	addr = g_string_new ("");
@@ -2698,10 +2697,10 @@
 		if (!at) {
 			if (*inptr == '.') {
 				g_string_append_c (addr, *inptr++);
-				buf = header_decode_word (&inptr, NULL);
+				buf = header_decode_word (&inptr);
 			} else if (*inptr == '@') {
 				g_string_append_c (addr, *inptr++);
-				buf = header_decode_word (&inptr, NULL);
+				buf = header_decode_word (&inptr);
 				at = TRUE;
 			}
 		} else if (strchr (".[]", *inptr)) {
@@ -2774,7 +2773,7 @@
 				break;
 			}
 		} else {
-			word = header_decode_word (&inptr, NULL);
+			word = header_decode_word (&inptr);
 			if (word)
 				g_free (word);
 			else if (*inptr != '\0')

Follow-Ups:
- Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)
  - From: Not Zed

References:
- [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)
  - From: Jeffrey Stedfast
- Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)
  - From: Not Zed
- Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)
  - From: Jeffrey Stedfast

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]