[Evolution] Re: [mogutan din or jp: A suggestion about decoding charsets]

From: NotZed <notzed helixcode com>
To: pan superpimp org
Cc: notzed helixcode com (Michael Zucchi), fejj helixcode com (Jeffrey Stedfast), evolution helixcode com, mogutan din or jp (Yamahata Kenichiro), pan superpimp org
Subject: [Evolution] Re: [mogutan din or jp: A suggestion about decoding charsets]
Date: Mon, 18 Sep 2000 17:23:02 -0400 (EDT)

Hi,

That's not the right place to put it.

Camel uses UTF-8 internally, and it's up to the application
to map that to whatever character set it has available
for display and whatnot (note that iconv internally uses
Unicode to do any such translation anyway).  So in fact
Evolution has code to convert that UTF-8 into displayable
form (and GTK 2 will have code to do similar).
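
As a rough illustration of that application-side step, here is a minimal
sketch that converts a UTF-8 buffer into a display charset. It assumes the
POSIX iconv() interface rather than the unicode_iconv() wrapper used in
camel-mime-utils.c, and the function name utf8_to_charset is hypothetical,
not anything in Camel or Evolution.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert the UTF-8 buffer in[0..inlen) to the
 * charset named by `to`, writing a NUL-terminated result into out
 * (capacity outlen).
 * Returns the number of bytes written, or -1 if the converter is
 * unavailable, the output does not fit, or a character cannot be
 * represented exactly in the target charset. */
long
utf8_to_charset(const char *to, const char *in, size_t inlen,
                char *out, size_t outlen)
{
    iconv_t ic;
    char *inbuf = (char *)in;   /* iconv() wants char **, input is only read */
    char *outbuf = out;
    size_t outleft, ret;

    if (outlen == 0)
        return -1;
    outleft = outlen - 1;       /* reserve room for the terminator */

    ic = iconv_open(to, "UTF-8");
    if (ic == (iconv_t)-1)
        return -1;

    ret = iconv(ic, &inbuf, &inlen, &outbuf, &outleft);
    if (ret == 0) {
        /* flush any pending shift sequence (matters for stateful
         * charsets such as ISO-2022-JP) */
        ret = iconv(ic, NULL, NULL, &outbuf, &outleft);
    }
    iconv_close(ic);

    /* (size_t)-1 is an error; a positive count means lossy substitutions */
    if (ret != 0)
        return -1;

    *outbuf = '\0';
    return (long)(outbuf - out);
}
```

The application would call this at display time with whatever charset its
locale needs, for example utf8_to_charset("EUC-JP", ...), provided the
local iconv supports that name. Camel's internal strings stay UTF-8.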

The header encoding needs to be smarter about
what character sets it encodes into; it just needs a mapping
function of what charsets include what subsets of unicode
and then a way to choose the lowest common denominator for
a given word.
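
To make the idea concrete, here is a minimal sketch of such a chooser. The
candidate list (US-ASCII, ISO-8859-1, UTF-8) is only an assumption for
illustration, and best_charset is a hypothetical name; a real mapping table
would cover many more charsets.

```c
#include <stddef.h>

/* Hypothetical helper: return the narrowest of three candidate charsets
 * that can represent every character of the UTF-8 word in[0..len).
 * Anything outside U+0000..U+00FF, or any malformed sequence, falls
 * through to "UTF-8". */
const char *
best_charset(const char *in, size_t len)
{
    const unsigned char *p = (const unsigned char *)in;
    int needs_latin1 = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];

        if (c < 0x80) {
            /* plain ASCII byte */
            i++;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (p[i + 1] & 0xC0) == 0x80) {
            /* lead bytes C2 and C3 encode exactly U+0080..U+00FF */
            needs_latin1 = 1;
            i += 2;
        } else {
            return "UTF-8";
        }
    }
    return needs_latin1 ? "ISO-8859-1" : "US-ASCII";
}
```

The header encoder would call this once per word and use the result as the
charset name in the encoded word, so that pure ASCII words never get
encoded in a wider charset than they need.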

 Michael


Hi all, 

I'm using camel's camel-mime-utils.c in Pan and got this mail from a Pan
user today regarding i18n improvements.

cheers,
Charles

----- Forwarded message from Yamahata Kenichiro <mogutan din or jp> -----

Delivered-To: charles code rebelbase com
Delivered-To: superpimp org-pan superpimp org
Date: Mon, 18 Sep 2000 01:49:31 +0900
From: Yamahata Kenichiro <mogutan din or jp>
To: pan superpimp org
Subject: A suggestion about decoding charsets
X-Mailer: Sylpheed version 0.3.28 (GTK+ 1.2.8; Linux 2.2.16; i686)

Hi, there's a problem with i18n of PAN, for people like me who use
charsets used in Asia.

Because most of the charsets used in Asia are incompatible with UTF-8
(character maps are completely different), we need to iconv strings to
a charset specified by the environment variable "LANG".

For example, the LANG variable in my environment looks like this:

ja_JP.eucJP

The string after the period names the default charset. So my default
charset is "eucJP".
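
The parsing rule can be sketched in plain C as follows. This is only an
illustration, not the patch itself: locale_codeset is a hypothetical name,
the "@modifier" handling is an added assumption, and the fallback to
"UTF-8" is chosen to match what the patch does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name such as
 * "ja_JP.eucJP" into out (capacity outlen) and return out.
 * Falls back to "UTF-8" when the name is NULL or carries no codeset. */
char *
locale_codeset(const char *locale, char *out, size_t outlen)
{
    const char *start, *end;
    size_t n;

    if (outlen == 0)
        return out;

    start = locale ? strchr(locale, '.') : NULL;
    if (start == NULL || start[1] == '\0' || start[1] == '@') {
        snprintf(out, outlen, "%s", "UTF-8");
        return out;
    }
    start++;                        /* skip the period */

    end = strchr(start, '@');       /* drop a trailing "@modifier" */
    n = end ? (size_t)(end - start) : strlen(start);
    if (n >= outlen)
        n = outlen - 1;             /* truncate rather than overflow */

    memcpy(out, start, n);
    out[n] = '\0';
    return out;
}
```

With LANG set as above, locale_codeset(getenv("LANG"), buf, sizeof buf)
would give "eucJP". On POSIX systems, nl_langinfo(CODESET) after
setlocale() is another way to get the same information.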


I modified the code as shown below and it worked well.

-----------------------------------------------------------------------------

*** camel-mime-utils.c.orig   Sun Sep 17 23:07:19 2000
--- camel-mime-utils.c        Mon Sep 18 00:33:45 2000
***************
*** 820,825 ****
--- 820,849 ----
      *in = inptr;
  }
  
+ static char *
+ get_current_charset()
+ {
+     gchar *locale_str;
+     gchar **split_str;
+     gchar *ret_str;
+ 
+     locale_str = gtk_set_locale();
+     if (!locale_str) {
+             return g_strdup("UTF-8");
+     } 
+     split_str = g_strsplit(locale_str, ".", 2);
+     
+     if (*(split_str + 1)) {
+             ret_str = g_strdup(*(split_str + 1));
+     }
+     else {
+             ret_str = g_strdup("UTF-8");
+     }
+     g_strfreev(split_str);
+     return ret_str;
+ }
+ 
+ 
  /* decode rfc 2047 encoded string segment */
  static char *
  rfc2047_decode_word(const char *in, int len)
***************
*** 867,872 ****
--- 891,897 ----
              }
              d(printf("The encoded length = %d\n", inlen));
              if (inlen>0) {
+                     char *charset;
                      /* yuck, all this snot is to setup iconv! */
                      tmplen = inptr-in-3;
                      encname = alloca(tmplen+1);
***************
*** 879,886 ****
                      outbase = alloca(outlen);
                      outbuf = outbase;
  
                      /* TODO: Should this cache iconv converters? */
!                     ic = unicode_iconv_open("UTF-8", encname);
                      if (ic != (unicode_iconv_t)-1) {
                              ret = unicode_iconv(ic, (const char **)&inbuf, &inlen, &outbuf, &outlen);
                              unicode_iconv_close(ic);
--- 904,914 ----
                      outbase = alloca(outlen);
                      outbuf = outbase;
  
+                     charset = get_current_charset();
+ 
                      /* TODO: Should this cache iconv converters? */
!                     ic = unicode_iconv_open(charset, encname);
!                     g_free(charset);
                      if (ic != (unicode_iconv_t)-1) {
                              ret = unicode_iconv(ic, (const char **)&inbuf, &inlen, &outbuf, &outlen);
                              unicode_iconv_close(ic);

-----------------------------------------------------------------------------

Also, we need to decode a message body from the charset specified in the
Content-Type header to the default charset.

----
Yamahata Kenichiro  mogutan din or jp

----- End forwarded message -----

References:
- [Evolution] [mogutan din or jp: A suggestion about decoding charsets]
  - From: Charles Kerr
