Hi Yuval,

It is the correct method to use; however, you need to specify a list of charsets that it should attempt. What you need to do is:

    static const char *charsets[] = { "big5", "shift-jis", "euc-jp", "cp1255", NULL };
    GMimeParserOptions *options;

    options = g_mime_parser_options_clone (NULL);
    g_mime_parser_options_set_fallback_charsets (options, charsets);

Then pass those options into g_mime_utils_decode_8bit().

Hope that helps,

Jeff

From: gmime-devel-list <gmime-devel-list-bounces gnome org> on behalf of Yuval Peduel via gmime-devel-list <gmime-devel-list gnome org>

Most messages with non-ASCII characters in their Subject: and From: headers now use RFC 2047 encoding to keep the actual
bytes in the message "7-bit safe". But there is still a significant number of incoming messages that use national encodings: big5 from China, Taiwan, and Singapore; EUC-JIS and shift-JIS from Japan; cp1255 from Israel; etc. What is the best way to convert these strings into UTF-8?

Since these strings contain 8-bit characters, I tried using g_mime_utils_decode_8bit with NULL in place of an encoding, assuming it would determine the
best one to use. But in my test, this didn't work at all. My test consisted of the following steps:

- I started with one UTF-8 string for each of the four encodings, the equivalent of:
  - "Happy New Year" in Chinese (big5)
  - "Good Morning" in Japanese (shift-JIS)
  - "Good Evening" in Japanese (EUC-JIS)
  - "Peace unto you" in Hebrew (cp1255)
- I converted each UTF-8 string to a byte sequence using the corresponding encoding.
- I then fed the four resulting byte sequences to g_mime_utils_decode_8bit and wrote out the results.

I confirmed that the inputs to g_mime_utils_decode_8bit were correctly encoded by decoding them with the proper charset.

So:

1. Is g_mime_utils_decode_8bit the right tool for the job? I assume it works properly when one actually knows the encoding, but what about when one doesn't?
2. If so, how should I be using it? This call isn't doing it:

       output_ptr = g_mime_utils_decode_8bit (NULL, input_ptr, input_length);

3. If it isn't, what is the right way?

TIA.