[gmime-devel] determining encodings

From: Yuval Peduel <ypeduel yahoo-inc com>
To: "gmime-devel-list gnome org" <gmime-devel-list gnome org>
Subject: [gmime-devel] determining encodings
Date: Wed, 9 Aug 2017 17:56:21 +0000 (UTC)

Most messages with subjects and From: headers using characters outside the ASCII set now use the RFC 2047 encoding to keep the actual bytes in the message "7-bit safe". But a significant number of incoming messages still use national encodings: big5 from China, Taiwan, and Singapore; EUC-JIS and shift-JIS from Japan; cp1255 from Israel; etc.

What is the best way to convert these strings into UTF-8?

Since these contain 8-bit characters, I tried using g_mime_utils_decode_8bit with a NULL encoding, assuming it would determine the best one to use. But in my test, this didn't work at all. My test consisted of:

- starting with one UTF-8 string for each of 4 encodings, the equivalent of:

  - "Happy New Year" in Chinese (big5)

  - "Good Morning" for shift-JIS

  - "Good Evening" for EUC-JIS

  - "Peace unto you" for cp1255

- converting each UTF-8 string to a byte sequence using the corresponding encoding

- feeding the four resulting byte sequences to g_mime_utils_decode_8bit and writing out the results

I confirmed that the inputs to g_mime_utils_decode_8bit were correctly encoded by decoding them with the proper decoder.
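
For reference, the encode-and-verify part of the test above can be sketched roughly as follows. This is only an illustration using POSIX iconv, not the code actually used for the test; the helper names convert and round_trips are hypothetical, and the charset names an iconv implementation accepts (typically BIG5, SHIFT_JIS, EUC-JP, CP1255) vary from system to system.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert `inlen` bytes of `in` from charset `from`
 * to charset `to`. Returns a malloc'd, NUL-terminated buffer (caller
 * frees) and stores its length in *outlen, or returns NULL on failure. */
static char *convert(const char *from, const char *to,
                     const char *in, size_t inlen, size_t *outlen)
{
    assert(from && to && in && outlen);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return NULL;                 /* charset unknown to this iconv */

    size_t cap = inlen * 4 + 4;      /* generous output estimate */
    char *out = malloc(cap + 1);
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *) in;         /* iconv() wants char **, data is not modified */
    char *dst = out;
    size_t srcleft = inlen, dstleft = cap;

    /* Convert the input, then flush any pending shift state. */
    if (iconv(cd, &src, &srcleft, &dst, &dstleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &dst, &dstleft) == (size_t) -1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    iconv_close(cd);

    *outlen = (size_t) (dst - out);
    out[*outlen] = '\0';
    return out;
}

/* Returns 1 if `utf8` survives UTF-8 -> charset -> UTF-8 unchanged. */
static int round_trips(const char *utf8, const char *charset)
{
    size_t enc_len = 0, dec_len = 0;
    char *enc = convert("UTF-8", charset, utf8, strlen(utf8), &enc_len);
    if (!enc)
        return 0;

    char *dec = convert(charset, "UTF-8", enc, enc_len, &dec_len);
    int ok = dec && dec_len == strlen(utf8) && memcmp(dec, utf8, dec_len) == 0;

    free(enc);
    free(dec);
    return ok;
}
```

Calling round_trips with each test string and its charset name confirms that the encoded bytes are valid before they are handed to the decoder under test.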

So:

1. is g_mime_utils_decode_8bit the right tool for the job? I assume it works properly when one actually knows the encoding, but what about when one doesn't?

2. if so, how should I be using it, because:

output_ptr = g_mime_utils_decode_8bit(NULL, input_ptr, input_length);

isn't doing it.

3. if it isn't, what is the right way?
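
One possible fallback, independent of GMime, is to try a list of candidate charsets in order and keep the first one that converts cleanly. The sketch below assumes POSIX iconv; the names to_utf8 and decode_with_candidates are hypothetical, and the approach is only a heuristic: single-byte charsets such as cp1255 accept almost any byte sequence, so they belong at the end of the list. It is also worth checking the installed header, since in GMime 3.0 the first parameter of g_mime_utils_decode_8bit appears to be a GMimeParserOptions pointer rather than a charset name, in which case NULL would select default options instead of requesting autodetection.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: strictly convert `in` from `charset` to UTF-8.
 * Returns a malloc'd NUL-terminated string, or NULL if the charset is
 * unknown or the bytes are not valid in that charset. */
static char *to_utf8(const char *charset, const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return NULL;

    size_t cap = inlen * 4 + 1;      /* generous output estimate plus NUL */
    char *out = malloc(cap);
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *) in;         /* iconv() wants char **, data is not modified */
    char *dst = out;
    size_t srcleft = inlen, dstleft = cap - 1;

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    iconv_close(cd);
    if (rc == (size_t) -1) {         /* invalid or truncated sequence */
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;
}

/* Try each charset in the NULL-terminated `candidates` array in order.
 * Returns the first successful conversion (caller frees) and stores the
 * winning charset name in *used, or returns NULL if none succeeds. */
static char *decode_with_candidates(const char *const *candidates,
                                    const char *in, size_t inlen,
                                    const char **used)
{
    assert(candidates && in && used);

    for (size_t i = 0; candidates[i] != NULL; i++) {
        char *out = to_utf8(candidates[i], in, inlen);
        if (out) {
            *used = candidates[i];
            return out;
        }
    }
    *used = NULL;
    return NULL;
}
```

A caller would typically list UTF-8 first, then the multi-byte encodings expected for the sender's locale, and a single-byte charset last as a catch-all.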

TIA.

Follow-Ups:
- Re: [gmime-devel] determining encodings
  - From: Jeffrey Stedfast
