On Tue, Jan 29, 2002, at 05:20:41PM +0100, Cyrille Chepelov wrote:
Hence my intent to test how to detect that \xc2\xab doesn't translate into anything in the current locale encoding, and to use the ASCII fallback in that case. However, in locales where \xc2\xab is displayable, and if we can reliably detect that it is indeed displayable, IMO we should use it rather than ASCII simulacra.
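The detection could be probed with a small iconv one-shot; this is only a sketch (the helper name utf8_converts_to is hypothetical, not from charconv.c), taking the target charset as a parameter so it can be tested against latin1 and ASCII directly:

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical probe: does the UTF-8 string `utf8` convert cleanly
   into `to_charset`?  Returns 1 on full conversion, 0 otherwise. */
static int utf8_converts_to(const char *utf8, const char *to_charset)
{
    iconv_t cd = iconv_open(to_charset, "UTF-8");
    char out[64];
    char *inp = (char *)utf8, *outp = out;
    size_t inleft = strlen(utf8), outleft = sizeof out;
    size_t res;

    if (cd == (iconv_t)-1)
        return 0;               /* conversion pair unsupported */
    res = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    /* iconv() returns (size_t)-1 and sets errno (EILSEQ) when a
       character has no representation in the target charset */
    return res != (size_t)-1 && inleft == 0;
}
```

With glibc's iconv, probing "\xc2\xab" (UTF-8 for «, U+00AB) should succeed against "ISO-8859-1" and fail against "ASCII"; in real code the target would be the locale charset from nl_langinfo(CODESET).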
OK, here are the results:

- test.c is basically a stripped-down, hardcoded-to-latin1 version of charconv.c (it's encoded in UTF-8; I hope the test files you sent me weren't swear words <grin/>, they definitely looked Japanese in my emacs21). There are four strings: one latin1 (expected to convert), and three which are not expected to convert into latin1 (for various but obvious reasons).
- test.log is the result of the test, with 2>&1.

As you can see, unicode_iconv() just bails out (and sets errno) when the string is not convertible. I'm thinking about adding a try_charconv_utf8_to_local8() function: it would take all the code from charconv_utf8_to_local8() up to (but not including) the test on the result of unicode_iconv(), and return NULL (but silently!) if the input string can't be converted to the local charset. This should let us detect whether the « and » characters are convertible in the current encoding.

Problem: I see there's an alternate implementation of charconv_utf8_to_local8(), which basically delegates to glib 1.3. Is that function silent when presented with "bad" input? Or is it safe to assume we're going to have either HAVE_ICONV or HAVE_UNICODE even in the glib 1.3 case, and use code derived from the older implementation of charconv_utf8_to_local8()?

Now that people are talking about C++0x, I'll probably write to Mr. Sutter so that the Powers That Be (and Who Talk To The C Committee) seriously plan to add #mess, #beware, #horrible and #hell pre-processor directives.

-- Cyrille

-- Grumpf.
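For reference, the try_charconv_utf8_to_local8() idea above might look something like the following. This is only a sketch under stated assumptions: it assumes the plain iconv path (not the HAVE_UNICODE/unicode_iconv() or glib 1.3 variants), and it assumes a single-byte local charset when sizing the output buffer; the real charconv.c may differ.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: like charconv_utf8_to_local8(), but fails
   SILENTLY (returns NULL, no warning) when the input can't be
   represented in the locale charset.  Caller frees the result. */
char *try_charconv_utf8_to_local8(const char *utf8)
{
    size_t inleft = strlen(utf8);
    /* inleft + 1 bytes is enough for a single-byte local charset;
       a multibyte target would make iconv() fail with E2BIG, which
       we also report as "not convertible" here. */
    size_t outleft = inleft;
    char *out = malloc(inleft + 1);
    char *inp = (char *)utf8, *outp = out;
    iconv_t cd;

    if (out == NULL)
        return NULL;
    cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1) {
        free(out);
        return NULL;            /* conversion pair unsupported */
    }
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        free(out);
        return NULL;            /* not convertible: fail silently */
    }
    iconv_close(cd);
    *outp = '\0';
    return out;
}
```

A caller would then probe try_charconv_utf8_to_local8("\xc2\xab") once at startup and fall back to an ASCII rendering such as "<<" when it gets NULL back.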
Attachment:
test.c
Description: Text Data
Attachment:
test.log
Description: Text document