On Tue, Jan 29, 2002, at 05:20:41PM +0100, Cyrille Chepelov wrote:
Hence my intent to test how to detect that \xc2\xab doesn't translate into anything in the current locale encoding, and to use the ASCII fallback in that case. However, in locales where \xc2\xab is displayable, and if we can reliably detect that it is indeed displayable, IMO we should use it rather than ASCII simulacra.
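The detection could be probed with a small iconv one-shot; this is only a sketch (the helper name utf8_converts_to is hypothetical, not from charconv.c), taking the target charset as a parameter so it can be tested against latin1 and ASCII directly:

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical probe: does the UTF-8 string `utf8` convert cleanly
   into `to_charset`?  Returns 1 on full conversion, 0 otherwise. */
static int utf8_converts_to(const char *utf8, const char *to_charset)
{
    iconv_t cd = iconv_open(to_charset, "UTF-8");
    char out[64];
    char *inp = (char *)utf8, *outp = out;
    size_t inleft = strlen(utf8), outleft = sizeof out;
    size_t res;

    if (cd == (iconv_t)-1)
        return 0;               /* conversion pair unsupported */
    res = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    /* iconv() returns (size_t)-1 and sets errno (EILSEQ) when a
       character has no representation in the target charset */
    return res != (size_t)-1 && inleft == 0;
}
```

With glibc's iconv, probing "\xc2\xab" (UTF-8 for «, U+00AB) should succeed against "ISO-8859-1" and fail against "ASCII"; in real code the target would be the locale charset from nl_langinfo(CODESET).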
OK, here are the results:

- test.c is basically a stripped-down, hardcoded-to-latin1 version of charconv.c (it's encoded in UTF-8; I hope the test files you sent me weren't swear words <grin/>, they definitely looked Japanese in my emacs21). There are four strings: one latin1 (expected to convert), and three which are not expected to convert into latin1 (for various but obvious reasons).
- test.log is the result of the test, with 2>&1.

As you can see, unicode_iconv() just bails out (and sets errno) when the string is not convertible. I'm thinking about adding a try_charconv_utf8_to_local8() function: it would take all the code from charconv_utf8_to_local8() up to (but not including) the test on the result of unicode_iconv(), and return NULL (but silently!) if the input string can't be converted to the local charset. This should let us detect whether the « and » characters are convertible in the current encoding.

Problem: I see there's an alternate implementation of charconv_utf8_to_local8(), which basically delegates to glib 1.3. Is that function silent when presented with "bad" input? Or is it safe to assume we're going to have either HAVE_ICONV or HAVE_UNICODE even in the glib 1.3 case, and use code derived from the older implementation of charconv_utf8_to_local8()?

Now that people are talking about C++0x, I'll probably write to Mr. Sutter so that the Powers That Be (and Who Talk To The C Committee) seriously plan to add #mess, #beware, #horrible and #hell pre-processor directives.

-- Cyrille

-- Grumpf.
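For reference, the try_charconv_utf8_to_local8() idea above might look something like the following. This is only a sketch under stated assumptions: it assumes the plain iconv path (not the HAVE_UNICODE/unicode_iconv() or glib 1.3 variants), and it assumes a single-byte local charset when sizing the output buffer; the real charconv.c may differ.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: like charconv_utf8_to_local8(), but fails
   SILENTLY (returns NULL, no warning) when the input can't be
   represented in the locale charset.  Caller frees the result. */
char *try_charconv_utf8_to_local8(const char *utf8)
{
    size_t inleft = strlen(utf8);
    /* inleft + 1 bytes is enough for a single-byte local charset;
       a multibyte target would make iconv() fail with E2BIG, which
       we also report as "not convertible" here. */
    size_t outleft = inleft;
    char *out = malloc(inleft + 1);
    char *inp = (char *)utf8, *outp = out;
    iconv_t cd;

    if (out == NULL)
        return NULL;
    cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1) {
        free(out);
        return NULL;            /* conversion pair unsupported */
    }
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        free(out);
        return NULL;            /* not convertible: fail silently */
    }
    iconv_close(cd);
    *outp = '\0';
    return out;
}
```

A caller would then probe try_charconv_utf8_to_local8("\xc2\xab") once at startup and fall back to an ASCII rendering such as "<<" when it gets NULL back.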
Attachment:
test.c
Description: Text Data
Attachment:
test.log
Description: Text document