Re: --enable-unicode



On Mon, 28 May 2001, Cyrille Chepelov wrote:

Normally, XML files already embed character set encoding information in the
XML declaration on the very first line (<?xml version="1.0" encoding="foo"?>).

That sounds like the best way to discriminate between old and new files.
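A minimal sketch of that check, assuming the first few bytes of the file have already been read into a NUL-terminated buffer. The helper name has_encoding_decl is hypothetical and is not part of dia or libxml; it only looks for an encoding attribute inside the XML declaration and does not parse its value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the XML declaration at the start of
 * `head` carries an encoding attribute, 0 otherwise. */
int has_encoding_decl(const char *head)
{
    const char *end;
    const char *enc;

    assert(head != NULL);

    if (strncmp(head, "<?xml", 5) != 0)
        return 0;                   /* no XML declaration at all */

    end = strstr(head, "?>");
    if (end == NULL)
        return 0;                   /* declaration is not terminated */

    /* Only count "encoding" if it appears before the closing "?>". */
    enc = strstr(head, "encoding");
    return enc != NULL && enc < end;
}
```

Under this sketch, a file whose declaration lacks the attribute would be treated as an old file in the local 8-bit charset, and anything with an explicit encoding as a new one.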


Is (libxml1 8-bit only) and (libxml2 dependent on gtk2) ? If not, we

libxml1 doesn't care about character encodings, so 8-bit characters pass
through without problems (you may run into trouble with some multibyte
charsets though).

libxml2 doesn't rely on gtk2, but it does care about encodings.  We already
have conditional support for using libxml2, but it breaks on 8-bit chars.
The reason is that it assumes the internal encoding used by the app is
UTF-8, so it occasionally mangles the second highest bit of some characters.
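To illustrate why raw 8-bit text is a problem for a parser that expects UTF-8, here is a rough structural check. The function name looks_like_utf8 is made up for this example, and the check is deliberately simplified: it verifies lead and continuation bytes only and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the n bytes at s are structurally valid UTF-8 (each lead
 * byte is followed by the right number of continuation bytes), else 0. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL);

    while (i < n) {
        unsigned char c = s[i];
        size_t extra;
        size_t k;

        if (c < 0x80)
            extra = 0;              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;              /* 110xxxxx: two-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;              /* 1110xxxx: three-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;              /* 11110xxx: four-byte sequence */
        else
            return 0;               /* stray continuation or bad lead */

        if (extra > n - i - 1)
            return 0;               /* sequence runs past the buffer */

        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;           /* continuation must be 10xxxxxx */

        i += extra + 1;
    }
    return 1;
}
```

A Latin-1 "é" is the single byte 0xE9, which looks like the start of a three-byte sequence, so a string such as "caf\xe9" fails this check while its UTF-8 form "caf\xc3\xa9" passes.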

probably could move dia's internals to utf8, keeping a charconv_utf8_to_local8 call
just before render_gdk.c/draw_string()  (assuming #54628 is in fact fixed or
someone understands how to really fix it).

That sounds about right.  We should be able to keep charset conversion
just in the file load/save code and the gdk renderer (the conversion code
in the gdk part will go when we switch to gtk 2.0, and should be a no op
on windows already).


However, this will be no small task (it basically requires auditing the
whole code for (gchar *) arithmetic, moving that to the unicode_* functions,
and defining wrappers for these when !HAVE_UNICODE). I'm very motivated to
tackle this, but I'd like 0.88.1 to not be the new 0.86. I think enough
problems have been fixed in the CVS head relative to 0.88.1 that making
a new release (either 0.88.3 or 0.89) before going utf8 actually makes sense.
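As an illustration of what the (gchar *) audit is looking for: with UTF-8 internals, p++ steps one byte, not one character. The sketch below shows the kind of wrapper that could stand in when no Unicode library is available. The names utf8_next and utf8_char_count are hypothetical, are not the libunicode API, and assume the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: advance p to the start of the next character by
 * skipping the current byte and any continuation bytes (10xxxxxx). */
const char *utf8_next(const char *p)
{
    assert(p != NULL);

    if (*p == '\0')
        return p;                   /* never step past the terminator */
    p++;
    while (((unsigned char)*p & 0xC0) == 0x80)
        p++;
    return p;
}

/* Count characters rather than bytes in a NUL-terminated UTF-8 string. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; s = utf8_next(s))
        n++;
    return n;
}
```

For "caf\xc3\xa9" the byte length is 5 but the character count is 4, which is exactly the kind of mismatch that cursor movement and string truncation code would have to be checked for.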

If you want a new release, we can do one whenever you want.  Probably
better to call it 0.89 rather than 0.88.3.

If we are going to have unicode as the default, I am inclined to make it a
required library.  The fewer conditionals, the easier it is to test that a
tarball will build correctly.  What do others think about this?


Getting the locale's charset doesn't look that trivial.  There is a
function that does this in HEAD glib (g_utf8_get_charset_internal in
gutf8.c).  Once we have code to get the charset, it is just a matter of
adding the appropriate iconv calls in lib/dia_xml.c.
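A rough sketch of those two steps, under the assumption that the platform provides POSIX nl_langinfo(CODESET) and iconv. This is not the glib function mentioned above and not the lib/charconv.h interface; the names locale_charset_name and to_utf8 are invented for the example.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Assumption: nl_langinfo(CODESET) is available and reports the charset
 * of the current LC_CTYPE locale. */
const char *locale_charset_name(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Convert NUL-terminated `in` from charset `from` into UTF-8, writing at
 * most outsz bytes (including the terminator) to `out`.
 * Returns 0 on success, -1 on any failure. */
int to_utf8(const char *from, const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *src = (char *)in;         /* iconv's prototype wants char ** */
    char *dst = out;
    size_t inleft;
    size_t outleft;
    size_t rc;

    assert(from != NULL && in != NULL && out != NULL);

    if (outsz == 0)
        return -1;
    inleft = strlen(in);
    outleft = outsz - 1;            /* reserve room for the terminator */

    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;                  /* unknown or unsupported charset */

    rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                  /* bad input or output too small */

    *dst = '\0';
    return 0;
}
```

On load, something like to_utf8(locale_charset_name(), text, buf, sizeof buf) would cover old files; the save path would be the same call with the source and target charsets swapped in iconv_open.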

Well, you've seen that in fact, it's not that difficult <grin/>. Just use
the interface from lib/charconv.h, and let that code worry about how it's
going to be done (I'd really like charconv.c to use the native calls under
Win32, but I don't remember their names. OTOH, if win32-gtk effectively has
HAVE_UNICODE defined, it's not a problem. Hans?)

We may as well use the libunicode calls unconditionally.  That way, it
will be a simple sed job to convert over to the glib unicode calls found
in glib-2.0 (which will be in a required library for gtk2, so we may as
well use it :)

James.

-- 
Email: james daa com au
WWW:   http://www.daa.com.au/~james/





