Re: charset code conversion in I/O

Hidetoshi Tajima <hidetoshi tajima eng sun com> writes:

> Hello again,
> By switching GTK+ over to UTF-8, I wonder how legacy, non-unicode character
> set encodings will be taken care of.
> Will GNOME or GLIB's I/O API-s perform code conversion from/to UTF-8
> just as do Java's Reader&Writer or Mozilla's chardet modules?

In general, Glib and GNOME don't have any very heavy wrappers
around file IO. Glib has the g_io_channel functions, but they
are currently more of a wrapper around low-level unbuffered
stream IO than something like stdio.

So far in glib we have g_convert(), which is a nice friendly
wrapper around iconv(), and we have special-cased converters
between the various Unicode encodings.

I think we'll be adding some more conversion utility functions:
 - locale <=> UTF-8  [ Pretty simple, Robert Brady has a patch ]
 - read a file a line at a time, converting on the fly
 - read a whole file into a string, converting on the fly

I'm not sure about adding conversion into g_io_channel - I think
we'd have to change the interfaces to do better error reporting
for one thing, and it would be a fairly major job to get right - 
probably too much for Glib-2.0.

The libxml2 library, which GNOME-2.0 will use for XML file
reading, does handle character set conversion.

> What is the backend library implementation? libiconv, or can it be
> pluggable?

The backend implementation for character conversion is iconv -
the idea is that it either will be the native iconv 
(on modern OS's), or libiconv.

Putting something like the Mozilla converters or ICU behind
it would probably be best done by presenting them as an
iconv implementation. But I don't consider that all that
interesting an exercise.
