Re: [xml] Default Catalog On Windows, Deprecated API
- From: Igor Zlatkovic <igor zlatkovic com>
- To: Peter Jacobi <pj walter-graphtek com>
- Cc: xml gnome org
- Subject: Re: [xml] Default Catalog On Windows, Deprecated API
- Date: Tue, 15 Jun 2004 13:22:25 +0200
On 15.06.2004 08:52, Peter Jacobi wrote:
Yes, Win32 filenames are character strings, not byte strings (but e.g.
there is no internal normalization, so you had better give them all
in Unicode NFC). So when you are switching default codepages on your
systems, the "look" (character content) of the filenames stays the same,
but the byte strings seen by programs using 8-bit interfaces change.
The MSFT way to I18n is to use UTF-16 internally, and UTF-8 support
is rather weak (UTF-8 is codepage 65001, BTW).
It has been a while since I last tried this and gave up, but back then
there wasn't a single locale I could use UTF-8 with. UTF-8 is
65001, but you cannot use that codepage with any locale. This means that
calling, for example, setlocale(LC_ALL, "French_France.65001") will
return NULL, because the region-codepage combination isn't valid on
Windows.
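
A minimal sketch of how one could probe this, using only standard C. The
helper names locale_is_accepted and report_utf8_locale are made up for
illustration, and whether the French/65001 string is accepted depends on
the C runtime in use, so no particular output is claimed here.

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the C runtime accepts the locale
 * string, 0 if setlocale() rejects it by returning NULL. */
int locale_is_accepted(const char *name)
{
    return setlocale(LC_ALL, name) != NULL;
}

/* Prints the verdict for the locale string discussed above, then
 * restores the portable "C" locale so later code is unaffected. */
void report_utf8_locale(void)
{
    const char *name = "French_France.65001";

    printf("%s: %s\n", name,
           locale_is_accepted(name) ? "accepted" : "rejected (NULL)");
    setlocale(LC_ALL, "C");
}
```

Checking the return value matters because a rejected setlocale() call
leaves the previous locale in effect without any other indication.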
Ideally either a) Win32 or b) the compiler's RTL should be able to
be told to interpret all 8-bit strings as UTF-8. But
for a) I haven't found a way (there's a GetThreadACP, but no SetThreadACP)
for b) there is a _setmbcp (approximately) in MSVC RTL, but to the best
of my knowledge, it's not consistently affecting all file-related functions.
_setmbcp can, at best, use the encoding specified in a previous call to
setlocale. Neither it nor setlocale affects file operations. They affect
things like date formats, string collation and the like.
If anybody knows a way out of this, I would be very thankful to hear about it.
This is what I see as The Windows Way: UTF-8 is a coding form for
Unicode, just like UTF-16 is. If you use UTF-8, then you use Unicode.
Coding forms UTF-8, UTF-16 and UTF-32 all represent the same Unicode
code points and are related algorithmically, so conversion between the
three is easy and lossless. If the coding form used by Windows (UTF-16)
doesn't fit the application, the application will have to take
responsibility for the conversion. All the auto-conversion related to locale in
Windows is seen as support for legacy applications which don't know
Unicode. But UTF-8 is Unicode and applications supporting it don't
belong to this legacy group.
The current versions of Windows won't help you by converting the
internal UTF-16 file names to UTF-8 on the fly. This could change in the
future, but I wouldn't count on it.
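
To illustrate the conversion an application would have to perform itself,
here is a minimal, portable sketch that turns a UTF-8 string into UTF-16
code units. The function name utf8_to_utf16 and its error convention are
assumptions made for this example, not part of libxml2 or Win32. On
Windows one would more likely call MultiByteToWideChar with CP_UTF8 and
pass the result to a wide-character API such as _wfopen.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts NUL-terminated UTF-8 in src to UTF-16
 * code units in dst (capacity dst_cap units, terminator included).
 * Returns the number of units written, not counting the terminator,
 * or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    /* Smallest code point allowed for each sequence length. */
    static const uint32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dst_cap == 0)
        return (size_t)-1;

    while (*s != 0) {
        uint32_t cp;
        int extra, i;

        /* The lead byte determines how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)   /* also catches a truncated string */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_cap)
                return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (n + 2 >= dst_cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    dst[n] = 0;
    return n;
}
```

For example, the UTF-8 bytes C3 A9 (the letter e with acute accent) come
out as the single UTF-16 unit 0x00E9, while a code point outside the BMP
comes out as two units.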
Ciao,
Igor