Re: [xml] Default Catalog On Windows, Deprecated API

On 15.06.2004 08:52, Peter Jacobi wrote:

Yes, Win32 filenames are character strings, not byte strings (but e.g. there is no internal normalization, so you had better give them all in Unicode NFC). So when you switch default codepages on your system, the "look" (character content) of the filenames stays the same, but the byte strings seen by programs using 8-bit interfaces change.

The MSFT way to I18n is to use UTF-16 internally, and UTF-8 support
is rather weak (UTF-8 is codepage 65001, BTW).

It has been a while since I last tried this and gave up, but back then there wasn't a single locale I could use UTF-8 with. UTF-8 is 65001, but you cannot use that codepage with any locale. That means calling, for example, setlocale(LC_ALL, "French_France.65001") will return NULL, because the region-codepage combination isn't valid on Windows.
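For anyone who wants to check this on their own setup, here is a minimal sketch of such a probe. The helper name locale_is_supported is made up for illustration; it uses only the standard setlocale call, and which names are accepted depends on the C runtime and OS version in use.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the C runtime accepts the locale
   name, 0 if setlocale() rejects it by returning NULL. The previously
   active locale is restored before returning. */
int locale_is_supported(const char *name)
{
    const char *cur = setlocale(LC_ALL, NULL);   /* query only, no change */
    char *saved = NULL;
    int ok;

    if (cur != NULL) {
        /* Copy the name: later setlocale calls may overwrite the buffer. */
        saved = malloc(strlen(cur) + 1);
        if (saved != NULL)
            strcpy(saved, cur);
    }

    ok = setlocale(LC_ALL, name) != NULL;

    if (saved != NULL) {
        setlocale(LC_ALL, saved);                /* undo the probe */
        free(saved);
    }
    return ok;
}
```

Calling locale_is_supported("French_France.65001") is the test described above; a result of 0 corresponds to setlocale returning NULL.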

Ideally either a) Win32 or b) the compiler's RTL should be able to be told to interpret all 8-bit strings as UTF-8. But

for a) I haven't found a way (there's a GetThreadACP, but no SetThreadACP)

for b) there is a _setmbcp (approximately) in MSVC RTL, but to the best of my knowledge, it doesn't consistently affect all file-related functions.

_setmbcp can, at best, use the encoding specified in a previous call to setlocale. Neither it nor setlocale affects file operations. They affect things like date formats, string collation and the like.

If anybody knows a way out of this, I would be very thankful to hear about it.

This is what I see as The Windows Way: UTF-8 is a coding form for Unicode, just like UTF-16 is. If you use UTF-8, then you use Unicode. Coding forms UTF-8, UTF-16 and UTF-32 all represent the same Unicode standard and relate algorithmically, so the conversion between the three is easy and lossless. If the coding form used by Windows (UTF-16) doesn't fit the application, the application will have to take the conversion responsibility. All the auto-conversion related to locale in Windows is seen as support for legacy applications which don't know Unicode. But UTF-8 is Unicode and applications supporting it don't belong to this legacy group.
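To illustrate how mechanical the relation between the coding forms is, here is a rough sketch of a UTF-8 to UTF-16 converter in portable C. The function names utf8_decode and utf8_to_utf16 are my own, not from any library, and the error handling is deliberately simple (any malformed sequence aborts the conversion).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }

    if ((s[0] & 0xE0) == 0xC0)      { len = 2; min = 0x80;    *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; min = 0x800;   *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; min = 0x10000; *cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (n < len) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Convert n bytes of UTF-8 into UTF-16 code units in dst (capacity cap).
   Returns the number of units written, or (size_t)-1 on malformed input
   or insufficient space. */
size_t utf8_to_utf16(const char *src, size_t n, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        if (used == 0) return (size_t)-1;

        if (cp < 0x10000) {
            if (out + 1 > cap) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (out + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        s += used;
        n -= used;
    }
    return out;
}
```

For example, the two bytes C3 A9 come out as the single unit 0x00E9, and the four bytes F0 9F 98 80 come out as the surrogate pair 0xD83D 0xDE00. No lookup tables or locale data are involved, which is the point of the paragraph above.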

The current versions of Windows won't help you by converting the internal UTF-16 file names to UTF-8 on the fly. This could change in the future, but I wouldn't count on it.
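In practice, "taking the conversion responsibility" for file names could look like the sketch below: a small wrapper that accepts a UTF-8 path, converts it with MultiByteToWideChar and opens the file through the wide-character _wfopen. The name fopen_utf8 is hypothetical, this is not how libxml2 does it, and on non-Windows systems the sketch simply assumes the path bytes can be passed to fopen unchanged.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    FILE *fp = NULL;
    wchar_t *wpath, *wmode;

    /* First pass: ask for the required sizes in wide characters,
       including the terminating NUL (input length -1). Flags are 0,
       which is accepted for CP_UTF8 on all Windows versions. */
    int plen = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
    int mlen = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (plen <= 0 || mlen <= 0)
        return NULL;

    wpath = malloc((size_t)plen * sizeof *wpath);
    wmode = malloc((size_t)mlen * sizeof *wmode);

    /* Second pass: do the conversion, then use the wide-character API. */
    if (wpath != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, plen) > 0 &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, mlen) > 0)
        fp = _wfopen(wpath, wmode);

    free(wpath);
    free(wmode);
    return fp;
#else
    /* Assumption: elsewhere the file system takes the bytes as they are. */
    return fopen(path, mode);
#endif
}
```

The caller keeps working with char strings throughout; only the boundary to the OS changes, so the current codepage never enters the picture.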

