Re: [xml] Default Catalog On Windows, Deprecated API



On 15.06.2004 08:52, Peter Jacobi wrote:
> Yes, Win32 filenames are character strings, not byte strings (but there
> is no internal normalization, e.g., so you had better give them all in
> Unicode NFC). So when you switch the default codepage on your system,
> the "look" (character content) of the filenames stays the same, but the
> byte strings seen by programs using the 8-bit interfaces change.
> The MSFT way to do i18n is to use UTF-16 internally, and UTF-8 support
> is rather weak (UTF-8 is codepage 65001, BTW).
It has been a while since I last tried this and gave up, but back then
there wasn't a single locale I could use UTF-8 with. UTF-8 is codepage
65001, but you cannot use that codepage with any locale. That means
calling, for example, setlocale(LC_ALL, "French_France.65001") will
return NULL, because that region/codepage combination isn't valid on
Windows.
> Ideally, either a) Win32 or b) the compiler's RTL could be told to
> interpret all 8-bit strings as UTF-8. But:
> for a) I haven't found a way (there's a GetThreadACP, but no SetThreadACP);
> for b) there is _setmbcp in the MSVC RTL, which roughly does this, but to
> the best of my knowledge it doesn't consistently affect all file-related
> functions.
_setmbcp can, at best, use the encoding specified in a previous call to
setlocale. Neither it nor setlocale affects file operations. They affect
things like date formats, string collation and the like.
> If anybody knows a way out of this, I would be very thankful to hear about it.
This is what I see as The Windows Way: UTF-8 is a coding form for
Unicode, just like UTF-16 is. If you use UTF-8, then you use Unicode.
The coding forms UTF-8, UTF-16 and UTF-32 all represent the same Unicode
standard and are related algorithmically, so converting between the
three is easy and lossless. If the coding form used by Windows (UTF-16)
doesn't suit the application, the application has to take
responsibility for the conversion. All the locale-related
auto-conversion in Windows is meant as support for legacy applications
that don't know Unicode. But UTF-8 is Unicode, and applications
supporting it don't belong to that legacy group.
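To illustrate how the conversion between UTF-8 and UTF-16 is purely algorithmic, here is a minimal sketch in portable C. The function names utf8_to_utf16 and utf16_to_utf8 and the UTF_ERROR constant are hypothetical, made up for this example. On Windows itself, an application would more typically call MultiByteToWideChar / WideCharToMultiByte with CP_UTF8 and then pass the wide string to a W-suffixed API. This sketch also assumes well-formed input and rejects unpaired surrogates. Real NTFS file names are not guaranteed to be well-formed UTF-16, so a production version might need a different policy for those.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error marker returned by both converters. */
#define UTF_ERROR ((size_t)-1)

/* Smallest code point allowed for a sequence with N continuation bytes;
 * anything below this is an overlong (invalid) encoding. */
static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};

/* Decode UTF-8 into UTF-16 code units. Returns the number of units
 * written, or UTF_ERROR on malformed input or if `out` is too small. */
size_t utf8_to_utf16(const unsigned char *in, size_t len,
                     uint16_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return UTF_ERROR;            /* stray continuation or bad lead byte */
        if (extra > len - i - 1) return UTF_ERROR;       /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80) return UTF_ERROR;    /* not a continuation */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong forms, surrogate code points and values > U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return UTF_ERROR;
        i += extra + 1;
        if (cp < 0x10000) {                  /* BMP: one UTF-16 unit */
            if (n + 1 > cap) return UTF_ERROR;
            out[n++] = (uint16_t)cp;
        } else {                             /* supplementary: surrogate pair */
            if (n + 2 > cap) return UTF_ERROR;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}

/* Encode UTF-16 code units as UTF-8. Returns the number of bytes
 * written, or UTF_ERROR on unpaired surrogates or if `out` is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t len,
                     unsigned char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {              /* high surrogate */
            if (i == len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return UTF_ERROR;                        /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return UTF_ERROR;                            /* lone low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need > cap) return UTF_ERROR;
        switch (need) {
        case 1:
            out[n++] = (unsigned char)cp;
            break;
        case 2:
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return n;
}
```

Round-tripping a well-formed string through both functions gives back the original bytes. For example, "Français" followed by U+1F600 is 14 UTF-8 bytes and 11 UTF-16 units (the last character becomes the surrogate pair 0xD83D 0xDE00). That is the "easy and lossless" property described above.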
The current versions of Windows won't help you by converting the 
internal UTF-16 file names to UTF-8 on the fly. This could change in the 
future, but I wouldn't count on it.
Ciao,
Igor


