[libxml++] Character encoding, UTF-8 and such



Hi.

After reading old articles in the mailing list archive, I understand that UTF-8 support in libxml++ is still to come, and that it will not be available until after libxml++ 1.0 has been released. Am I right?

Anyway, what I wanted to do is propose a temporary "solution" while we're waiting for a UTF-8-aware string class or whatever. The solution would not require any change to the public API, and thus should be relatively safe to apply. OK, the suggestion:

std::string is really a typedef for std::basic_string<char>. When a std::string is passed from libxml++ to a function in libxml, its char * representation is simply cast to unsigned char *. Would it be terribly wrong to assume that the contents of the std::string are always ISO-8859-1, and to convert the string to UTF-8 (as opposed to just casting) before passing it to libxml? libxml contains a function for this, isolat1ToUTF8, which can take care of the conversion.
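
To make the idea concrete, here is a minimal sketch of the conversion I have in mind, written by hand so that it does not depend on libxml. The name latin1_to_utf8 is made up for illustration and is not an existing libxml++ function; a real patch would more likely call libxml's own converter. It assumes every byte in the input is one ISO-8859-1 character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libxml++): treat every byte of the input
// as one ISO-8859-1 character and return the UTF-8 encoding of the string.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte becomes two

    for (std::string::size_type i = 0; i < latin1.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80) {
            // ASCII range: the byte is copied through unchanged.
            utf8.push_back(static_cast<char>(c));
        } else {
            // 0x80..0xFF: encode as the two-byte sequence 110000xx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }

    // The conversion can only keep or grow the length, never shrink it.
    assert(utf8.size() >= latin1.size());
    return utf8;
}
```

With this sketch, a pure-ASCII string such as "abc" comes back byte-for-byte identical, while the single Latin-1 byte 0xE9 (e with acute accent) becomes the two bytes 0xC3 0xA9.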

As a result, we would still have std::string in all the public interfaces, but at least gain "support" for 128 more characters, which I guess would make life better for most people. Since the first 128 characters (codes 0 to 127) of ISO-8859-1 are the same as in ASCII, it should not make matters worse for anyone.

Thoughts?

Morten.

