[libxml++] Character encoding, UTF-8 and such
- From: "Morten Hanssen" <morten hanssen pd politiet no>
- To: <libxmlplusplus-general lists sourceforge net>
- Subject: [libxml++] Character encoding, UTF-8 and such
- Date: Thu, 12 Jun 2003 12:56:29 +0200
Hi.
After reading up on old articles in the mailing list archive, I understand that support for UTF-8 in libxml++ is still to come, and that it will not be available until after libxml++ 1.0 has been released. Am I right?
Anyway, what I wanted to do is propose a temporary "solution" while we're waiting for a UTF-8-aware string class or whatever. The solution would not require any change to the public API, and thus should be relatively safe to apply. OK, here is the suggestion:
std::string is really a typedef for std::basic_string<char>. When a std::string is passed on from libxml++ to a function in libxml, its char * representation is just cast to unsigned char *. Would it be terribly wrong to assume that the input in the std::string is always ISO-8859-1, and convert the input string to UTF-8 (as opposed to just casting) before passing it to libxml? libxml contains a function for this, isolat1ToUTF8, which can take care of the conversion.
As a result, we would still have std::string in all the public interfaces, but at least gain "support" for 128 more characters, which I guess would make life better for most people. Since the first 128 characters (codes 0 to 127) in ISO-8859-1 are the same as in ASCII, it should not make matters worse for anyone.
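To make the proposal concrete, here is a minimal sketch of the conversion step, assuming the input really is ISO-8859-1. The helper name latin1_to_utf8 is hypothetical and is not part of libxml++ or libxml; it is written by hand so that it stands alone, whereas the actual change would presumably call libxml's own conversion routine instead.

```cpp
#include <string>

// Hypothetical helper, not part of libxml++ or libxml.
// Assumes every byte of `in` is an ISO-8859-1 character and returns
// the same text encoded as UTF-8.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two bytes

    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8.
            out += static_cast<char>(c);
        } else {
            // 0x80..0xFF map to code points U+0080..U+00FF,
            // which UTF-8 encodes as two bytes: 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Pure ASCII input comes out byte-for-byte unchanged, which is why the change should be harmless for existing callers. Input that is already UTF-8 would be encoded a second time, so the ISO-8859-1 assumption has to hold for every string passed in.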
Thoughts?
Morten.