Re: [libxml++] UTF8 support



Ole Laursen wrote:

> But I really don't see the big problem here. std::string is really
> just a fancy way of saying 'char *', right?

No.

> And any decent
> Unicode-aware string library will have a convenient conversion from
> std::string, right?

No.

> So if you need to process the individual characters (in my experience
> with gtkmm/glibmm, this is seldom needed) you can simply treat the
> input/output from the library as raw data which you feed to your
> string library. Why is this a problem?

I don't fully get your point. Are you advocating that libxml++ continue
to use std::string? That's really a bad idea IMO:

'char *' is, besides being used for strings in C, a data type used for
generic memory, i.e. there are no semantics associated with it (such
as 'null-terminated string').

std::string represents *text*, and as such, it has a lot more meaning.
You can iterate over the elements, expecting to get at individual
characters. Just to name an example.

While it may be true that you can (technically) use std::string to
contain UTF-8 data, the std::string *interface* would be completely
inappropriate (besides the 'data()' and 'length()' methods :-)
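
To illustrate the mismatch, here is a minimal sketch. The helper
count_code_points() and the sample string are made up for this example
and are not part of libxml++ or any Unicode library. It shows that
length() and element-wise iteration on a std::string holding UTF-8
operate on bytes, not on characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a libxml++ function): counts code points by
// skipping UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx.
// It assumes the input is well-formed UTF-8 and does no validation.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++n;
    return n;
}

// Usage example: "n" followed by e-acute is two characters,
// but three bytes in UTF-8 ('n', 0xC3, 0xA9).
void example()
{
    const std::string text = "n\xC3\xA9";
    assert(text.length() == 3);            // std::string counts bytes
    assert(count_code_points(text) == 2);  // the text has two characters
}
```

Iterating over such a string element by element likewise yields the
two halves of the e-acute as separate 'characters'.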

Please don't abuse std::string in such a horrible way.

But to go along the line you seem to suggest: libxml++ may use a
'data container' that is agnostic of the encoding or any related
interpretation of the content. That may actually not even be such
a bad idea, since it could just be a smart pointer taking over
the memory from libxml2, freeing the data in its destructor using
xmlFree().
That would make it possible to abstract the Unicode library away, as
in my suggestion, but it would replace my suggested compile-time
polymorphism with run-time polymorphism (assuming appropriate
conversion functions doing the 'convert from/to libxml2' work).
It wouldn't incur much of a performance penalty, as there is no
additional copying involved.
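
A rough sketch of such a container, under stated assumptions: the
names RawData, RawDeleter, stand_in_free() and stand_in_allocate() are
hypothetical, and std::free() stands in for libxml2's xmlFree() so that
the example builds without libxml2. Real code would call xmlFree() in
the deleter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Stand-in for libxml2's xmlFree(); counts calls so the example can
// check that the buffer is released exactly once.
int g_free_calls = 0;
void stand_in_free(void* p)
{
    ++g_free_calls;
    std::free(p);
}

struct RawDeleter
{
    void operator()(unsigned char* p) const { stand_in_free(p); }
};

// Hypothetical encoding-agnostic container: it owns the bytes and
// attaches no interpretation to them.
class RawData
{
public:
    // Takes ownership of 'buffer'; no copy is made.
    RawData(unsigned char* buffer, std::size_t size)
        : buffer_(buffer), size_(size) {}

    const unsigned char* data() const { return buffer_.get(); }
    std::size_t size() const { return size_; }

private:
    std::unique_ptr<unsigned char, RawDeleter> buffer_;
    std::size_t size_;
};

// Stand-in for a libxml2 call returning memory the caller must free.
unsigned char* stand_in_allocate(const char* text)
{
    const std::size_t n = std::strlen(text) + 1;
    unsigned char* p = static_cast<unsigned char*>(std::malloc(n));
    if (p)
        std::memcpy(p, text, n);
    return p;
}

void example()
{
    const int before = g_free_calls;
    {
        RawData content(stand_in_allocate("caf\xC3\xA9"), 5);
        assert(content.size() == 5);
        assert(content.data()[3] == 0xC3);  // raw byte, no decoding
    }   // destructor releases the buffer here
    assert(g_free_calls == before + 1);
}
```

Conversion to and from a particular Unicode string type would then live
in separate functions that read data() and size(), which is where the
chosen string library comes in.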

Regards,
		Stefan




