Re: [libxml++] UTF8 support
- From: Stefan Seefeld <seefeld sympatico ca>
- To: libxmlplusplus-general lists sourceforge net
- Subject: Re: [libxml++] UTF8 support
- Date: Sat, 22 Feb 2003 21:31:28 -0500
Murray Cumming wrote:
> On Fri, 2003-02-21 at 17:54, Martijn wrote:
>> Hi all,
>> Is there any planning regarding UTF8 support within the library?
>
> At the moment I plan to use glibmm 2.4's Glib::ustring. Stefan Seefeld
> has suggested an alternative but I don't know what it is yet.

...and I don't know what I could add to my explanation to help you
understand my suggestion. Was I so unclear?
Since the sourceforge ML archiver is such a pain, I'll try again:
What you now have is something like this:
  std::string Attribute::get_name() const
  {
    return c_obj()->name ? (char*)c_obj()->name : "";
  }
i.e. there is a C-style cast from a member of type 'xmlChar *'
to 'char *', which is fed into std::string's constructor.
It is clear that this will cause much pain as soon as you use
more than the ASCII subset of utf8.
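
To make the pain concrete, here is a minimal sketch of what the cast gives you for a non-ASCII name. It assumes a stand-in typedef for xmlChar (libxml2 defines it as unsigned char), and name_as_std_string, utf8_code_points and check_naive are hypothetical helpers for illustration, not part of libxml++:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for libxml2's typedef, so the sketch builds without libxml2.
typedef unsigned char xmlChar;

// Same cast as in the current get_name(): the UTF-8 bytes are copied as-is.
inline std::string name_as_std_string(const xmlChar* name)
{
  return name ? reinterpret_cast<const char*>(name) : "";
}

// Count code points in valid UTF-8 by skipping continuation bytes (10xxxxxx).
inline std::size_t utf8_code_points(const std::string& s)
{
  std::size_t n = 0;
  for (std::string::size_type i = 0; i < s.size(); ++i)
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
      ++n;
  return n;
}

// "naive" spelled with U+00EF, which takes two bytes (0xC3 0xAF) in UTF-8.
inline void check_naive()
{
  const xmlChar name[] = { 'n', 'a', 0xC3, 0xAF, 'v', 'e', 0 };
  const std::string s = name_as_std_string(name);
  assert(s.size() == 6);            // std::string counts bytes
  assert(utf8_code_points(s) == 5); // the name has five characters
}
```

The bytes themselves survive the cast; what goes wrong is that size(), indexing and substr() all operate on bytes, so any code that treats one char as one character breaks on such names.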
Now you seem to suggest the following change:
  Glib::ustring Attribute::get_name() const
  {
    return create_ustring_from_libxml2(c_obj()->name);
  }
where you provide a function that takes an 'xmlChar *'
and generates a Glib::ustring from it.
That fixes the problem with non-ASCII utf8 characters, but
incurs an important disadvantage: libxml++ gets tied to a
particular unicode library / string representation. It will
be fine for you (Murray) as you obviously plan to use it
with other GNOME related software, so this isn't a disadvantage
for you.
However, if libxml++ really aims at being as generic as the
library it wraps (libxml2), it should not impose any particular
unicode library. That's why I suggest a slight modification to
your version:
  template <typename string, typename convert>
  string basic_attribute<string, convert>::get_name() const
  {
    return convert::from_libxml2(c_obj()->name);
  }
This is almost exactly what you suggest, except that instead of
hardcoding the conversion function, I *parametrize* the entire
libxml++ code with it (and the string type).
That is a standard technique in C++ nowadays; you'll find it, for
example, in the definitions of std::iostream and std::string, which
are typedefs of the templates std::basic_iostream and std::basic_string.
Once you know which unicode library you want (say, Glib::ustring),
you provide a converter 'trait':
  struct ustring_convert
  {
    static Glib::ustring from_libxml2(const xmlChar *);
    static xmlChar *to_libxml2(const Glib::ustring &);
  };
and then define:
  typedef basic_attribute<Glib::ustring, ustring_convert> Attribute;
and you've got the *exact* same API as your original Glib::ustring
version, with the obvious advantage of letting people *choose*
whether they want Glib::ustring or another string type.
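
Put together, the technique might look like the following self-contained sketch. Everything except basic_attribute, get_name, c_obj and from_libxml2 is an assumption made so that it builds without libxml2 or glibmm: fake_attr stands in for the wrapped libxml2 attribute struct, and utf8_bytes_convert and utf32_convert are hypothetical converters playing the role of ustring_convert. It needs C++11 for std::u32string, and to_libxml2 is left out for brevity.

```cpp
#include <cassert>
#include <string>

typedef unsigned char xmlChar;              // stand-in for libxml2's typedef
struct fake_attr { const xmlChar* name; };  // stand-in for the wrapped C struct

// The wrapper is parametrized on the string type and on its converter.
template <typename string, typename convert>
class basic_attribute
{
public:
  explicit basic_attribute(const fake_attr* obj) : obj_(obj) {}

  string get_name() const
  {
    return convert::from_libxml2(c_obj()->name);
  }

private:
  const fake_attr* c_obj() const { return obj_; }
  const fake_attr* obj_;
};

// Hypothetical converter: keep the raw UTF-8 bytes in a std::string.
struct utf8_bytes_convert
{
  static std::string from_libxml2(const xmlChar* s)
  {
    return s ? std::string(reinterpret_cast<const char*>(s)) : std::string();
  }
};

// Hypothetical converter: decode to UTF-32, one element per code point.
// Assumes the input is valid UTF-8; malformed input is not diagnosed.
struct utf32_convert
{
  static std::u32string from_libxml2(const xmlChar* s)
  {
    std::u32string out;
    if (!s) return out;
    while (*s) {
      char32_t cp;
      int extra;  // number of continuation bytes that follow the lead byte
      if (*s < 0x80)      { cp = *s;        extra = 0; }
      else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
      else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
      else                { cp = *s & 0x07; extra = 3; }
      ++s;
      for (; extra > 0 && *s; --extra, ++s)
        cp = (cp << 6) | (*s & 0x3F);
      out.push_back(cp);
    }
    return out;
  }
};

// Each user picks the pairing that matches the string type already in use.
typedef basic_attribute<std::string, utf8_bytes_convert> ByteAttribute;
typedef basic_attribute<std::u32string, utf32_convert> Utf32Attribute;
```

The wrapper code is written once; only the typedef and the converter change per string type, so a Glib::ustring or QString user would supply a converter of the same shape.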
I remember reading some criticism of yours of Qt, and in particular
it defining yet another string type (QString). I fully agree. But
here I see you making the same mistake. If I already use a unicode
string type (for example Qt's), I don't want to have libxml++ use
yet another type, and then be forced to convert between the two.
Using my suggested string trait technique, I don't need to.
I hope this clarifies my suggestion a bit.
Regards,
Stefan