Re: [libxml++] UTF8 support



Murray Cumming wrote:
> On Fri, 2003-02-21 at 17:54, Martijn wrote:
>> Hi all,
>>
>> Is there any planning regarding UTF8 support within the library?
>
> At the moment I plan to use glibmm 2.4's Glib::ustring. Stefan Seefeld
> has suggested an alternative but I don't know what it is yet.

...and I don't know what I could add to my explanation to help you
understand my suggestion. Was I so unclear?

Since the SourceForge mailing-list archiver is such a pain, I'll try again:

What you now have is something like this:

std::string Attribute::get_name() const
{
  return c_obj()->name ? (char*)c_obj()->name : "";
}

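i.e. there is a C-style cast from a member of type 'xmlChar *'
to 'char *', which is fed into std::string's constructor.
The std::string then holds raw UTF-8 bytes, so size(), indexing
and substr() all operate on bytes rather than characters. It is
clear that this will cause much pain as soon as you use more than
the ASCII subset of UTF-8.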
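
A small sketch of the kind of pain meant here: std::string counts
bytes, so a name containing one two-byte character reports a length
one larger than the number of characters. The helper utf8_length is
hypothetical (it is not part of libxml++ or libxml2) and assumes
well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping
// continuation bytes (those of the form 10xxxxxx).
// Assumes well-formed input; no validation is performed.
std::size_t utf8_length(const std::string& s)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80)
      ++n;
  }
  return n;
}
```

For the UTF-8 encoding of "café" (the bytes 'c', 'a', 'f', 0xC3, 0xA9),
size() is 5 while utf8_length() is 4.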

Now you seem to suggest the following change:

Glib::ustring Attribute::get_name() const
{
  return create_ustring_from_libxml2(c_obj()->name);
}

where you provide a function that takes an 'xmlChar *'
and generates a Glib::ustring from it.

That fixes the problem with non-ASCII UTF-8 characters, but
incurs an important disadvantage: libxml++ gets tied to a
particular Unicode library / string representation. It will
be fine for you (Murray), as you obviously plan to use it
with other GNOME-related software, so this isn't a disadvantage
for you.
However, if libxml++ really aims at being as generic as the
library it wraps (libxml2), it should not impose any particular
Unicode library. That's why I suggest a slight modification to
your version:

template <typename string, typename convert>
string basic_attribute<string, convert>::get_name() const
{
  return convert::from_libxml2(c_obj()->name);
}

This is almost exactly what you suggest, except that
instead of hardcoding the conversion function, I *parametrize*
the entire libxml++ code with it (and the string type).
That's nowadays a standard technique in C++; you'll find it
for example in std::iostream and std::string, which are typedefs
for the templates std::basic_iostream and std::basic_string,
parametrized on the character type and a traits class.
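
To illustrate how the standard library uses this kind of
parametrization: the well-known case-insensitive string idiom swaps
the traits class of std::basic_string and leaves everything else
untouched. The names ci_char_traits and ci_string are illustrative
only, and the sketch covers comparison only (find() would need
further overrides).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Traits class that compares characters without regard to case.
// Everything not overridden here is inherited from std::char_traits<char>.
struct ci_char_traits : public std::char_traits<char>
{
  static bool eq(char a, char b) { return lower(a) == lower(b); }
  static bool lt(char a, char b) { return lower(a) < lower(b); }

  static int compare(const char* a, const char* b, std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      if (lt(a[i], b[i])) return -1;
      if (lt(b[i], a[i])) return 1;
    }
    return 0;
  }

private:
  static int lower(char c)
  {
    return std::tolower(static_cast<unsigned char>(c));
  }
};

// Same template as std::string, different traits parameter.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this, ci_string("Attribute") == ci_string("ATTRIBUTE") holds,
although the string template itself was not modified.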

Once you know which Unicode string type you want (say, Glib::ustring),
you provide a converter 'trait':

struct ustring_convert
{
  static Glib::ustring from_libxml2(const xmlChar *);
  static xmlChar *to_libxml2(const Glib::ustring &);
};

and then define:

typedef basic_attribute<Glib::ustring, ustring_convert> Attribute;

and you've got the *exact* same API as your original Glib::ustring
version, with the obvious advantage of letting people *choose*
whether they want Glib::ustring or another type.
I remember reading some criticism of yours of Qt, and in particular
of it defining yet another string type (QString). I fully agree. But
here I see you making the same mistake. If I already use a Unicode
string type (for example Qt's), I don't want to have libxml++ use
yet another type, and then be forced to convert between the two.
Using my suggested string trait technique, I don't need to.
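
A minimal, self-contained sketch of the whole technique, under a few
loudly stated assumptions: xmlChar is re-declared locally as unsigned
char (which is how libxml2 defines it), fake_attr is a hypothetical
stand-in for the wrapped libxml2 node, and the two converters
(utf8_convert, utf32_convert) are invented for illustration in place
of Glib::ustring or QString so that the code compiles without either
library. The decoder assumes well-formed UTF-8.

```cpp
#include <string>

// Stand-ins so the sketch compiles on its own (assumptions, see above).
typedef unsigned char xmlChar;
struct fake_attr { const xmlChar* name; };

// Converter 1: keep the UTF-8 bytes unchanged in a std::string.
struct utf8_convert
{
  static std::string from_libxml2(const xmlChar* s)
  {
    return s ? std::string(reinterpret_cast<const char*>(s))
             : std::string();
  }
};

// Converter 2: decode UTF-8 into one code point per element.
// Assumes well-formed input; no validation is performed.
struct utf32_convert
{
  static std::u32string from_libxml2(const xmlChar* s)
  {
    std::u32string out;
    while (s && *s)
    {
      const unsigned char lead = *s++;
      char32_t cp;
      int extra;  // number of continuation bytes that follow
      if (lead < 0x80)      { cp = lead;        extra = 0; }
      else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
      else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
      else                  { cp = lead & 0x07; extra = 3; }
      while (extra-- > 0 && *s)
        cp = (cp << 6) | (*s++ & 0x3F);
      out.push_back(cp);
    }
    return out;
  }
};

// The wrapper is written once, against the converter interface only.
template <typename string, typename convert>
class basic_attribute
{
public:
  explicit basic_attribute(const fake_attr* obj) : obj_(obj) {}
  string get_name() const { return convert::from_libxml2(obj_->name); }
private:
  const fake_attr* obj_;
};

// Each user picks the string type by choosing a typedef.
typedef basic_attribute<std::string, utf8_convert> Attribute;
typedef basic_attribute<std::u32string, utf32_convert> WideAttribute;
```

The same basic_attribute code serves both string types; supporting a
further type such as Glib::ustring or QString would only require one
more converter struct and one more typedef.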

I hope this clarifies my suggestion a bit.

Regards,
		Stefan




