Re: [libxml++] UTF8 support



On Sun, 2003-02-23 at 03:31, Stefan Seefeld wrote:
> Murray Cumming wrote:
> > On Fri, 2003-02-21 at 17:54, Martijn wrote:
> > 
> >>Hi all,
> >> 
> >>Is there any planning regarding UTF8 support within the library?
> > 
> > 
> > At the moment I plan to use glibmm 2.4's Glib::ustring. Stefan Seefeld
> > has suggested an alternative but I don't know what it is yet.
> 
> ...and I don't know what I could add to my explanation to help you
> understand my suggestion. Was I so unclear?

You didn't show us this example code. Thanks for showing us now.

> 
> Since the SourceForge ML archiver is such a pain, I'll try again:
> 
> What you now have is something like this:
> 
> std::string Attribute::get_name() const
> {
>    return c_obj()->name ? (char*)c_obj()->name : "";
> }
> 
> i.e. there is a C-style cast from a member of type 'xmlChar *'
> to 'char *', which is fed into std::string's constructor.
> It is clear that this will cause much pain as soon as you use
> more than the ASCII subset of utf8.
> 
> Now you seem to suggest the following change:
> 
> Glib::ustring Attribute::get_name() const
> {
>    return create_ustring_from_libxml2(c_obj()->name);
> }
> 
> where you provide a function that takes an 'xmlChar *'
> and generates a Glib::ustring from it.
> 
> That fixes the problem with non-ASCII utf8 characters, but
> incurs an important disadvantage: libxml++ gets tied to a
> particular unicode library / string representation. It will
> be fine for you (Murray) as you obviously plan to use it
> with other GNOME related software, so this isn't a disadvantage
> for you.
> However, if libxml++ really aims at being as generic as the
> library it wraps (libxml2), it should not impose any particular
> unicode library. That's why I suggest a slight modification to
> your version:
> 
> template <typename string, typename convert>
> string basic_attribute<string, convert>::get_name() const
> {
>    return convert::from_libxml2(c_obj()->name);
> }
> 
> This is almost the *exact* same as you suggest, only that
> instead of hardcoding the conversion function, I *parametrize*
> the entire libxml++ code with it (and the string type).
> That's a standard technique in C++ nowadays; you'll find it, for
> example, in the definitions of std::iostream and std::string.
> 
> Once you know which unicode library you want (say, Glib::ustring),
> you provide a converter 'trait':
> 
> struct ustring_convert
> {
>    static Glib::ustring from_libxml2(const xmlChar *);
>    static xmlChar *to_libxml2(const Glib::ustring &);
> };
> 
> and then define:
> 
> typedef basic_attribute<Glib::ustring, ustring_convert> Attribute;
>
> and you've got the *exact* same API as your original Glib::ustring
> version,

I understand now.

>  with the obvious advantage of letting people *choose*
> whether they want Glib::ustring or another.
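
To make the quoted suggestion concrete, here is a minimal sketch of the trait technique with a converter for plain std::string instead of Glib::ustring. It is only an illustration under stated assumptions: fake_xml_attr and std_string_convert are hypothetical names, xmlChar is redeclared locally as a stand-in so the sketch builds without libxml2, and returning a non-owning const pointer from to_libxml2 is an assumption of this sketch (the quoted version returns a plain 'xmlChar *').

```cpp
#include <string>

// Stand-in for libxml2's xmlChar, redeclared here only so the sketch
// compiles without libxml2 headers.
typedef unsigned char xmlChar;

// Hypothetical stand-in for the wrapped libxml2 attribute node.
struct fake_xml_attr
{
   const xmlChar *name;
};

// Hypothetical converter trait for std::string: the UTF-8 bytes are
// passed through unchanged, and a null pointer becomes an empty string.
struct std_string_convert
{
   static std::string from_libxml2(const xmlChar *s)
   {
      return s ? std::string(reinterpret_cast<const char *>(s))
               : std::string();
   }

   // Assumption: a non-owning view that is valid only while 's' lives.
   static const xmlChar *to_libxml2(const std::string &s)
   {
      return reinterpret_cast<const xmlChar *>(s.c_str());
   }
};

// The wrapper is parametrized on the string type and the converter,
// so it never names a concrete Unicode library itself.
template <typename string, typename convert>
class basic_attribute
{
public:
   explicit basic_attribute(const fake_xml_attr *obj) : obj_(obj) {}

   string get_name() const
   {
      return convert::from_libxml2(c_obj()->name);
   }

private:
   const fake_xml_attr *c_obj() const { return obj_; }
   const fake_xml_attr *obj_;
};

// Users who want std::string pick this typedef; a Glib::ustring or
// QString user would supply a different converter and typedef.
typedef basic_attribute<std::string, std_string_convert> Attribute;
```

With this typedef, calling code just writes Attribute and get_name() and never sees the template parameters. Note that the std::string converter only stores the UTF-8 bytes; it does not make size() or indexing character-aware.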

> I remember reading some criticism of yours of Qt, and in particular
> it defining yet another string type (QString). I fully agree. But
> here I see you making the same mistake.

No, the major problems with Qt were that it
- originally defined a string class not only for Unicode but also because
it simply didn't like std::string. This was typical of the many things
that Qt reimplemented arbitrarily.
- didn't make that class available separately from the GUI toolkit, just
like its signal/slot C++ extensions, XML parser, etc. Glibmm tries a
little harder to be a generically useful C++ library.

>  If I already use a unicode
> string type (for example Qt's), I don't want to have libxml++ use
> yet another type, and then be forced to convert between the two.
> Using my suggested string trait technique, I don't need to.
> 
> I hope this clarifies my suggestion a bit.

Yes, we should consider it. So far I can think of these 2 disadvantages:
- The whole implementation would be in templates in headers, so we couldn't
fix a bug just by installing a new version of the library; applications
would have to be recompiled.
- The API would be less clear (but the same approach works for std::string).
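
On the second point, the std::string precedent can be checked directly: std::string is just a typedef for one instantiation of the std::basic_string template, so everyday code never spells out the template parameters. The sketch below assumes a C++11 compiler for <type_traits> and static_assert; the helper name string_is_plain_typedef is made up for this illustration.

```cpp
#include <memory>       // std::allocator
#include <string>
#include <type_traits>  // std::is_same (C++11)

// The fully spelled-out template instantiation behind std::string.
typedef std::basic_string<char,
                          std::char_traits<char>,
                          std::allocator<char> > spelled_out_string;

// Compile-time check that both names denote the same type.
static_assert(std::is_same<std::string, spelled_out_string>::value,
              "std::string is a typedef of std::basic_string<char, ...>");

// Hypothetical helper exposing the same check at run time.
inline bool string_is_plain_typedef()
{
   return std::is_same<std::string, spelled_out_string>::value;
}
```

A typedef for a templated attribute class would hide its parameters from users in the same way, although compiler error messages would still show the full template names.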

-- 
Murray Cumming
murray usa net
www.murrayc.com




