Re: [libxml++] program crashes when parsing XML with accent



On Tue, 2004-05-04 at 17:05 +0200, Frederik Himpe wrote:
> On Tuesday 04 May 2004 18:25, Murray Cumming wrote:
> > On Tue, 2004-05-04 at 14:13, Frederik Himpe wrote:
> > > I have an XML document in the ISO-8859-1 character set. When using
> > > libxml++-2.6, the SAX parser crashes when it encounters a character with
> > > an accent (é) in the on_characters method. I'm using the Glib::ustring
> > > class.
> >
> > Glib::ustring should contain a UTF-8 string. It cannot recognise your
> > encoding and convert by itself. I think you need to use some conversion
> > API such as iconv or the glibmm convert functions.
> 
> Should not libxml++ do this conversion? I'm only receiving a Glib::ustring in 
> the on_characters function of the sax parser, so the problem happens already 
> somewhere in libxml++ itself, which seems to be confirmed by the stack trace 
> if I understand it correctly.

OK, I thought that you were parsing from a string rather than a file.
I'm not sure whether libxml can handle files in any encoding. I don't
see how it could get as far as the encoding declaration without knowing what
the encoding is. But Daniel will be the expert on that.
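
For the parse-from-string case I had in mind, the text would need to be
converted before it ends up in a Glib::ustring. ISO-8859-1 is the easy
case, because each byte value is also the Unicode code point. Here is a
minimal sketch using only the standard library. latin1_to_utf8 is a
made-up helper name, not part of libxml++ or glibmm, and in real code
iconv or Glib::convert() would be the more general choice.

```cpp
#include <string>

// Hypothetical helper (not a libxml++/glibmm function): convert an
// ISO-8859-1 byte string to UTF-8. Assumes the input really is
// ISO-8859-1, where every byte maps to code point U+0000..U+00FF.
std::string latin1_to_utf8(const std::string& in)
{
  std::string out;
  out.reserve(in.size() * 2); // worst case: every byte becomes two

  for (unsigned char c : in)
  {
    if (c < 0x80)
    {
      // ASCII range is identical in both encodings.
      out += static_cast<char>(c);
    }
    else
    {
      // U+0080..U+00FF need two bytes: 110xxxxx 10xxxxxx.
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

With this, the single ISO-8859-1 byte e9 (é) becomes the two bytes
c3 a9, and plain ASCII text passes through unchanged.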

It sounds like it shouldn't crash. So, a simple-as-possible test case in
our bugzilla might help us to fix it.

> I have now tried saving the document in UTF-8 with gedit. The é character is 
> now encoded as hexadecimal c3 a9, which seems to be correct. vim again shows 
> the two strange characters, but that's probably because my locale is not set 
> to UTF-8. Processing this file with libxml++-2.6 still crashes the 
> application with the same backtrace.
> 
> Another thing I do not understand: libxml++-1.0 uses std::string, but it seems 
> these std::strings contained UTF-8. Is that possible?

Yes. UTF-8 is a byte-oriented encoding, so a std::string can carry it; it
just treats every byte as one char. This might be informative:
http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/classGlib_1_1ustring.html#_details
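
To make the difference concrete, here is a small sketch, assuming the
string holds valid UTF-8. utf8_length is a made-up helper name, not
something from libxml++ or glibmm. It counts characters by skipping
UTF-8 continuation bytes, which is roughly the distinction between
std::string::size() and the character count a UTF-8 aware string class
reports.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the UTF-8 encoded characters in s.
// Continuation bytes have the bit pattern 10xxxxxx, so every other
// byte starts a new character. No validation is done; the input is
// assumed to be well-formed UTF-8.
std::size_t utf8_length(const std::string& s)
{
  std::size_t count = 0;
  for (unsigned char c : s)
  {
    if ((c & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
```

For "café" stored as UTF-8 (63 61 66 c3 a9), std::string::size() gives
5 bytes while utf8_length() gives 4 characters.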





