Re: [libxml++] program crashes when parsing XML with accent
- From: Jonathan Wakely <cow compsoc man ac uk>
- To: libxmlplusplus-general lists sourceforge net
- Subject: Re: [libxml++] program crashes when parsing XML with accent
- Date: Wed, 5 May 2004 09:59:26 +0100
On Tue, May 04, 2004 at 11:13:54PM +0200, Christophe de Vienne wrote:
> Christophe de Vienne wrote:
>
> >Christophe de VIENNE wrote:
> >
> >>Murray Cumming wrote:
> >>
> >>>Rather than getting the number of characters and then giving that to the
> >>>ustring constructor to convert back to number of bytes, I suggest we put
> >>>the string in a std::string and just use the Glib::ustring(std::string)
> >>>constructor. I can't see a more suitable ustring constructor.
> >>
> >>Indeed (cf my other message).
> >>However I think we should report this to the glibmm dev team, so they
> >>at least document it, since it does not have the behavior we naturally
> >>thought it had.
> >
> >Fixed in CVS.
> >However, before releasing 2.6.1 I want to be 100% sure that the SAX
> >callbacks characters and cdata_block give zero-terminated strings.
>
>
> Well, I had a look at the libxml2 source and it seems that the characters
> callback may be called with a string that is not zero-terminated.
Sorry I didn't reply to this earlier, before you reached a conclusion.
> The solution is, I think, to instantiate it like this:
> Glib::ustring( std::string(ch, len) )
> This should not make an unnecessary buffer copy with g++ 3.x at least, since
> std::string has a COW implementation, and the Glib::ustring constructor
> taking a std::string just copies it.
How about using this ctor taking an iterator range:

    template <class InputIterator>
    Glib::ustring::ustring(InputIterator begin, InputIterator end);

like so:

    Glib::ustring(ch, ch+len);

This doesn't rely on the combination of two implementation details:
a) Glib::ustring uses a std::string internally (although that's probably
   not going to change) and the ctor taking a std::string makes a copy
   using string_(src) not e.g. string_(src.begin(), src.end())
   (i.e. allows the COW implementation to share the buffer)
b) std::string is COW. This is currently true for libstdc++ but might
   not be for other standard libraries, and libstdc++ is looking at
   allowing you to choose whether std::string is COW or not when you
   configure and build the library (because COW can be slow in
   multi-threaded apps)
Using the range ctor would avoid either excess locking or copying
(for COW and non-COW std::string respectively) in multi-threaded apps.
jon
--
[Inlines] are the third most-misused C++ feature
(after inheritance and overloading).
- Nathan Myers