Re: [libxml++] program crashes when parsing XML with accent



On Tue, May 04, 2004 at 11:13:54PM +0200, Christophe de Vienne wrote:

> Christophe de Vienne wrote:
> 
> >Christophe de VIENNE wrote:
> >
> >>Murray Cumming wrote:
> >>
> >>>Rather than getting the number of characters and then giving that to the
> >>>ustring constructor to convert back to number of bytes, I suggest we put
> >>>the string in a std::string and just use the Glib::ustring(std::string)
> >>>constructor. I can't see a more suitable ustring constructor.
> >>
> >>Indeed (cf my other message).
> >>However I think we should signal this to the glibmm dev team, so they
> >>at least document it, since it does not have the behavior we naturally
> >>thought it had.
> >
> >Fixed in the CVS.
> >However before releasing 2.6.1 I want to be 100% sure that the SAX
> >callbacks characters and cdata_block give zero-terminated strings.
> 
> 
> Well, I had a look at the libxml2 source and it seems that the character
> callback may be called with a non-zero-terminated string.

Sorry I didn't reply to this earlier, before you reached a conclusion.

> The solution is, I think, to instantiate like this:
> Glib::ustring( std::string(ch, len) )
> This should not make an unnecessary buffer copy with g++ 3.x at least, since
> std::string has a COW implementation, and the Glib::ustring constructor
> taking a std::string just copies it.

How about using this ctor taking an iterator range:

    template <class InputIterator>
        Glib::ustring::ustring(InputIterator begin, InputIterator end);

like so:

    Glib::ustring(ch, ch+len);

This doesn't rely on the combination of two implementation details:

a) Glib::ustring uses a std::string internally (although that's probably
   not going to change) and the ctor taking a std::string makes a copy
   using string_(src) not e.g. string_(src.begin(), src.end())
   (i.e. allows COW implementation to be shared)

b) std::string is COW. This is currently true for libstdc++ but might
   not be for other standard libraries, and libstdc++ is looking at
   allowing you to choose whether std::string is COW or not when you
   configure and build the library (because COW can be slow in
   multi-threaded apps)

Using the range ctor would avoid either excess locking or copying
(for COW and non-COW std::string respectively) in multi-threaded apps.
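
As a rough sketch of why the range form is safe with a buffer that is
not zero-terminated, here is the same construction with plain std::string
standing in for Glib::ustring (so it compiles without glibmm). The
function name text_from_range and the sample buffer are made up for
illustration; they are not part of libxml++ or libxml2.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: mimics what a SAX characters() handler is given,
// i.e. a pointer into the parser's buffer plus a byte count, with no
// guarantee that a NUL follows the last byte.
std::string text_from_range(const char* ch, int len)
{
    assert(ch != nullptr && len >= 0);

    // The iterator-range ctor copies exactly the bytes in [ch, ch + len)
    // and never looks for a terminator. With glibmm the equivalent line
    // would be Glib::ustring(ch, ch + len).
    return std::string(ch, ch + len);
}
```

For example, given the buffer "<a>caf\xc3\xa9</a>" (an accented "e" encoded
as two UTF-8 bytes), calling text_from_range(buf + 3, 5) copies only the
five bytes of the text node and stops before the closing tag. Note that
len counts bytes, not characters.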

jon


-- 
[Inlines] are the third most-misused C++ feature
(after inheritance and overloading).
	- Nathan Myers



