Re: [xml] Characters callback



On Tue, Jun 14, 2011 at 05:50:12PM -0700, Dlpnet wrote:
[...]
any tools. However when I try to retrieve it with the library the
returned string is split in the libxml2 characters callback. The split
is exactly on the first character with the accent and two calls to the
characters callback are made. And the 3rd party library doesn't handle

  Yes, that's normal.

this correctly, believing it's two different values.
If there are no UTF-8 characters (or special characters) everything is
fine and the callback is called only once for all the values.

I can "patch" the library now that I know what the problem is, by setting
a boolean to let the parser know it's the same value, but before
doing so I would just like to know if it's normal and if there is a
better approach than what I'm planning to do.

  SAX doesn't guarantee that for one text node in an XML
document you will get only one characters() callback. Actually that's
just normal: SAX is designed to work in constant memory, and there
is no limit on the size of a text node in XML. libxml will usually
break large pieces into successive 4KB callbacks (IIRC) and will
also use this to minimize memory use and copying when dealing with
entities, as you found out.

  The 3rd party library is broken and needs to be fixed for this.
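
  For illustration, one common way for a SAX consumer to cope with split
text is to append every characters() chunk to a buffer and only treat the
value as complete when the element ends. The sketch below is not taken
from the 3rd party library. The TextAccum struct and the on_* function
names are hypothetical, and xmlChar is redefined locally as a stand-in so
the code compiles without the libxml2 headers. It also assumes simple
elements without mixed content, since the buffer is reset on every start
tag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for libxml2's xmlChar so this sketch compiles on its own.
 * With libxml2, drop this line and include <libxml/parser.h> instead. */
typedef unsigned char xmlChar;

/* Hypothetical per-parse state, handed to the parser as user data. */
typedef struct {
    char  *buf;     /* text accumulated for the current element */
    size_t len;     /* bytes used, not counting the terminator */
    size_t cap;     /* bytes allocated for buf */
    char  *last;    /* copy of the last completed value (owned) */
    int    failed;  /* set to 1 if an allocation failed */
} TextAccum;

/* Start of an element: discard whatever was collected before. */
static void on_start_element(void *ctx, const xmlChar *name,
                             const xmlChar **atts)
{
    TextAccum *acc = ctx;
    (void)name;
    (void)atts;
    acc->len = 0;
    if (acc->buf)
        acc->buf[0] = '\0';
}

/* May be called several times for one text node, so only append here
 * and never assume a chunk is the whole value. */
static void on_characters(void *ctx, const xmlChar *ch, int len)
{
    TextAccum *acc = ctx;
    size_t need;

    assert(len >= 0);
    if (acc->failed)
        return;
    need = acc->len + (size_t)len + 1;
    if (need > acc->cap) {
        size_t cap = acc->cap ? acc->cap : 64;
        char *p;
        while (cap < need)
            cap *= 2;
        p = realloc(acc->buf, cap);
        if (!p) {
            acc->failed = 1;
            return;
        }
        acc->buf = p;
        acc->cap = cap;
    }
    memcpy(acc->buf + acc->len, ch, (size_t)len);
    acc->len += (size_t)len;
    acc->buf[acc->len] = '\0';
}

/* End of the element: the accumulated text is now the complete value. */
static void on_end_element(void *ctx, const xmlChar *name)
{
    TextAccum *acc = ctx;
    (void)name;
    free(acc->last);
    acc->last = NULL;
    if (acc->failed)
        return;
    acc->last = malloc(acc->len + 1);
    if (!acc->last) {
        acc->failed = 1;
        return;
    }
    memcpy(acc->last, acc->buf ? acc->buf : "", acc->len);
    acc->last[acc->len] = '\0';
}
```

  With libxml2 itself, functions of this shape would be assigned to the
startElement, characters and endElement fields of an xmlSAXHandler, with
a TextAccum passed as the user data. The caller frees buf and last when
parsing is done.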

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


