Re: [xml] characters callback called twice (and UTF-8?)



On Fri, May 16, 2008 at 12:37:12PM +0200, bagnacauda wrote:
I need your help to understand what follows.

I have this xml file (you can find it attached) whose tag may contain
Western European, Russian or Greek characters, even mixed among them.
I have run xmllint --debug --sax on the file to see if everything is OK when
I get a mixed character string and I was surprised to see that the
characters callback is invoked twice: once for the first four characters
(which are Western European) and once for the remaining part of the string.

  Wrong expectation, libxml2's behaviour is normal.
  SAX is a streaming interface, and a text node in XML has no size
boundary, which implies that the content of a text node may be
delivered as multiple characters callbacks. If you don't accept this
on the receiving side you will lose data.
  Once it's clear that you must support multiple consecutive characters
callbacks, well, libxml2 uses this to speed up parsing when possible.
  That simple.
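
  For illustration, here is a minimal sketch of one way to accept
multiple consecutive characters callbacks: append every chunk to a
growing buffer and only use the text once the element ends. TextBuf,
on_characters and take_text are hypothetical names, not libxml2 API.
The handler is written with the same shape as libxml2's
charactersSAXFunc (void *ctx, const xmlChar *ch, int len), using
unsigned char in place of xmlChar so the sketch compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the text of the current element. */
typedef struct {
    char  *data;  /* NUL-terminated bytes collected so far, or NULL */
    size_t len;   /* bytes in use, not counting the trailing NUL */
    size_t cap;   /* allocated size of data */
} TextBuf;

/* Same shape as a SAX characters callback. The chunk in ch is NOT
 * NUL-terminated: only the first len bytes are valid. The callback may
 * fire several times for one text node, so each chunk is appended. */
static void on_characters(void *ctx, const unsigned char *ch, int len)
{
    TextBuf *buf = ctx;
    size_t need;

    if (len <= 0)
        return;

    need = buf->len + (size_t)len + 1;  /* +1 for the trailing NUL */
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 64;
        char *p;

        while (cap < need)
            cap *= 2;
        p = realloc(buf->data, cap);
        if (p == NULL)
            return;  /* out of memory: this sketch just drops the chunk */
        buf->data = p;
        buf->cap = cap;
    }

    memcpy(buf->data + buf->len, ch, (size_t)len);
    buf->len += (size_t)len;
    buf->data[buf->len] = '\0';
}

/* Meant to be called when the element ends: hands the accumulated text
 * to the caller (who must free() it) and resets the buffer. Returns
 * NULL if no characters were received. */
static char *take_text(TextBuf *buf)
{
    char *out = buf->data;

    buf->data = NULL;
    buf->len = 0;
    buf->cap = 0;
    return out;
}
```

  With libxml2 itself, one would presumably set the characters field
of an xmlSAXHandler to a function like on_characters, pass a TextBuf
as the user data, and call take_text from the endElement handler, so
that the complete string is seen no matter how many chunks it arrived
in.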

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


