Re: [xml] libxml2 is dividing contiguous non-whitespace characters



On Tue, Mar 15, 2011 at 08:35:29PM -0600, Dan McRae wrote:
Forgive me if this is a rookie user error, but I ran across an odd
situation with respect to the libxml2 parser. It appears that when
it is parsing a series of numerical values separated by whitespace
(e.g. "... 293.18 218.92 289.13 ..."), it is possible for it to grab
just a portion of the series resulting in a number being separated
from the rest of its digits.

  I'm not sure what you mean; libxml2 doesn't look at the content
except in the specific case where type validation is done using
XSD or RNG.

[...]
It appears to be completely random as to whether the split happens
within a value or not, except that the array ends with character
number 250 (suspiciously close to 256). The more white space
characters, the less likely it is to happen within a value, so I can
reformat the numbers and make the error go away. However, this
doesn't leave me with a very comfortable feeling. Plus, I have a
hard time believing that we wouldn't have seen this a long time ago
if it was truly this random.

Is this something with which you are familiar? I searched the web
but didn't see any results that seem to address this situation. I'm
wondering if there's a way to instruct libxml2 to not separate
contiguous non-whitespace characters. I hope I don't have to try to
reassemble separated values myself.

  What are you doing exactly? If you're parsing with SAX, the characters
coming from a single node may arrive in multiple callbacks. The API
never guarantees that you get everything in a single chunk; you have to
reassemble it yourself. That's the principle of the streaming API: you
could have a single text node of a terabyte, and the libxml2 SAX parser
will parse it using constant memory, but you will have to do the data
analysis on the fly.
If you find SAX too hard, use the Reader; I discourage use of SAX except
for very specific kinds of processing.
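
  As a rough illustration of the reassembly, here is a minimal sketch of
a character accumulator. The names text_buf, on_characters, parse_doubles
and text_buf_reset are made up for this example and are not part of
libxml2. The callback has the same shape as the SAX characters callback
(void *ctx, const xmlChar *ch, int len), with unsigned char standing in
for xmlChar so the sketch compiles without the libxml2 headers. The idea
is to append every chunk and only interpret the text once the element
ends.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the text of the element being parsed. */
typedef struct {
    char  *data;  /* NUL-terminated accumulated text, NULL while empty */
    size_t len;   /* bytes used, not counting the terminator */
    size_t cap;   /* bytes allocated */
} text_buf;

/* Same shape as a SAX characters callback. The chunk is not
 * NUL-terminated, so exactly `len` bytes are copied. */
void on_characters(void *ctx, const unsigned char *ch, int len)
{
    text_buf *buf = (text_buf *)ctx;
    size_t need;

    assert(len >= 0);
    need = buf->len + (size_t)len + 1;
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 64;
        char *p;
        while (cap < need)
            cap *= 2;
        p = (char *)realloc(buf->data, cap);
        if (p == NULL)
            return;  /* out of memory: chunk dropped (sketch only) */
        buf->data = p;
        buf->cap = cap;
    }
    memcpy(buf->data + buf->len, ch, (size_t)len);
    buf->len += (size_t)len;
    buf->data[buf->len] = '\0';
}

/* Meant to be called when the element ends: the text is complete,
 * so splitting on whitespace can no longer cut a number in two.
 * Returns how many values were stored in `out` (at most `max`). */
size_t parse_doubles(const text_buf *buf, double *out, size_t max)
{
    size_t n = 0;
    const char *p = buf->data;

    if (p == NULL)
        return 0;
    while (n < max) {
        char *end;
        double v = strtod(p, &end);  /* skips leading whitespace */
        if (end == p)
            break;                   /* nothing more to convert */
        out[n++] = v;
        p = end;
    }
    return n;
}

/* Release the buffer so it can be reused for the next element. */
void text_buf_reset(text_buf *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->len = 0;
    buf->cap = 0;
}
```

  In a real handler you would pass a text_buf as the user data, call
text_buf_reset from startElement, and call parse_doubles from endElement.
Feeding the two chunks "293.18 21" and "8.92 289.13" through
on_characters yields the three values 293.18, 218.92 and 289.13.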

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


