Re: [xml] HTML parser and NULL bytes

On Wed, Aug 06, 2008 at 09:42:42AM +1000, Michael Day wrote:
> Hi Ashwin,
>
> > I am not sure if I understand the scenario correctly, but if you
> > are trying to put a NULL byte into XML text content, I don't think
> > any XML-compliant parser will parse it; according to the XML
> > specification, this is invalid.

> That is correct, and libxml2 correctly handles this case by printing an
> error message and terminating the parse.
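For context on the XML side: the XML 1.0 `Char` production excludes U+0000 entirely, so a NUL byte in UTF-8 input (which decodes to U+0000) is a well-formedness error, and the spec treats those as fatal. The following is a minimal C sketch of that check. The function names `is_xml_char` and `first_invalid_xml_char` are hypothetical illustrations and are not part of libxml2's API. The sketch also assumes the input has already been decoded to code points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XML 1.0 production [2]:
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * U+0000 lies outside every range, so no well-formed document can contain it. */
bool is_xml_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

/* Hypothetical helper: return the index of the first code point that is not
 * a legal XML Char, or n if all n code points are legal. */
size_t first_invalid_xml_char(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!is_xml_char(cps[i]))
            return i;
    }
    return n;
}
```

A conforming XML parser stops at the index this helper reports. It does not skip the offending character.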

> However, libxml2 also has an HTML parser, and the HTML5 spec says that
> NULL bytes do not terminate the document (they are invalid, but you
> should just replace them with U+FFFD and keep going). Also, the libxml2
> HTML parser does not even print an error in this case; it just stops.
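The "replace with U+FFFD and keep going" behaviour Michael describes can be approximated by a pre-pass over the raw bytes before they reach the HTML parser. The following is a minimal sketch under two stated assumptions. First, the input is UTF-8; in UTF-16, zero bytes are normal and must not be touched. Second, replacing every NUL up front is acceptable. The current WHATWG HTML tokenizer actually treats U+0000 case by case depending on tokenizer state, so this is a simplification. `replace_nul_with_fffd` is a hypothetical helper, not libxml2 code. Its output could then be passed to libxml2's `htmlReadMemory()`, which takes an `int` size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2). Assumes UTF-8 input.
 * Copies `len` bytes into a newly allocated buffer, replacing each 0x00
 * byte with U+FFFD REPLACEMENT CHARACTER (UTF-8: EF BF BD). The result is
 * NUL-terminated; its length excluding the terminator goes to *out_len.
 * Returns NULL on allocation failure or size overflow. */
char *replace_nul_with_fffd(const char *in, size_t len, size_t *out_len)
{
    static const char fffd[3] = { '\xEF', '\xBF', '\xBD' };
    size_t nuls = 0;

    assert((in != NULL || len == 0) && out_len != NULL);

    /* First pass: count NUL bytes so the output can be sized exactly. */
    for (size_t i = 0; i < len; i++)
        if (in[i] == '\0')
            nuls++;

    /* Each NUL (1 byte) grows to 3 bytes; +1 for the terminator. */
    if (len == SIZE_MAX || nuls > (SIZE_MAX - len - 1) / 2)
        return NULL;
    char *out = malloc(len + 2 * nuls + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy bytes, expanding each NUL into U+FFFD. */
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\0') {
            memcpy(out + j, fffd, sizeof fffd);
            j += sizeof fffd;
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    *out_len = j;
    return out;
}
```

The two-pass design avoids reallocating mid-copy. A real parser would more likely do this substitution inside its tokenizer, where it knows the current state.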

I will look at the handling of 0 bytes, I promise. But I want to point out
that other people have expressed interest in an HTML5 implementation for
libxml2. I think it would make perfect sense to try adding the HTML5
parsing behaviour to the libxml2 HTML parser, maybe make it the default,
and, if needed, keep the current behaviour as an HTML parsing option.


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
