Re: [xml] HTML parser and NULL bytes
- From: Daniel Veillard <veillard redhat com>
- To: Michael Day <mikeday yeslogic com>
- Cc: olivier Thereaux <ot w3 org>, xml gnome org, "Michael\(tm\) Smith" <mike w3 org>
- Subject: Re: [xml] HTML parser and NULL bytes
- Date: Thu, 7 Aug 2008 04:26:07 -0400
On Wed, Aug 06, 2008 at 09:42:42AM +1000, Michael Day wrote:
Hi Ashwin,
I am not sure if I understand the scenario correctly, but if you are
trying to include a NULL byte as text content in XML, then I don't
think any XML-compliant parser will parse it; according to the XML draft
this is invalid.
That is correct, and libxml2 correctly handles this case by printing an
error message and terminating the parse.
However, libxml2 also has an HTML parser, and the HTML5 spec says that
NULL bytes do not terminate the document (they are invalid, but you
should just replace them with U+FFFD and keep going). Also, the libxml2
HTML parser does not even print an error in this case; it just stops.
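
To make the "replace with U+FFFD and keep going" behaviour concrete, here
is a minimal sketch in C of a pre-processing pass over the input buffer.
This is not libxml2 code: the function name replace_nul_bytes is
hypothetical, and the sketch assumes the input is UTF-8 (or another
ASCII-compatible encoding where emitting the UTF-8 bytes EF BF BD is
acceptable). It also does not guard against size_t overflow on very
large inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER. */
static const unsigned char REPLACEMENT_UTF8[3] = { 0xEF, 0xBF, 0xBD };

/*
 * Hypothetical helper (not part of libxml2): copy `len` bytes from `in`,
 * replacing every 0x00 byte with the UTF-8 encoding of U+FFFD.
 *
 * Returns a malloc'd buffer that the caller must free, and stores the
 * number of bytes written in *out_len. The buffer gets one extra
 * trailing 0x00 terminator that is not counted in *out_len.
 * Returns NULL if allocation fails.
 */
unsigned char *replace_nul_bytes(const unsigned char *in, size_t len,
                                 size_t *out_len)
{
    size_t nuls = 0, i, j = 0;
    unsigned char *out;

    assert(in != NULL || len == 0);
    assert(out_len != NULL);

    /* First pass: count NUL bytes so the output can be sized exactly. */
    for (i = 0; i < len; i++) {
        if (in[i] == 0)
            nuls++;
    }

    /* Each NUL grows from 1 byte to 3 bytes; +1 for the terminator. */
    out = malloc(len + 2 * nuls + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy bytes, substituting U+FFFD for each NUL. */
    for (i = 0; i < len; i++) {
        if (in[i] == 0) {
            memcpy(out + j, REPLACEMENT_UTF8, sizeof REPLACEMENT_UTF8);
            j += sizeof REPLACEMENT_UTF8;
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    *out_len = j;
    return out;
}
```

A caller could run a document through this helper and hand the result to
an in-memory parser entry point such as htmlReadMemory(), so that the
parser never sees a raw NULL byte. Doing the substitution inside the
parser itself would avoid the extra copy, but that is a design question
for the library.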
I will look at NULL byte handling, I promise. But I want to point out that
other people have expressed interest in an HTML5 implementation for libxml2.
I think it would make perfect sense to try to add that parsing behaviour to
the libxml2 HTML parser, maybe make it the default, and if needed keep the
current behaviour as an HTML parsing option.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/