Re: [xml] UTF-8 decoding bug in HTML parser
- From: Daniel Veillard <veillard redhat com>
- To: Michael Day <mikeday yeslogic com>
- Cc: xml gnome org
- Subject: Re: [xml] UTF-8 decoding bug in HTML parser
- Date: Fri, 26 Sep 2008 12:50:49 +0200
On Fri, Sep 26, 2008 at 08:29:44PM +1000, Michael Day wrote:
> Hi Daniel,
>> Reusing the XML code for this seems to work fine for me and the
>> regression test, but you probably have a more extensive HTML test
>> suite than me ;-) so raise the problem if there is a regression!
> Actually, I just remembered one more issue: null bytes in HTML documents
> terminate the parser, with no error or warning messages. See the
> attached test document, which has two paragraphs separated by a null.
That's going to be harder to handle: the zero byte is used in places to
indicate the end of the input buffer, so I don't expect a trivial fix
there.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/