Re: [xml] UTF-8 decoding bug in HTML parser

From: Daniel Veillard <veillard redhat com>
To: Michael Day <mikeday yeslogic com>
Cc: xml gnome org
Subject: Re: [xml] UTF-8 decoding bug in HTML parser
Date: Fri, 26 Sep 2008 12:50:49 +0200

On Fri, Sep 26, 2008 at 08:29:44PM +1000, Michael Day wrote:

> Hi Daniel,
>
>>   Reusing the XML code for this seems to work fine for me and the
>> regression test, but you probably have a more extensive HTML test
>> suite than I do ;-) so raise the problem if there is a regression!
>
> Actually, I just remembered one more issue: null bytes in HTML documents
> terminate the parser, with no error or warning messages. See the
> attached test document, which has two paragraphs separated by a null.


  That's going to be harder to handle: the zero byte is used in places
to indicate the end of the input buffer, so I don't expect a trivial
fix there.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
