
Re: [xml] UTF-8 decoding bug in HTML parser



On Fri, Sep 26, 2008 at 08:29:44PM +1000, Michael Day wrote:
> Hi Daniel,
>
>>   Reusing the XML code for this seems to work fine for me and the
>> regression test, but you probably have a more extensive HTML test
>> suite than me ;-) so raise the problem if there is a regression !
>
> Actually, I just remembered one more issue: null bytes in HTML documents  
> terminate the parser, with no error or warning messages. See the  
> attached test document, which has two paragraphs separated by a null.

  That's going to be harder to handle: the parser uses a zero byte in
places to mark the end of the input buffer, so I don't expect a trivial
fix there.
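
  To illustrate why, here is a minimal sketch. It is not the actual
libxml2 code, and both function names are hypothetical. The first scan
treats a zero byte as the end of the input, so it stops silently at an
embedded null; the second relies on an explicit length and so sees the
whole buffer.

```c
#include <stddef.h>

/* Hypothetical sentinel-style scan: a zero byte means "end of input".
 * An embedded null in the document therefore ends the scan early,
 * with nothing to distinguish it from the real end of the buffer. */
size_t scan_sentinel(const char *buf)
{
    const char *cur = buf;

    while (*cur != 0)
        cur++;
    return (size_t)(cur - buf);
}

/* Hypothetical length-aware scan: only the explicit length ends the
 * scan, so embedded nulls can be counted (or reported) instead of
 * being mistaken for the end of the buffer. */
size_t scan_bounded(const char *buf, size_t len, size_t *nuls)
{
    size_t i;

    *nuls = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == 0)
            (*nuls)++;
    }
    return i;
}
```

  For a 21-byte buffer holding "<p>one</p>", a null byte, and then
"<p>two</p>", scan_sentinel() stops after the first 10 bytes, while
scan_bounded() walks all 21 bytes and counts one null. Every place
that relies on the sentinel would need the second kind of check.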

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

