[xml] Bug in HTMLparser.c


I might have stumbled upon a bug in HTMLparser.c. It shows up when a UTF-8 HTML
file is being read and a multi-byte Unicode character gets split right at the
end of the input buffer: the input buffer is then resized, but an old pointer
into it is still used, so the Unicode character is not recognized. This causes
the encoding to be switched back to ISO-8859-1.

I found this bug while using xsltproc with --html, and I don't know under what
other circumstances it may arise.

The attached patch solves the problem by updating the cur pointer in the
htmlCurrentChar function whenever xmlParserInputGrow is called.
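
To illustrate the idea outside of libxml2, here is a minimal, self-contained
sketch. It is not the attached patch and not libxml2 code: ToyInput, toy_grow
and toy_current_char are hypothetical names I made up for this example, and the
buffer handling is deliberately simplified. The point is only that a local
pointer copied before the buffer grows has to be reloaded afterwards, because
the buffer may have been moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser input buffer (not xmlParserInput). */
typedef struct {
    unsigned char *base; /* start of the heap-allocated buffer */
    unsigned char *cur;  /* next byte to be decoded */
    unsigned char *end;  /* one past the last valid byte */
} ToyInput;

/* Append n bytes to the buffer. realloc() may move the buffer, so base,
 * cur and end are all recomputed from offsets. Returns 0 on success. */
static int toy_grow(ToyInput *in, const unsigned char *more, size_t n)
{
    assert(in != NULL && in->base != NULL);
    size_t off = (size_t)(in->cur - in->base);
    size_t len = (size_t)(in->end - in->base);
    unsigned char *p = realloc(in->base, len + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + len, more, n);
    in->base = p;
    in->cur = p + off;
    in->end = p + len + n;
    return 0;
}

/* Decode one UTF-8 character at in->cur. If the sequence is cut off at the
 * end of the buffer, grow the buffer with the bytes in `more` and continue.
 * Returns the code point, or -1 on error; *consumed is the byte count. */
static int toy_current_char(ToyInput *in, const unsigned char *more,
                            size_t more_len, int *consumed)
{
    const unsigned char *cur = in->cur; /* local copy, as a parser might keep */
    if (cur >= in->end)
        return -1;
    unsigned char c = *cur;
    size_t need;
    int val;

    if (c < 0x80) { *consumed = 1; return c; }
    if ((c & 0xE0) == 0xC0)      { need = 2; val = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { need = 3; val = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { need = 4; val = c & 0x07; }
    else return -1;

    if ((size_t)(in->end - cur) < need) {
        if (toy_grow(in, more, more_len) != 0)
            return -1;
        /* The important step: reload the local pointer. Without this line,
         * cur may still point into the old, freed buffer. */
        cur = in->cur;
        if ((size_t)(in->end - cur) < need)
            return -1;
    }

    for (size_t i = 1; i < need; i++) {
        if ((cur[i] & 0xC0) != 0x80)
            return -1; /* not a continuation byte */
        val = (val << 6) | (cur[i] & 0x3F);
    }
    *consumed = (int)need;
    return val;
}
```

If the `cur = in->cur;` line is left out, the continuation bytes are read
through a pointer that may no longer be valid, which is the kind of failure I
described above.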

Thanks for Libxml2 and Libxslt. I have used them on several occasions and
they've been really helpful.

Adiel Mittmann

Attachment: libxml2-2.6.32-split-utf8.patch
Description: Text document
