Re: [xml] UTF-8 decoding bug in HTML parser



Hi Daniel,

  See patch attached, I'm committing it to SVN as this fixes the specific
test case, all the errors seen when parsing subsequently look 'normal'
:-) so I added it to the test suite

Excellent!

Would there be any chance that you could look at one more related issue affecting the HTML parser? Currently, if an HTML file begins with a UTF-8 BOM (the bytes EF BB BF), the HTML parser does not recognise it and parses it as three Latin-1 characters, which results in garbage at the beginning of the document and an incorrect encoding for the rest of the file.

Would it be possible to skip over these three bytes, and ideally set the encoding to UTF-8 if they are present?
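
To illustrate what I mean, here is a minimal sketch of the check, assuming the raw input bytes can be inspected before any decoding takes place. The function name skip_utf8_bom and its parameters are hypothetical and not part of libxml2; how the result would be wired into the HTML parser is left open.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function.
 * Returns the number of leading bytes to skip: 3 if the buffer starts
 * with the UTF-8 BOM (EF BB BF), otherwise 0.
 * When a BOM is found and encoding_out is non-NULL, *encoding_out is set
 * to "UTF-8"; in every other case it is left untouched. */
static size_t skip_utf8_bom(const unsigned char *buf, size_t len,
                            const char **encoding_out)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    /* Too short, or the first three bytes are not the BOM. */
    if (buf == NULL || len < sizeof bom || memcmp(buf, bom, sizeof bom) != 0)
        return 0;

    if (encoding_out != NULL)
        *encoding_out = "UTF-8";
    return sizeof bom;
}
```

The caller would advance its input pointer by the returned count and, if the encoding was set, treat the rest of the input as UTF-8 rather than falling back to Latin-1.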

Best regards,

Michael

--
Print XML with Prince!
http://www.princexml.com


