[xml] magic characters make the HTML parser lose data



Hi,

One of my users has run into a problem where the HTML parser will
lose all data after a particular sequence of characters in the HTML
body.  It seems that if there are two characters, 0x01 followed by
0x00, the HTML parser will lose all data after those two characters,
even if the parser is put in recovery mode.

Here is a program and test file that reproduce the problem:

  http://gist.github.com/99401

I realize those control characters are not allowed in HTML or XML, but it
seems that if the parser is in recovery mode it shouldn't lose all
data after them.  Shall I file a ticket in bugzilla?
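Until the recovery-mode behavior is settled, one possible client-side workaround is to strip the C0 control bytes that XML 1.0 disallows before handing the document to the parser. This is a hedged sketch, not libxml2's or Nokogiri's behavior. The function name `strip_c0_controls` is hypothetical, and the sketch assumes UTF-8 input. That assumption matters because in UTF-8 the bytes 0x00 to 0x1F never appear inside a multi-byte sequence, so removing them cannot corrupt other characters.

```python
import re

# C0 control bytes disallowed by XML 1.0's Char production:
# everything below 0x20 except tab (0x09), line feed (0x0A) and carriage return (0x0D).
_DISALLOWED_C0 = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_c0_controls(data: bytes) -> bytes:
    """Return a copy of UTF-8 `data` with disallowed C0 control bytes removed.

    Hypothetical helper: safe for UTF-8 because bytes 0x00-0x1F never occur
    inside a multi-byte sequence. It is NOT safe for UTF-16 or UTF-32 input.
    """
    return _DISALLOWED_C0.sub(b"", data)


if __name__ == "__main__":
    # Hypothetical test document containing the 0x01 0x00 sequence from the report.
    html = b"<html><body><p>before\x01\x00after</p></body></html>"
    print(strip_c0_controls(html))
```

This only hides the symptom. Dropping the bytes silently may not be what every caller wants, so a fix in the parser's recovery path would still be preferable.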

-- 
Aaron Patterson
http://tenderlovemaking.com/
