[xml] magic characters make the HTML parser lose data

From: Aaron Patterson <aaron patterson gmail com>
To: xml gnome org
Subject: [xml] magic characters make the HTML parser lose data
Date: Tue, 21 Apr 2009 14:42:30 -0700

Hi,

One of my users has run into a problem where the HTML parser
loses all data after a particular sequence of characters in the HTML
body.  It seems that if there are two characters, 0x01 followed by
0x00, the HTML parser will lose all data after those two characters,
even if the parser is put in recovery mode.

Here is a program and test file that reproduce the problem:

  http://gist.github.com/99401
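
For readers who cannot reach the gist, an input of the kind described
above can be sketched roughly as follows. This is only a minimal Python
sketch and not the contents of the original test file: the markup, the
file name magic.html, and the helper names build_test_html and
text_after_magic are all assumptions made for illustration.

```python
# Hypothetical test input; the real test file lives in the gist linked above.
MAGIC = b"\x01\x00"  # the two bytes from the report: 0x01 followed by 0x00


def build_test_html(before: bytes = b"<p>before</p>",
                    after: bytes = b"<p>after</p>") -> bytes:
    """Return an HTML document whose body contains the 0x01 0x00 sequence."""
    return b"<html><body>" + before + MAGIC + after + b"</body></html>"


def text_after_magic(data: bytes) -> bytes:
    """Return the bytes that follow the first 0x01 0x00 sequence, if any."""
    _, sep, tail = data.partition(MAGIC)
    return tail if sep else b""


if __name__ == "__main__":
    doc = build_test_html()
    # Write in binary mode so the control bytes reach the file unchanged.
    with open("magic.html", "wb") as fh:
        fh.write(doc)
    # This is the content a recovering parser would ideally still report.
    print(text_after_magic(doc))
```

Feeding the resulting file to the HTML parser in recovery mode and
checking whether the "after" paragraph survives should be enough to see
whether the behaviour described here occurs.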

I realize those characters are not valid UTF-8 characters, but it
seems that if the parser is in recovery mode it shouldn't lose all
data after them.  Shall I file a ticket in Bugzilla?

-- 
Aaron Patterson
http://tenderlovemaking.com/
