Re: [xml] HTML-parser: encoding?



On Thu, Nov 29, 2001 at 10:58:01PM +0100, Elizabeth Mattijsen wrote:
At 09:27 PM 11/29/01 +0100, Melvyn Sopacua wrote:
 As I understand it, you buffer the incoming streams, which allows you to 
build an encoding map, which can be as simple as a textfile consisting of 
"filename","encoding"\n.
If the HTTP headers don't provide information, the meta tags can be read.
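
A minimal sketch of the idea above, in Python for brevity. It is not what libxml does internally: the names `load_encoding_map` and `sniff_meta_charset` are hypothetical, and the regular expression is a rough assumption about what a charset declaration in a meta tag looks like, not a full HTML parser.

```python
import csv
import io
import re

def load_encoding_map(text):
    """Parse lines of the form "filename","encoding" into a dict."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) == 2}

# Rough pattern (an assumption): the first charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)

def sniff_meta_charset(head):
    """Return the charset named in a meta tag of the raw bytes, or None."""
    match = META_CHARSET.search(head)
    return match.group(1).decode("ascii") if match else None
```

A caller could consult the map first (filled in from the HTTP headers) and fall back to `sniff_meta_charset` on the first few kilobytes of the buffered document when no entry exists.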

That's what the HTML-parser is already doing, isn't it?

  yep

Changing it to numeric entities would actually be best, as it wouldn't lose 
any information.  Hmm...  but can you actually do that?  The next time you 
read this into an XML parser, wouldn't it re-create the encoding error 
(once the entity has been processed)?

  No, it would be fine as long as they are in the ranges defined by 
XML as valid Chars.
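
As a rough illustration (a sketch, not libxml code), the conversion could look like this in Python. The function names `is_xml_char` and `to_numeric_entities` are hypothetical; the ranges are the Char production of XML 1.0. ASCII characters are passed through unchanged, so markup characters such as `&` and `<` are assumed to be escaped elsewhere.

```python
def is_xml_char(cp):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def to_numeric_entities(text):
    """Replace every non-ASCII character by a decimal character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not is_xml_char(cp):
            raise ValueError("U+%04X is not a valid XML Char" % cp)
        # ASCII stays as-is; everything else becomes &#NNN;
        out.append(ch if cp < 0x80 else "&#%d;" % cp)
    return "".join(out)
```

Because the output is pure ASCII, it no longer depends on the encoding declaration, and a parser that reads it back resolves each reference to the original character.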

Hmmm... availability of iconv is not guaranteed on a lot of platforms, 
is it?  That would be a reason for me not to use this, as the script 
should be generally usable.

  Iconv is actually available on most platforms, if needed after a recompile.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


