Re: [xml] HTML-parser: encoding?

From: Morus Walter <morus walter tanto-xipolis de>
To: veillard redhat com
Subject: Re: [xml] HTML-parser: encoding?
Date: Fri, 30 Nov 2001 08:32:30 +0100

I don't agree with the pre-processing, but _can_ agree with the iconv
post-processing.  It would be nice if it were part of xmllint, though...


  What are the ranges in 00-FF which are not part of ISO 8859-1?
Is it just [80-A0[ ?

Actually it's not even [80-A0[.

As I recently learned (from Martin v. Loewis on the Python XML mailing list):

*All* bytes are valid characters in ISO-8859-1 (it is a common
misconception about Latin-1 that 128-159 are not defined).


See
http://208.56.196.240/misc/ISO-8859-1.HTML

so the bytes in [80-A0[ (0x80-0x9F) are valid in ISO-8859-1, though they
encode the C1 control codes rather than printable characters.
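As a quick illustration (a sketch in Python, chosen only for convenience; nothing in this thread depends on it), Python's built-in "latin-1" codec accepts every byte value and maps byte b to code point U+00b, so the bytes in [80-A0[ come out as the C1 control characters U+0080..U+009F:

```python
import unicodedata

# Decode all 256 possible byte values with Python's built-in "latin-1" codec.
# The codec maps byte b to code point U+00b, so no byte is rejected.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

assert len(text) == 256
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))

# The contested range [80-A0[ (0x80..0x9F) decodes to the C1 control codes,
# which Unicode classifies as general category "Cc" (control), not as
# printable characters.
c1 = text[0x80:0xA0]
print({unicodedata.category(ch) for ch in c1})  # {'Cc'}
print(unicodedata.category(text[0xA0]))         # Zs (NO-BREAK SPACE)
```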

They aren't ruled out as characters in XML either; U+0080-U+009F fall
within [#x20-#xD7FF] of the XML 1.0 Char production:

  Char    ::=    #x9 | #xA | #xD |
                 [#x20-#xD7FF] |
                 [#xE000-#xFFFD] |
                 [#x10000-#x10FFFF]
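The Char production can be checked directly. A minimal sketch in Python; is_xml_char is a hypothetical helper written for this illustration, not a libxml function:

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Every code point that an ISO-8859-1 byte in [80-A0[ decodes to is allowed.
print(all(is_xml_char(cp) for cp in range(0x80, 0xA0)))  # True
# By contrast, most C0 controls below 0x20 are excluded; TAB is allowed.
print(is_xml_char(0x01), is_xml_char(0x09))  # False True
```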

So IMHO one correct way to handle these is simply to convert them to
UTF-8 (if that's the output encoding) or to leave them as they are if
the output encoding is ISO-8859-1.
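A minimal sketch of the Latin-1 to UTF-8 case in Python, assuming the input really is ISO-8859-1; latin1_to_utf8 is a hypothetical name, and in practice iconv or the parser's own converter would do this job. Because every Latin-1 byte b is the code point U+00b, bytes below 0x80 pass through unchanged and the rest become a two-byte UTF-8 sequence:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8 by hand.

    Every Latin-1 byte b is the code point U+00b, so bytes below 0x80
    are copied unchanged and the rest become a two-byte UTF-8 sequence.
    C1 control bytes (0x80..0x9F) are converted like any other byte.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        else:
            out.append(0xC0 | (b >> 6))    # 110xxxxx: top 2 bits
            out.append(0x80 | (b & 0x3F))  # 10xxxxxx: low 6 bits
    return bytes(out)

sample = b"caf\xe9 \x85 \x9f"
converted = latin1_to_utf8(sample)
assert converted == sample.decode("latin-1").encode("utf-8")
print(converted)  # b'caf\xc3\xa9 \xc2\x85 \xc2\x9f'
```

For ISO-8859-1 output no conversion is needed at all: the bytes, C1 controls included, can be written out unchanged.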

greetings
        Morus

References:
- [xml] HTML-parser: encoding?
  - From: Elizabeth Mattijsen
- Re: [xml] HTML-parser: encoding?
  - From: Melvyn Sopacua
- Re: [xml] HTML-parser: encoding?
  - From: Elizabeth Mattijsen
- Re: [xml] HTML-parser: encoding?
  - From: Daniel Veillard
