Re: [xml] HTML-parser: encoding?
- From: Morus Walter <morus walter tanto-xipolis de>
- To: veillard redhat com
- Subject: Re: [xml] HTML-parser: encoding?
- Date: Fri, 30 Nov 2001 08:32:30 +0100
I don't agree with the pre-processing, but I _can_ agree with the iconv
post-processing. It would be nice if it were part of xmllint, though...
  Which ranges in 00-FF are not part of ISO 8859-1?
  Is it just [80-A0[ ?
Actually it's not even [80-A0[.
As I recently learned (from Martin v. Loewis on the Python XML mailing list):
*All* bytes are valid characters in ISO-8859-1 (it is a common
misconception about Latin-1 that 128-159 are not defined).
see
http://208.56.196.240/misc/ISO-8859-1.HTML
so [80-A0[ are valid in ISO-8859-1, though they encode the C1 control
characters (U+0080 to U+009F) rather than printable characters.
They aren't ruled out as characters in XML either; the XML 1.0 Char
production is

  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

and 0x80-0x9F fall inside [#x20-#xD7FF].
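A minimal C sketch of the Char production quoted above, assuming only that an ISO-8859-1 byte decodes to the Unicode code point with the same numeric value. The names is_xml_char, latin1_byte_is_xml_char and check_c1_range are hypothetical illustrations, not libxml2 API.

```c
#include <assert.h>
#include <stdbool.h>

/* XML 1.0 Char production:
 * Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
bool is_xml_char(unsigned long c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

/* An ISO-8859-1 byte b decodes to code point U+00bb, so checking a
 * Latin-1 byte reduces to checking its numeric value. */
bool latin1_byte_is_xml_char(unsigned char b)
{
    return is_xml_char(b);
}

/* Self-check: the C1 controls 0x80-0x9F are all allowed as XML chars. */
void check_c1_range(void)
{
    for (unsigned b = 0x80; b <= 0x9F; b++)
        assert(latin1_byte_is_xml_char((unsigned char)b));
}
```

Under this sketch the only Latin-1 bytes rejected are the C0 controls other than tab, line feed and carriage return; nothing in [80-A0[ is excluded.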
So IMHO one correct way to handle these is simply to convert them to
UTF-8 (if that's the output encoding) or to leave them as they are if
the output encoding is ISO-8859-1.
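The two options above can be sketched in C as follows. This assumes the input is already known to be ISO-8859-1; latin1_to_utf8 and emit_latin1 are hypothetical helpers, not xmllint or libxml2 functions. Because each byte is its own code point, bytes below 0x80 are copied and every byte from 0x80 to 0xFF (C1 controls included) becomes a two-byte UTF-8 sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Convert ISO-8859-1 to UTF-8. `out` must hold at least 2 * len bytes
 * (worst case: every input byte >= 0x80). Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    assert(out_cap >= 2 * len);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                  /* ASCII: 1 byte */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));   /* 110000xx */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F)); /* 10xxxxxx */
        }
    }
    return o;
}

/* Hypothetical output step: convert to UTF-8, or pass the bytes through
 * unchanged when the output encoding is ISO-8859-1. */
size_t emit_latin1(const unsigned char *in, size_t len, bool utf8_output,
                   unsigned char *out, size_t out_cap)
{
    if (utf8_output)
        return latin1_to_utf8(in, len, out, out_cap);
    assert(out_cap >= len);
    memcpy(out, in, len); /* every byte is already valid ISO-8859-1 */
    return len;
}
```

For example, the byte 0x80 becomes C2 80 (U+0080) and 0xE9 becomes C3 A9 (U+00E9); no byte needs to be dropped or escaped in either branch.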
greetings
        Morus