Re: [xml] HTML-parser: encoding?
- From: Daniel Veillard <veillard redhat com>
- To: Elizabeth Mattijsen <liz dijkmat nl>
- Cc: Melvyn Sopacua <mdev idg nl>, xml gnome org
- Subject: Re: [xml] HTML-parser: encoding?
- Date: Thu, 29 Nov 2001 16:12:17 -0500
On Thu, Nov 29, 2001 at 08:51:09PM +0100, Elizabeth Mattijsen wrote:
At 07:01 PM 11/29/01 +0100, Melvyn Sopacua wrote:
At 15:52 11/29/2001 +0100, you wrote:
I would propose that _if_ the HTML-parser is used _and_ there is _no_
encoding specification found, that libxml _then_ would check all of the
text in the tree for characters illegal for the ISO-Latin-1 encoding and
replace these with spaces (so that the size of the buffer used is not changed).
Personally, I think that would be quite expensive...
Expensive in what way? I always thought that libxml was made for complete
functionality, not speed. And it would only happen _if_ you are using the
HTML-parser _and_ no encoding information was found.
Actually, all characters are already tested.
Or maybe xmllint could take an extra parameter so that any characters
not legal in the encoding of the document are replaced by another
character. That would make it more general...
I don't like the idea of replacing the character with something else.
Either one detects a problem and raises an error, possibly losing
information, but silently changing the content is not something
I support. This kind of kludge becomes a real pain once such behaviour
is buried inside a large piece of software and one starts wondering why
the output is not what the input would lead one to expect.
The best way, in the case of the default fallback to ISO-Latin-1, is to
re-encode the characters as character references and let the downstream
application deal with them.
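A minimal Python sketch of that fallback, not libxml's own code: it assumes the text has already been decoded to Unicode and that only the final serialization step targets ISO-8859-1. The function name is hypothetical; Python's standard `xmlcharrefreplace` error handler does the actual work, writing each character the codec cannot represent as a decimal character reference.

```python
def to_latin1_with_charrefs(text: str) -> bytes:
    """Encode text as ISO-8859-1, writing any character outside
    Latin-1 as a numeric character reference such as &#8364;.

    Hypothetical helper for illustration only.
    """
    # 'xmlcharrefreplace' emits &#NNNN; for every character the target
    # codec cannot encode; encodable characters are written as-is.
    # Markup characters like & and < are NOT escaped here; that is
    # assumed to be the serializer's job.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "caf\u00e9 costs 5\u20ac"   # e-acute is Latin-1, the euro sign is not
    print(to_latin1_with_charrefs(sample))
    # b'caf\xe9 costs 5&#8364;'
```

Nothing is dropped or silently substituted: a downstream consumer that understands character references recovers the original text exactly. Note that Python's Latin-1 codec maps U+0080 to U+009F straight to bytes 0x80 to 0x9F, so C1 control characters pass through unchanged in this sketch.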
I don't agree with the pre-processing, but _can_ agree with the iconv
post-processing. It would be nice if it were part of xmllint, though...
What are the ranges in 00-FF that are not part of ISO 8859-1?
Is it just [80-A0[ ?
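A small byte-level check for the half-open range [0x80, 0xA0) named in the question, sketched in Python under the assumption that this range is the one of interest. Those bytes are the C1 control codes, to which ISO/IEC 8859-1 assigns no graphic characters. The standard likewise assigns none to the C0 controls 0x00 to 0x1F and to 0x7F, but those fall in the ASCII range and are left out here. The helper name is hypothetical.

```python
# Hypothetical helper: flags bytes in the half-open range [0x80, 0xA0),
# i.e. the C1 control codes, which ISO/IEC 8859-1 leaves without
# graphic characters.
C1_START, C1_END = 0x80, 0xA0


def find_c1_bytes(data: bytes) -> list[int]:
    """Return the offsets of all bytes b with 0x80 <= b < 0xA0."""
    return [i for i, b in enumerate(data) if C1_START <= b < C1_END]


if __name__ == "__main__":
    buf = b"price: \x80 5, caf\xe9"   # 0x80 is flagged, 0xE9 (e-acute) is not
    print(find_c1_bytes(buf))
    # [7]
```

Reporting offsets rather than rewriting the buffer keeps the check compatible with raising an error instead of silently altering content.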
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/