Re: [xml] HTML-parser: encoding?
- From: Daniel Veillard <veillard redhat com>
- To: Elizabeth Mattijsen <liz dijkmat nl>
- Cc: Melvyn Sopacua <mdev idg nl>, xml gnome org
- Subject: Re: [xml] HTML-parser: encoding?
- Date: Thu, 29 Nov 2001 17:25:40 -0500
On Thu, Nov 29, 2001 at 10:58:01PM +0100, Elizabeth Mattijsen wrote:
At 09:27 PM 11/29/01 +0100, Melvyn Sopacua wrote:
As I understand it, you buffer the incoming streams, which allows you to
build an encoding map, which can be as simple as a textfile consisting of
"filename","encoding"\n.
If the HTTP headers don't provide information, the meta tags can be read.
That's what the HTML-parser is already doing, isn't it?
yep
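
For illustration, the lookup order discussed above (HTTP header first, then the meta tag) could be sketched roughly as follows. This is only a sketch in Python, not the code used by the HTML parser in libxml; the helper name guess_encoding and both regular expressions are assumptions made for this example.

```python
import re

# Hypothetical patterns for this sketch; real-world sniffing is more involved.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
_META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                      re.IGNORECASE)


def guess_encoding(content_type, body, default=None):
    """Return a charset name from the Content-Type header, else from a meta tag.

    content_type: value of the HTTP Content-Type header, or None if absent.
    body: the raw document bytes.
    """
    # 1. The HTTP header wins when it carries a charset parameter.
    if content_type:
        match = _CHARSET_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. Otherwise look for a charset declared in a meta tag near the top.
    match = _META_RE.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing found: let the caller decide.
    return default
```

The result could then be written to the "filename","encoding" map mentioned above, one line per buffered document.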
Changing it to numeric entities would actually be best, as it wouldn't lose
any information. Hmm... but can you actually do that? Wouldn't reading this
into an XML parser the next time re-create the encoding error (once the
entity has been processed)?
No, it would be fine as long as they are in the ranges defined by
XML as valid Chars.
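
As a rough sketch of that point, in Python rather than C: the ranges below follow the Char production of XML 1.0, while the function names is_xml_char and to_numeric_refs are made up for this example. It assumes the input is already decoded text and that markup characters such as & and < have been escaped separately.

```python
def is_xml_char(cp):
    """True if code point cp falls in the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal character reference.

    Raises ValueError for code points that XML does not allow at all,
    since a reference to such a character is not well-formed either.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if not is_xml_char(cp):
            raise ValueError("not a valid XML Char: U+%04X" % cp)
        # ASCII passes through unchanged; everything else becomes &#N;
        out.append(ch if cp < 0x80 else "&#%d;" % cp)
    return "".join(out)
```

Since the output is pure ASCII, it no longer depends on the encoding declared for the document, so reading it back in does not reintroduce the original encoding problem.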
Hmmm... availability of iconv is not guaranteed on a lot of platforms,
is it? Which would be a reason for me not to use this, as the script
should be generally usable.
Iconv is actually available on most platforms, if needed after a recompile.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/