Re: [xml] setting the default charset ?
- From: Cyrille Chepelov <chepelov calixo net>
- To: xml gnome org
- Cc: veillard redhat com
- Subject: Re: [xml] setting the default charset ?
- Date: Fri, 27 Jul 2001 18:49:25 +0200
On Fri, Jul 27, 2001, at 09:55:58 -0400, Daniel Veillard wrote:
On Fri, Jul 27, 2001 at 08:20:21AM +0200, Cyrille Chepelov wrote:
When libxml2 doesn't see the encoding="..." attribute, it defaults
to either UTF-8 or ASCII-7 (I don't remember which one), which in either
UTF8 or UTF-16, or complains and uses ISO-Latin-1. The latter is actually
a violation of the spec; it should abort at that point.
So, in the worst case, we could pass the older files through iconv() to make
sure they're UTF-8 and let libxml2 handle the result.
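
As an illustration of the iconv() fallback mentioned above, here is a minimal
sketch in C. The helper name to_utf8() is hypothetical (it is not part of
libxml2 or of this thread), and it assumes a POSIX iconv() that knows the
source encoding by the name passed in.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert `in` (in_len bytes, encoded as `from`)
 * into a newly allocated, NUL-terminated UTF-8 string.
 * Returns NULL on failure; the caller frees the result. */
char *to_utf8(const char *from, const char *in, size_t in_len)
{
    assert(from != NULL && in != NULL);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                      /* source encoding not known */

    size_t cap = in_len * 4 + 1;          /* generous first guess */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;               /* iconv() wants a non-const pointer */
    size_t src_left = in_len;
    char *dst = out;
    size_t dst_left = cap - 1;            /* keep one byte for the NUL */

    for (;;) {
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        if (rc != (size_t)-1)
            break;                        /* all input consumed */
        if (errno != E2BIG) {             /* invalid or truncated input */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        /* Output buffer too small: double it and continue. */
        size_t used = (size_t)(dst - out);
        cap *= 2;
        char *tmp = realloc(out, cap);
        if (tmp == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = tmp;
        dst = out + used;
        dst_left = cap - 1 - used;
    }

    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

The resulting buffer could then be handed to the parser as UTF-8. Note that
any encoding="..." declaration left in the converted text would no longer
match the actual bytes, so this only suits files that carry no declaration.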
If you know the encoding, it's still okay per the XML spec to start the
parser by telling it to use that encoding.
Libxml2 supports this kind of operation, but it seems I don't export
a clear API for it. There is, for example, an entry point to create an HTML
parser context under those conditions, but not one for XML.
Yes; I know the default encoding: it's whatever nl_langinfo(CODESET) says
(but I'd prefer to have to tell libxml myself rather than having libxml try
to guess). Of course, if it turns out the document I'm feeding to the parser
has an encoding specification, that specification shall override the default
I would give through the (still) hypothetical API.
    int xmlSetParserEncoding(xmlParserCtxtPtr ctxt,
                             const char *encoding);
would be nice. (I initially thought that would be what xmlSwitchEncoding()
was supposed to do, but it didn't quite work. And I'm afraid I don't really
understand what the libxml-parserinternals page says on this function).
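
To make the intended precedence concrete (a declared encoding wins, otherwise
the caller's default applies), here is a rough sketch in plain C. All three
function names are hypothetical and none of this is libxml2 API. The
declaration scan is deliberately naive: it assumes ASCII-compatible input that
starts with "<?xml " and is not a conforming XML declaration parser.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: copy the encoding named in the XML declaration at
 * the start of `buf` into `out`. Returns 1 if one was found, 0 otherwise. */
int declared_encoding(const char *buf, char *out, size_t out_size)
{
    assert(buf != NULL && out != NULL && out_size > 0);

    if (strncmp(buf, "<?xml ", 6) != 0)
        return 0;                          /* no XML declaration */

    const char *end = strstr(buf, "?>");
    const char *p = strstr(buf, "encoding");
    if (end == NULL || p == NULL || p > end)
        return 0;                          /* no encoding pseudo-attribute */

    p += strlen("encoding");
    p += strspn(p, " \t\r\n");
    if (*p != '=')
        return 0;
    p++;
    p += strspn(p, " \t\r\n");

    char quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    p++;

    const char *q = strchr(p, quote);
    if (q == NULL || q > end)
        return 0;

    size_t len = (size_t)(q - p);
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: the declared encoding overrides the default. */
const char *choose_encoding(const char *buf, const char *default_encoding,
                            char *scratch, size_t scratch_size)
{
    if (declared_encoding(buf, scratch, scratch_size))
        return scratch;
    return default_encoding;
}

/* Hypothetical helper: the default taken from the user's locale. */
const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");               /* adopt the environment's locale */
    return nl_langinfo(CODESET);
}
```

A caller would pass locale_codeset() as default_encoding and give the chosen
name to whatever entry point ends up setting the parser's encoding.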
Perhaps even a shorthand like
    xmlDocPtr xmlParseFileWithEncoding(const char *filename,
                                       const char *default_encoding);
(where default_encoding == NULL means "do like xmlParseFile() did") could be
useful (well, in my case, it certainly would).
This could be easily added.
That would definitely be helpful!