Re: [xml] setting the default charset?



On Fri, Jul 27, 2001, at 09:55:58 -0400, Daniel Veillard wrote:
On Fri, Jul 27, 2001 at 08:20:21AM +0200, Cyrille Chepelov wrote:
    When libxml2 doesn't see the encoding="..." attribute, it defaults
to either UTF-8 or ASCII-7 (I don't remember which one), which in either

  UTF-8 or UTF-16, or complains and uses ISO-Latin-1. The latter is actually
a violation of the spec; it should abort at that point.

So, in the worst case, we could pass the older files through iconv() to make
sure they're UTF-8 and let libxml2 handle the result.
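For the worst-case route, here is a minimal sketch assuming a POSIX iconv(3) such as glibc's. `to_utf8()` is a hypothetical helper, not part of libxml2: it re-encodes an in-memory buffer to UTF-8 so the result can be parsed with no encoding hint at all. It assumes the old files carry no encoding declaration (one naming the original charset would be wrong after conversion) and that no input byte expands to more than 4 UTF-8 bytes.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: re-encode `len` bytes of `in` from charset `from`
 * into a freshly malloc'd, NUL-terminated UTF-8 string.
 * Returns NULL if the charset is unknown or the input cannot be converted.
 * The caller frees the result. */
char *to_utf8(const char *in, size_t len, const char *from)
{
    iconv_t cd = iconv_open("UTF-8", from);   /* argument order: (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return NULL;

    /* Assumption: no input byte expands to more than 4 UTF-8 bytes. */
    size_t cap = len * 4 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;      /* iconv() takes char **, not const char ** */
    size_t src_left = len;
    char *dst = out;
    size_t dst_left = cap - 1;   /* reserve one byte for the terminator */

    /* Convert everything, then flush any pending shift state. */
    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &dst, &dst_left) == (size_t)-1) {
        free(out);               /* invalid or truncated input sequence */
        iconv_close(cd);
        return NULL;
    }
    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

The source charset could be the locale's, e.g. `to_utf8(data, size, nl_langinfo(CODESET))` after `setlocale(LC_ALL, "")`.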

  If you know the encoding, it's still okay per the XML spec to start the
parser by telling it to use that encoding.
  Libxml2 supports this kind of operation, but it seems I don't export
a clear API for it. There is, for example, an entry point to create an HTML
parser context under those conditions, but not one for XML:
  http://xmlsoft.org/html/libxml-parserinternals.html#HTMLCREATEFILEPARSERCTXT

Yes; I know the default encoding: it's whatever nl_langinfo(CODESET) says
(but I'd rather tell libxml myself than have libxml try to guess). Of
course, if it turns out the document I'm feeding to the parser has an
encoding declaration, that declaration shall override the default I would
give through the (still) hypothetical API.
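A minimal sketch of that rule in plain C with POSIX `nl_langinfo()`: a declared encoding wins, otherwise the locale's codeset is used. `pick_encoding()` and `declared_encoding()` are hypothetical helpers, not libxml2 API, and the declaration scan is deliberately naive (ASCII-compatible input only, no BOM or UTF-16 detection). Note that `nl_langinfo(CODESET)` only reflects the user's locale after the program has called `setlocale(LC_ALL, "")`.

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Skip XML whitespace (space, tab, CR, LF). */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/* Hypothetical: copy the value of encoding="..." (or '...') from an XML
 * declaration at the very start of `doc` into `out`. Returns 1 if found,
 * 0 if there is no declaration, no encoding in it, or the value does not
 * fit in `outlen` bytes. */
static int declared_encoding(const char *doc, char *out, size_t outlen)
{
    if (strncmp(doc, "<?xml", 5) != 0)
        return 0;                               /* no XML declaration */
    const char *end = strstr(doc, "?>");
    const char *p = strstr(doc, "encoding");
    if (end == NULL || p == NULL || p > end)
        return 0;                               /* declaration lacks encoding */
    p = skip_ws(p + strlen("encoding"));
    if (*p != '=')
        return 0;
    p = skip_ws(p + 1);
    char quote = *p++;
    if (quote != '"' && quote != '\'')
        return 0;
    const char *close = strchr(p, quote);
    if (close == NULL || close > end)
        return 0;
    size_t n = (size_t)(close - p);
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Hypothetical: the document's own declaration wins; otherwise fall back
 * to the current locale's codeset. `buf` receives a declared name. */
const char *pick_encoding(const char *doc, char *buf, size_t buflen)
{
    if (declared_encoding(doc, buf, buflen))
        return buf;
    return nl_langinfo(CODESET);
}
```

The returned name is what one would then hand to libxml2 through whatever entry point ends up accepting a default encoding.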

Something like 
        int xmlSetParserEncoding(xmlParserCtxPtr ctxt,
                                const char *encoding);
would be nice. (I initially thought that was what xmlSwitchEncoding() was
supposed to do, but it didn't quite work. And I'm afraid I don't really
understand what the libxml-parserinternals page says about this function.)

Perhaps even a shorthand like
        xmlDocPtr xmlParseFileWithEncoding(const char *filename,
                                           const char *default_encoding);
 (with default_encoding == NULL meaning "do what xmlParseFile() does") could
be useful (well, in my case, it certainly would be).

  This could be easily added.

That would definitely be helpful!

        -- Cyrille

-- 
Grumpf.




