Re: [xml] setting the default charset ?
- From: Cyrille Chepelov <chepelov calixo net>
- To: Daniel Veillard <veillard redhat com>
- Cc: Cyrille Chepelov <chepelov calixo net>, xml gnome org
- Subject: Re: [xml] setting the default charset ?
- Date: Fri, 27 Jul 2001 20:13:01 +0200
On Fri, Jul 27, 2001 at 01:36:57 -0400, Daniel Veillard wrote:
On Fri, Jul 27, 2001 at 06:49:25PM +0200, Cyrille Chepelov wrote:
So, at the worst case, we could pass the older files through iconv() to make
sure they're UTF-8 and let libxml2 handle the result.
Well, if you use the libxml2 framework, it will be done progressively
this was just a contingency solution ; I prefer to cooperate with my libs,
not circumvent them :-)
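For illustration, the iconv() contingency mentioned above could look roughly like the sketch below. to_utf8() is a hypothetical helper name, not something from libxml2 or dia, and the sketch assumes the whole input is already in memory as a NUL-terminated string and that four output bytes per input byte is a generous enough bound for common charsets.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2 or dia): convert a NUL-terminated
   string from from_charset to UTF-8. Returns a malloc'd string that the
   caller must free(), or NULL on any error. */
char *to_utf8(const char *input, const char *from_charset)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t) -1)
        return NULL;               /* conversion not supported */

    size_t in_left = strlen(input);
    /* Assumption: 4 output bytes per input byte is enough room. If it is
       not, iconv() fails with E2BIG and we return NULL. */
    size_t out_size = in_left * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *) input; /* iconv() does not modify the input */
    char *out_ptr = out;
    size_t out_left = out_size - 1;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t) -1) {
        free(out);
        return NULL;               /* invalid or incomplete input */
    }

    *out_ptr = '\0';
    return out;
}
```

The converted buffer could then be handed to the parser in place of the original file contents; whether that is preferable to letting the library do the conversion is exactly the point under discussion here.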
Libxml will never look at locales, I guarantee this !
Good ! (and yes, I definitely agree: there are so many ways of looking at
locales... the application knows better).
Something like
int xmlSetParserEncoding(xmlParserCtxtPtr ctxt,
const char *encoding);
would be nice. (I initially thought that would be what xmlSwitchEncoding()
was supposed to do, but it didn't quite work. And I'm afraid I don't really
understand what the libxml-parserinternals page says on this function).
xmlSwitchEncoding will put an iconv filter for this encoding between your
source and the parser, more precisely an encoder from this encoding to
UTF-8
So, in theory, that should be OK ?
in the meantime use xmlSwitchEncoding().
I tried: this failed (here's the code snippet I used, raw with the comments
:-) I had delayed a bit since I wrote the comments, because I hoped to see
you in Bordeaux -- your badge was there, but I don't know whether you met it)
/* int get_local_charset(const char **charset) returns TRUE if the local
charset is UTF-8, FALSE otherwise. charset is filled with the correct
character set information (usually from nl_langinfo(CODESET) but also from
what libunicode says when it's not broken). */
xmlDocPtr
xmlDiaParseFile(const char *filename) {
    /* Copied from libxml 2.3.9's xmlSAXParseFile function.
       written by Daniel Veillard w3 org, then modified for dia's purpose
       by Cyrille Chepelov */
    xmlDocPtr ret;
    xmlParserCtxtPtr ctxt;
    char *directory = NULL;
    const char *local_charset = NULL;

    ctxt = xmlCreateFileParserCtxt(filename);
    if (ctxt == NULL) {
        return(NULL);
    }

    if ((ctxt->directory == NULL) && (directory == NULL))
        directory = xmlParserGetDirectory(filename);
    if ((ctxt->directory == NULL) && (directory != NULL))
        ctxt->directory = (char *) xmlStrdup((xmlChar *) directory);

#ifdef XML2
#if 1 /* This doesn't work. In fact, libxml seems to do just whatever it
         pleases wrt charsets (it *seems* to do the right thing when loading
         older 8859-1 diagrams. I really don't know whether it'll load
         correctly non-8859-1 diagrams !). If it doesn't, I see two courses
         of action:
           1) ask Daniel Veillard for help.
           2) run a quick and dirty zcat|sed job to put the encoding
              of the dia file on the fly into a temporary, and then
              load that temp. instead of the real file.
         For now I'll hope everything will happen alright -- CC */
    if (!get_local_charset(&local_charset)) {
        /* local charset is not UTF-8. We switch at first to local encoding,
           libxml will switch back to another encoding if necessary and
           present in the XML file. */
        xmlCharEncoding enc = xmlParseCharEncoding(local_charset);
        if (enc != XML_CHAR_ENCODING_ERROR) {
            xmlSwitchEncoding(ctxt, enc);
        } else {
            xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_8859_1);
            g_warning("local encoding %s unsupported by libxml; will use 8859-1 "
                      "as default.", local_charset);
        }
    } else {
        xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF8);
    }
#endif
#else
#ifdef UNICODE_WORK_IN_PROGRESS
#error "We can't make this work without libxml2."
#endif
#endif

    xmlParseDocument(ctxt);

    if ((ctxt->wellFormed)) ret = ctxt->myDoc;
    else {
        ret = NULL;
        xmlFreeDoc(ctxt->myDoc);
        ctxt->myDoc = NULL;
    }
    xmlFreeParserCtxt(ctxt);

    return(ret);
}
(yes, the XML2 symbol is defined).
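For readers without dia's tree at hand, the get_local_charset() helper described in the comment at the top of the snippet might look roughly like the sketch below. This is a guess at its shape, not dia's actual implementation: it only covers the nl_langinfo(CODESET) path (the libunicode path is left out), it assumes the application has already called setlocale(LC_ALL, ""), and the "US-ASCII" fallback is an assumption of this sketch.

```c
#include <langinfo.h>
#include <string.h>

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Sketch only: fills *charset with the name of the locale's character set
   and returns TRUE if that charset is UTF-8, FALSE otherwise.
   Assumes setlocale(LC_ALL, "") has been called by the application. */
int get_local_charset(const char **charset)
{
    const char *codeset = nl_langinfo(CODESET);

    /* Assumption of this sketch: fall back to US-ASCII when the C library
       reports nothing useful. */
    if (codeset == NULL || *codeset == '\0')
        codeset = "US-ASCII";

    *charset = codeset;
    return strcmp(codeset, "UTF-8") == 0 ? TRUE : FALSE;
}
```

The returned string belongs to the C library (or is a string literal), so the caller must not free or modify it, which is why the out-parameter is a const char **.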
Should this bit of code work ? If it should, then I'll commit it "as is",
and see whether I hear screams from Eastern Europe...
(well, usually nobody really cares about what's committed in dia's tree.
Only after two releases are made, bug reports begin flowing in).
-- Cyrille
--
Grumpf.