Re: [xml] setting the default charset ?
- From: Cyrille Chepelov <chepelov calixo net>
- To: Daniel Veillard <veillard redhat com>
- Cc: Cyrille Chepelov <chepelov calixo net>, "William M. Brack" <wbrack mmm com hk>, xml gnome org
- Subject: Re: [xml] setting the default charset ?
- Date: Sat, 28 Jul 2001 19:53:10 +0200
On Sat, Jul 28, 2001 at 01:13:39 -0400, Daniel Veillard wrote:
On Sat, Jul 28, 2001 at 06:50:54PM +0200, Cyrille Chepelov wrote:
Right, enclosed patch should fix the problem,
Mmmmh, yes, it looks like it fixes my problem. However, it may introduce a
deviation from standard (or at least reasonable) behaviour:
1) I call xmlSwitchEncoding(KOI8-R);
2) I start parsing a file
3) That file happens to be encoded in UTF-16. Now xmlParserHandlePEReference
won't attempt to detect the encoding, since it sees we are already using KOI8-R.
4) Bang!
What should probably be done instead (I haven't actually tested this):
/*
 * Get the first 4 bytes and decode the charset.
 * If enc != XML_CHAR_ENCODING_NONE, plug in some encoding
 * conversion routines (unless the user already provided a
 * sensible encoding).
 */
start[0] = RAW;
start[1] = NXT(1);
start[2] = NXT(2);
start[3] = NXT(3);
enc = xmlDetectCharEncoding(start, 4);
switch (enc) {
    case XML_CHAR_ENCODING_NONE:
        /* nothing recognisable: keep whatever the user set up */
        break;
    case XML_CHAR_ENCODING_UTF8:
        /* ASCII-compatible input: only switch to UTF-8 if the user
         * did not supply an encoding (default 8-bit behaviour) */
        if (ctxt->encoding == XML_CHAR_ENCODING_NONE) {
            xmlSwitchEncoding(ctxt, enc);
        }
        break;
    default: /* EBCDIC, UTF16 and UCS4: always trust the detection */
        xmlSwitchEncoding(ctxt, enc);
        break;
}
Basically, this should behave like the code before your patch, except that if
we detect an ASCII-compatible ("asciioid") encoding *and* the user supplied a
better default than UTF-8, we honour that default. If instead we detect a
stranger beast (EBCDIC, UTF-16, UCS-4), we adapt to it right away.
-- Cyrille
--
Grumpf.