Re: [xml] setting the default charset ?



On Sat, Jul 28, 2001, at 01:13:39 -0400, Daniel Veillard wrote:
On Sat, Jul 28, 2001 at 06:50:54PM +0200, Cyrille Chepelov wrote:

  Right, enclosed patch should fix the problem,

Mmmmh, yes, it looks like it fixes my problem. However, it may introduce
a deviation from standard or reasonable behaviour:
        1) I call xmlSwitchEncoding(KOI8-R);
        2) I start parsing a file
        3) That file happens to be encoded in UTF-16. Now, xmlParserHandlePEReference
        won't attempt to detect the encoding, since it sees we are already using KOI8-R.
        4) bang !

What should probably be done instead (I haven't actually tested this):

        /* 
         * Get the 4 first bytes and decode the charset
         * if enc != XML_CHAR_ENCODING_NONE
         * plug some encoding conversion routines (unless the user already
         * provided a sensible encoding).
         */
        start[0] = RAW;
        start[1] = NXT(1);
        start[2] = NXT(2);
        start[3] = NXT(3);
        enc = xmlDetectCharEncoding(start, 4);
        switch (enc) {
        case XML_CHAR_ENCODING_NONE:
          break;
        case XML_CHAR_ENCODING_UTF8:
          if (ctxt->encoding == XML_CHAR_ENCODING_NONE) { /* default 8-bit behaviour */
            xmlSwitchEncoding(ctxt, enc);
          }
          break;
        default: /* EBCDIC, UTF16 and UCS4 */
          xmlSwitchEncoding(ctxt, enc);
          break;
        }
           
Basically, this should work as the code before your patch, except that if we
detect an asciioid encoding *and* the user supplied us a better default than
UTF-8, then we honour that. Otherwise, if we detect a strange beast, we
adapt right away to the strange beast.      
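
To make the intended rule easier to check, here is a minimal standalone
sketch of the decision described above. It does not use libxml2 at all:
the sketch_encoding enum, detect_encoding() and choose_encoding() are
hypothetical names invented for illustration, and the detector only knows
a few byte signatures, so it is far less complete than the real
xmlDetectCharEncoding().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml2's encoding constants (not the real enum). */
typedef enum {
    ENC_NONE,     /* nothing detected, or no default supplied by the user */
    ENC_UTF8,     /* ASCII-compatible start of document */
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_KOI8R     /* example of a user-supplied 8-bit default */
} sketch_encoding;

/* Simplified detection from the first 4 bytes (assumption: only a few
 * signatures are recognised; anything else reports ENC_NONE). */
sketch_encoding detect_encoding(const unsigned char *start, size_t len)
{
    assert(start != NULL);
    if (len < 4)
        return ENC_NONE;
    if (start[0] == 0xFF && start[1] == 0xFE)
        return ENC_UTF16LE;                     /* UTF-16 LE byte order mark */
    if (start[0] == 0xFE && start[1] == 0xFF)
        return ENC_UTF16BE;                     /* UTF-16 BE byte order mark */
    if (start[0] == 0x3C && start[1] == 0x3F &&
        start[2] == 0x78 && start[3] == 0x6D)
        return ENC_UTF8;                        /* "<?xm" in an ASCII-like encoding */
    return ENC_NONE;
}

/* Returns the encoding the parser should end up using, given what was
 * detected and what the user asked for beforehand. */
sketch_encoding choose_encoding(sketch_encoding detected,
                                sketch_encoding user_default)
{
    switch (detected) {
    case ENC_NONE:
        /* Nothing detected: leave the current setting alone. */
        return user_default;
    case ENC_UTF8:
        /* ASCII-like input: honour the user's default if there is one. */
        return (user_default == ENC_NONE) ? ENC_UTF8 : user_default;
    default:
        /* Strange beast (UTF-16 here): adapt to it right away. */
        return detected;
    }
}
```

With this rule, a KOI8-R default survives when the file starts with plain
"<?xm", but a UTF-16 byte order mark overrides it, which is the case that
goes bang in the scenario above.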

         -- Cyrille

-- 
Grumpf.




