Re: [xml] setting the default charset ?



On Sat, Jul 28, 2001 at 10:14:38 -0400, Daniel Veillard wrote:
On Sat, Jul 28, 2001 at 11:23:23AM +0800, William M. Brack wrote:
xmlDetectCharEncoding looks at those four characters, notices that they are
"<?xm", and then sets the encoding for the context to be UTF-8 (somehow this
seems related to the classic expression of the American Henry Ford, who once
declared "You can have any colour you like, as long as it's black") (no
offense intended).

  It's the normal process as described in the XML specification:
there is an appendix about charset detection, and though it is not
normative it's a good idea to follow it. If the application didn't say
what the encoding of the entity is, it's assumed to be UTF-8 or UTF-16
unless overridden by the encoding declaration in the XML decl.
Not a bug, a feature!

Actually, more probably a bug. Let's see:

        1) I do an xmlSwitchEncoding(something like KOI8-R);
        2) then I start parsing.
        3) xmlDetectCharEncoding sees <?xm
        4) it decides <?xm means UTF-8 encoding, so it switches to UTF-8, since
           UTF-8 is not NONE.
        5) my setting of KOI8-R is forgotten.
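
To make the sequence concrete, here is a small stand-alone model of steps 1)
to 5). It is not the libxml2 source: the enum, the struct and both functions
are hypothetical names made up for illustration, and the detection only covers
the ASCII-compatible and UTF-16 (no byte order mark) cases from the spec's
appendix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not libxml2 types. */
typedef enum { ENC_NONE, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_KOI8R } enc_t;

typedef struct { enc_t encoding; } ctxt_t;

/* Guess the encoding from the first four bytes of the entity. */
static enc_t detect_encoding(const unsigned char *in, size_t len)
{
    static const unsigned char ascii[4] = { 0x3C, 0x3F, 0x78, 0x6D }; /* <?xm */
    static const unsigned char u16le[4] = { 0x3C, 0x00, 0x3F, 0x00 };
    static const unsigned char u16be[4] = { 0x00, 0x3C, 0x00, 0x3F };

    if (len < 4)
        return ENC_NONE;
    if (memcmp(in, ascii, 4) == 0)
        return ENC_UTF8;
    if (memcmp(in, u16le, 4) == 0)
        return ENC_UTF16LE;
    if (memcmp(in, u16be, 4) == 0)
        return ENC_UTF16BE;
    return ENC_NONE;
}

/* Step 4 as described above: any detected encoding other than NONE
 * replaces whatever the context already holds. */
static void start_parsing(ctxt_t *ctxt, const unsigned char *in, size_t len)
{
    enc_t detected = detect_encoding(in, len);

    if (detected != ENC_NONE)
        ctxt->encoding = detected; /* a user-supplied KOI8-R is lost here */
}
```

With a context preset to ENC_KOI8R and input starting with "<?xml", the
context ends up as ENC_UTF8, which is the behaviour described in 5).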

<disclaimer>I might read that code differently from gcc.</disclaimer> But I
think William spotted the problem.

What could perhaps be done is to apply the encoding detected in 4) only if we
detect something other than UTF-8 or we don't have a previous encoding. If we
have a non-default encoding and we detect a 7-bit or 8-bit file, then we keep
the user-supplied charset and ignore what we detected, at least until we find
an encoding="..." attribute.
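
A rough sketch of that rule, as I understand my own proposal. The names
(charset_t, resolve_encoding) are hypothetical and not part of libxml2, and the
later override by an encoding="..." attribute is deliberately left out.

```c
/* Hypothetical names for illustration only, not libxml2 identifiers. */
typedef enum { CS_NONE, CS_UTF8, CS_UTF16LE, CS_UTF16BE, CS_KOI8R } charset_t;

/* Choose the encoding to use once the first bytes have been examined.
 * 'current' is what the application set beforehand (CS_NONE if nothing),
 * 'detected' is the guess made from the first four bytes. */
static charset_t resolve_encoding(charset_t current, charset_t detected)
{
    /* No previous encoding: take whatever was detected. */
    if (current == CS_NONE)
        return detected;

    /* Nothing detected, or only the generic 7/8-bit guess (reported as
     * UTF-8): the user-supplied charset wins. */
    if (detected == CS_NONE || detected == CS_UTF8)
        return current;

    /* Something other than UTF-8 was detected (for example UTF-16):
     * apply the detection. */
    return detected;
}
```

So resolve_encoding(CS_KOI8R, CS_UTF8) keeps CS_KOI8R, while
resolve_encoding(CS_NONE, CS_UTF8) still falls back to CS_UTF8 as today.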

        -- Cyrille

-- 
Grumpf.




