Re: [xml] setting the default charset ?



On Mon, Jul 30, 2001 at 02:45:14PM -0400, Liam Quin wrote:
On Mon, Jul 30, 2001 at 04:57:18PM +0200, Thomas Broyer wrote:
Some servers recode files on the fly without modifying their content (so
the encoding declaration is left unchanged).
That's the reason why the encoding declaration at protocol level is
authoritative.
And some servers always send an ISO-8859-1 charset in the HTTP header even
when it's incorrect -- this was a major problem with HTML, and we tried hard
to make it not a problem with XML.

The URL Daniel referenced,
http://www.w3.org/TR/html4/charset.html#spec-char-encoding
is for the HTML spec and does not apply to XML.

  Liam, respectfully I will point out that the file I gave a URL for
is http://www.ietf.org/rfc/rfc2376.txt, the normative reference made from
the XML spec second edition (I understand that a revision has superseded
this RFC, but I would be surprised if they changed it)

  3.1 Text/xml Registration

   Optional parameters: charset

   4th paragraph:

-----------------------
Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration.  Thus, special
care is needed when the recipient strips the MIME header and
provides persistent storage of the received XML entity (e.g., in a
file system). Unless the charset is UTF-8 or UTF-16, the recipient
SHOULD also persistently store information about the charset,
perhaps by embedding a correct XML encoding declaration within the
XML entity.
-----------------------

  I see 'charset parameter is authoritative' and a 'SHOULD' for
the XML encoding declaration. That's why I was worried. I initially expected
to be able to rebut the statement that the HTTP charset should override
the XML declaration, but reading this it seems I cannot, which is why I
was really displeased. I think we discussed this at the last XML Core f2f
but I can't remember the outcome (and I am still too tired to check this
now).
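
A minimal sketch of the precedence this reading of RFC 2376 implies: the
charset parameter of the Content-Type header wins over the XML encoding
declaration. The function names are hypothetical, and the fallback order
when no charset parameter is present (XML declaration, then a default) is
an assumption made for illustration; the RFC treats text/xml and
application/xml differently in that case.

```python
import re

# Matches encoding="..." inside an XML declaration at the very start of the
# entity. Only usable for ASCII-compatible encodings (not UTF-16).
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes):
    """Return the encoding named in the XML declaration, or None."""
    m = _XML_DECL_ENCODING.match(body)
    return m.group(1).decode("ascii") if m else None


def charset_param(content_type: str):
    """Return the charset parameter of a Content-Type value, or None."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"')
    return None


def effective_encoding(content_type: str, body: bytes, default="us-ascii"):
    """MIME charset first, then the XML declaration, then the default.

    The last two steps are an assumption of this sketch, not RFC text.
    """
    return charset_param(content_type) or declared_encoding(body) or default
```

With a Content-Type of "text/xml; charset=utf-8" and a body declaring
ISO-8859-1, this returns "utf-8", which is exactly the case that is
troublesome when the server's header is wrong.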

The encoding declaration in XML must override any other.

  I agree that's the way things should have been specified, but it seems
that is not the case.

This does become a problem for XHTML, where maybe we have
conflicting rules.  But anything translating the encoding of
an XML document must change the XML declaration.  Any other
behaviour is broken.
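
A minimal sketch of what a recoder that also updates the declaration might
look like, which is also one way to follow the RFC's advice about embedding
a correct encoding declaration before storing an entity. The function name
recode_xml is hypothetical, and assuming version "1.0" when the entity has
no XML declaration at all is a choice made only for this sketch.

```python
import re

_DECL = re.compile(r'^<\?xml\s[^?]*\?>')
_ENC = re.compile(r'''\sencoding\s*=\s*(["'])[^"']*\1''')
_VER = re.compile(r'''version\s*=\s*(["'])[^"']*\1''')


def recode_xml(body: bytes, source: str, target: str) -> bytes:
    """Transcode an XML entity and make its declaration name the new encoding."""
    text = body.decode(source)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop a byte order mark left over from decoding

    new_enc = f' encoding="{target}"'
    m = _DECL.match(text)
    if m is None:
        # No XML declaration: add one (version 1.0 is assumed here).
        text = f'<?xml version="1.0"{new_enc}?>\n' + text
    else:
        decl = m.group(0)
        if _ENC.search(decl):
            # Replace the existing encoding pseudo-attribute.
            decl = _ENC.sub(lambda _: new_enc, decl, count=1)
        else:
            # The encoding must come right after the version pseudo-attribute.
            decl = _VER.sub(lambda v: v.group(0) + new_enc, decl, count=1)
        text = decl + text[m.end():]
    return text.encode(target)
```

A server recoding on the fly without a step like this leaves the bytes and
the declaration disagreeing, which is the broken behaviour described above.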

  Yes but ... Probably worth chasing the latest RFC from Murata
and Simon St Laurent to see if this has been changed. I have little hope.

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



