Re: [xml] Serialization of documents without encoding

Hi Nick,

Nick Wellnhofer wrote:
On 25/09/2018 14:36, Nick Wellnhofer wrote:
The whole situation is a mess. I'd love to change the code so that non-ASCII chars are always encoded as UTF-8, but I'm scared to break things.

A long time ago I did some tests with HTML.

The case is quite similar: the encoding can be defined externally, in the HTTP header
Content-Type: text/html; charset=ISO8859-5
and at the same time internally, in the HTML header
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-5">
If I remember correctly (10-15 years ago), Internet Explorer preferred the internal encoding while other browsers preferred the external one.

I created a similar test to check the situation with XML and ran some tests
(browsers: Firefox, Opera, Chromium, Konqueror).

The tests show that all(1) browsers can read XML in the following case:
- HTTP header without charset, i.e. Content-Type: text/html;
- XML prolog with encoding, i.e. <?xml version="1.0" encoding="...."?>

Without an encoding in the prolog, only a file in the UTF-8 codeset can be read (no surprise).
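
The same effect can be reproduced outside a browser. Below is a minimal sketch using Python's xml.etree.ElementTree (expat based), not libxml2 and not any of the browsers tested, so it only illustrates the principle. The sample text and the helper name try_parse are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative Cyrillic sample text, stored as ISO-8859-5 bytes.
TEXT = "\u041f\u0440\u0438\u0432\u0435\u0442"
payload = TEXT.encode("iso-8859-5")

# Same content, once with and once without an encoding in the prolog.
with_decl = b'<?xml version="1.0" encoding="iso-8859-5"?><doc>' + payload + b"</doc>"
without_decl = b'<?xml version="1.0"?><doc>' + payload + b"</doc>"


def try_parse(data):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


decoded = try_parse(with_decl)      # parser is told the encoding
rejected = try_parse(without_decl)  # parser assumes UTF-8 and should fail
```

With the encoding declared, the parser should recover the original text; without it, the ISO-8859-5 bytes are not valid UTF-8 and the document should be rejected.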

The behavior of some browsers depends on the file suffix. This is the reason the tests use both .xml and .none suffixes.

Mixing a charset with a different encoding fails as expected, except in the case charset=iso8859-1, where some browsers display the content properly.

Based on the tests, I think that if content is switched to UTF-8 encoding by default, it is good to have the encoding in the prolog. It is less risky.
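
To make the two options concrete, here is a rough sketch with Python's xml.etree.ElementTree. It is not libxml2 code and says nothing about what the patch does; it only contrasts ASCII-only output, where non-ASCII characters are escaped, with UTF-8 output that names its encoding in the prolog. The element name and text are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("doc")
root.text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # illustrative Cyrillic text

# Option 1: no encoding requested. ElementTree falls back to US-ASCII and
# writes non-ASCII characters as numeric character references.
ascii_out = ET.tostring(root)

# Option 2: UTF-8 bytes, with the encoding stated explicitly in the prolog.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
utf8_out = buf.getvalue()

# Both serializations should parse back to the same text.
same_text = ET.fromstring(ascii_out).text == ET.fromstring(utf8_out).text
```

The second form stays readable by a consumer that does not assume UTF-8 by default, which is the lower-risk behavior argued for above.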

This is the change I have in mind:

OK to remove the "Special escaping routines", but the patch shows that in the regression tests the prolog remains as "<?xml version="1.0"?>".
I'm not sure that such a code modification is safe.


