Re: [xml] Serialization of documents without encoding

From: Roumen Petrov <bugtrack roumenpetrov info>
To: "xml gnome org" <xml gnome org>
Subject: Re: [xml] Serialization of documents without encoding
Date: Sat, 6 Oct 2018 19:32:00 +0300

Hi Nick,

Nick Wellnhofer wrote:

On 25/09/2018 14:36, Nick Wellnhofer wrote:
The whole situation is a mess. I'd love to change the code so that non-ASCII chars are always encoded as UTF-8, but I'm scared to break things.

A long time ago I did some tests with HTML: http://roumenpetrov.info/tests/charset/

The case is quite similar. The encoding could be defined externally, in the HTTP header

...
Content-Type: text/html; charset=ISO8859-5
...
and at the same time in the HTML header (internal)
...
<html>
<head>
....
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-5">
....
</head>
...

If I remember well (10-15 years ago), Internet Explorer preferred the internal encoding while other browsers preferred the external one.
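The two precedence policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`header_charset`, `meta_charset`, `pick_charset`) are hypothetical, the regular expression is a simplification and not how any browser actually sniffs the meta tag, and the `windows-1251` value is made up so that the two sources disagree.

```python
import re
from email.message import Message

# Simplified pattern for a charset inside a <meta> tag (assumption, not a
# real browser algorithm).
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)

def header_charset(content_type):
    """External encoding: the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased name, or None

def meta_charset(html_bytes):
    """Internal encoding: the charset found in the first 1024 bytes."""
    match = META_RE.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None

def pick_charset(content_type, html_bytes, prefer="external"):
    """Return the preferred charset, falling back to the other source."""
    external = header_charset(content_type)
    internal = meta_charset(html_bytes)
    if prefer == "external":
        return external or internal
    return internal or external

# Hypothetical page whose meta tag disagrees with the HTTP header.
page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1251"></head>'

external_first = pick_charset("text/html; charset=ISO8859-5", page)
internal_first = pick_charset("text/html; charset=ISO8859-5", page, prefer="internal")
no_header_charset = pick_charset("text/html", page)
```

With `prefer="external"` the header value wins, with `prefer="internal"` the meta tag wins, and when the header carries no charset the meta tag is the only source left.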

I created a similar test to check the situation with XML, http://roumenpetrov.info/tests/charset/index-xml.html , and did some tests (browsers: Firefox, Opera, Chromium, Konqueror).

The tests show that all(1) browsers could read XML in the following case:
- HTTP header without charset, i.e. Content-Type: text/html;
- XML prolog with encoding, i.e. <?xml version="1.0" encoding="...."?>

Without an encoding in the prolog, only a file in the UTF-8 codeset could be read (no surprise).
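The same behaviour can be reproduced outside a browser. The sketch below uses Python's standard xml.etree.ElementTree parser (expat based), not libxml2 and not any of the browsers tested, so it only illustrates the rule; the sample text and the `try_parse` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

text = "Привет"                      # sample Cyrillic text (assumption)
raw = text.encode("iso-8859-5")      # same text as ISO-8859-5 bytes

with_decl = b'<?xml version="1.0" encoding="iso-8859-5"?><p>' + raw + b"</p>"
without_decl = b'<?xml version="1.0"?><p>' + raw + b"</p>"
utf8_no_decl = b'<?xml version="1.0"?><p>' + text.encode("utf-8") + b"</p>"

def try_parse(data):
    """Return the text of the root element, or None if parsing fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

decl_result = try_parse(with_decl)        # decoded using the declared encoding
no_decl_result = try_parse(without_decl)  # bytes are not valid UTF-8
utf8_result = try_parse(utf8_no_decl)     # UTF-8 is the default, so it parses
```

Only the document that is neither UTF-8 nor labelled with its encoding fails to parse.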

The behavior of some browsers depends on the file suffix. This is the reason the tests use the .xml and .none suffixes.

Mixing a charset with a different encoding fails as expected, except in the case charset=iso8859-1, where some browsers show the content properly.

Based on the tests, I think that if the output switches to UTF-8 encoded content by default, it is good to have the encoding in the prolog. It is less risky.
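To show the two output styles side by side, here is a small sketch with Python's standard xml.etree.ElementTree. It is an analogy only, not libxml2 code and not the proposed patch: ElementTree's default serialization happens to escape non-ASCII characters as numeric character references, while an explicit UTF-8 write can carry the encoding in the prolog. The element name and text are made up.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("p")
root.text = "Привет"  # sample non-ASCII content (assumption)

# Default serialization: pure ASCII output, non-ASCII characters become
# numeric character references, and no XML declaration is written.
ascii_bytes = ET.tostring(root)

# Explicit UTF-8 serialization with the encoding stated in the prolog.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="UTF-8", xml_declaration=True)
utf8_bytes = buf.getvalue()

# Both forms parse back to the same text.
round_trip_ascii = ET.fromstring(ascii_bytes).text
round_trip_utf8 = ET.fromstring(utf8_bytes).text
```

A reader that ignores or lacks the prolog still gets the right text from the first form, while the second form depends on the reader honouring the declared encoding or defaulting to UTF-8.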

This is the change I have in mind:

https://github.com/nwellnhof/libxml2/commit/53551ec2f6a2ef03bfcfb6d73b6fd18dc70ba15d

OK to remove the "Special escaping routines", but the patch shows that in the regression tests the prolog remains "<?xml version="1.0"?>".

I'm not sure that such a code modification is safe.

Nick


Regards,
Roumen
