Re: [xml] Serialization of documents without encoding
- From: Roumen Petrov <bugtrack roumenpetrov info>
- To: "xml gnome org" <xml gnome org>
- Subject: Re: [xml] Serialization of documents without encoding
- Date: Sat, 6 Oct 2018 19:32:00 +0300
Hi Nick,
Nick Wellnhofer wrote:
> On 25/09/2018 14:36, Nick Wellnhofer wrote:
>> The whole situation is a mess. I'd love to change the code so that
>> non-ASCII chars are always encoded as UTF-8, but I'm scared to break
>> things.
A long time ago I did some tests with HTML:
http://roumenpetrov.info/tests/charset/ .
The case is quite similar: the encoding can be defined externally, in the
HTTP header
...
Content-Type: text/html; charset=ISO8859-5
...
and at the same time internally, in the HTML header
...
<html>
<head>
....
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-5">
....
</head>
...
If I remember correctly (this was 10-15 years ago), Internet Explorer
preferred the internal encoding while other browsers preferred the external
one.
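The two competing sources can be sketched in a few lines of Python. This is only an illustration, not what any browser does: the helper names (external_charset, internal_charset, pick_charset, same_codec) and the simple regular expression are assumptions made for this example.

```python
import codecs
import re
from email.message import Message


def external_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def internal_charset(head_bytes):
    """Return the charset named in a <meta ... charset=...> tag, or None.

    Hypothetical helper: a plain regex search, not a real HTML parser.
    """
    m = re.search(rb'''charset=["']?([A-Za-z0-9_.:-]+)''', head_bytes, re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None


def pick_charset(content_type, head_bytes, prefer="external"):
    """Pick one charset, falling back to the other source when one is missing."""
    ext = external_charset(content_type)
    internal = internal_charset(head_bytes)
    first, second = (ext, internal) if prefer == "external" else (internal, ext)
    return first or second


def same_codec(a, b):
    """True when two charset labels resolve to the same Python codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name
```

With the header and meta tag shown above, external_charset() gives "iso8859-5", internal_charset() gives "iso-8859-5", and same_codec() treats the two spellings as the same codec.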
I created a similar test to check the situation with XML,
http://roumenpetrov.info/tests/charset/index-xml.html , and ran some tests
(browsers: Firefox, Opera, Chromium, Konqueror).
The tests show that all(1) browsers can read the XML in the following case:
- HTTP header without charset, i.e. Content-Type: text/html;
- XML prolog with encoding, i.e. <?xml version="1.0" encoding="...."?>
Without an encoding in the prolog, only a file in the UTF-8 codeset can be
read (no surprise).
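The same effect can be reproduced outside a browser. The sketch below uses Python's xml.etree.ElementTree (an expat-based parser, not libxml2 and not one of the browsers tested); the sample text, the element name and the try_parse helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Cyrillic sample text, written with escapes so the source stays ASCII.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"
body = "<greeting>" + text + "</greeting>"

# Same ISO-8859-5 bytes, once with and once without an encoding in the prolog.
with_decl = ('<?xml version="1.0" encoding="iso-8859-5"?>' + body).encode("iso-8859-5")
without_decl = body.encode("iso-8859-5")
utf8_without_decl = body.encode("utf-8")


def try_parse(data):
    """Return the element text, or None when the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


results = {
    "iso-8859-5 with prolog encoding": try_parse(with_decl),
    "iso-8859-5 without prolog encoding": try_parse(without_decl),
    "utf-8 without prolog encoding": try_parse(utf8_without_decl),
}
```

Only the middle case is rejected: without a declaration the parser assumes UTF-8, and the ISO-8859-5 bytes are not valid UTF-8.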
The behavior of some browsers depends on the file suffix. This is the reason
the test uses both .xml and .none suffixes.
A mismatch between charset and encoding fails as expected, except in the case
charset=iso8859-1, where some browsers display the content properly.
Based on the tests, I think that if the default is switched to UTF-8 encoded
content, it is good to have the encoding in the prolog. It is less risky.
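For comparison, here is how the two serialization styles look with Python's xml.etree.ElementTree. This is a sketch of the idea only; it says nothing about how libxml2 behaves or should behave, and the element name and sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
# Cyrillic sample text, written with escapes so the source stays ASCII.
root.text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# No encoding requested: ElementTree emits ASCII and turns every
# non-ASCII character into a numeric character reference.
ascii_only = ET.tostring(root)

# UTF-8 requested together with an explicit declaration, so the prolog
# states the encoding of the bytes that follow (needs Python 3.8+).
utf8_with_decl = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Both forms parse back to the same text.
round_trip_ok = (
    ET.fromstring(ascii_only).text == root.text
    and ET.fromstring(utf8_with_decl).text == root.text
)
```

The second form is the one argued for above: raw UTF-8 bytes, with the encoding named in the prolog so that a reader does not have to guess.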
> This is the change I have in mind:
> https://github.com/nwellnhof/libxml2/commit/53551ec2f6a2ef03bfcfb6d73b6fd18dc70ba15d
It is OK to remove the "Special escaping routines", but the patch shows that
in the regression tests the prolog remains as "<?xml version="1.0"?>".
I'm not sure that such a code modification is safe.
> Nick
Regards,
Roumen