Re: [xml] Serialization of documents without encoding
- From: Nick Wellnhofer <wellnhofer aevum de>
- To: Roumen Petrov <bugtrack roumenpetrov info>, "xml gnome org" <xml gnome org>
- Subject: Re: [xml] Serialization of documents without encoding
- Date: Thu, 27 Sep 2018 14:22:55 +0200
On 27/09/2018 10:59, Roumen Petrov wrote:
Let's consider the case as "file" mode.
Let's consider the case as "stream" mode.
I'm not only talking about xmllint but the serialization API (xmlSave*,
xmlNodeDump*) in general.
Now about the above test samples: if the content is stored in a file, xmllint
works fine with encoding (=codeset=charset).
$ cat test-noencoding.xml
<?xml version="1.0"?><doc>Käse</doc>
No, it doesn't work fine:
$ xmllint test-noencoding.xml
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>
(2) Next, the a-umlaut character is encoded as a hexadecimal character
reference. This is a minor inconsistency between "stream" and "file" mode.
As shown above, "file" mode can also produce unwanted numeric character
references.
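The effect is not specific to xmllint. As a rough analogy only (this uses Python's standard xml.etree.ElementTree, not the xmlSave*/xmlNodeDump* API under discussion), a serializer that is given no output encoding falls back to ASCII and has to escape every non-ASCII character:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc>Käse</doc>")

# No encoding requested: ElementTree falls back to US-ASCII, so "ä"
# can only be written as a numeric character reference.
ascii_out = ET.tostring(root)

# UTF-8 requested explicitly: "ä" is written as its two UTF-8 bytes.
utf8_out = ET.tostring(root, encoding="utf-8")

print(ascii_out)
print(utf8_out)
```

ElementTree writes the reference in decimal where xmllint uses hexadecimal; both forms denote the same character.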
(3) The problem is that in "stream" mode the xmllint application ignores the
value of the encode argument:
$ echo '<?xml version="1.0"?><doc>Käse</doc>' | xmllint - --encode UTF-8
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>
Right, there is an inconsistency in xmllint. But that's not my point.
From my point of view, (1) and (2) are minor, unimportant issues. Only (3)
could be fixed, with low priority.
Unneeded numeric character references in UTF-8 output are not a minor issue.
If you're working with non-Latin scripts, it makes serialized XML files
unreadable for humans and blows up the file size.
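To put rough numbers on the size claim, here is a minimal sketch. The helper to_char_refs and the sample strings are hypothetical, chosen only for illustration. It mimics the escaping by hand rather than calling libxml2:

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


# Hypothetical sample strings in Latin, Cyrillic and Japanese script.
samples = ["Käse", "Сыр", "チーズ"]

for text in samples:
    escaped = to_char_refs(text)
    raw_size = len(text.encode("utf-8"))
    escaped_size = len(escaped.encode("ascii"))
    print(f"{text}: {raw_size} bytes as UTF-8, {escaped_size} bytes as {escaped}")
```

For the Cyrillic sample, the escaped form takes 21 bytes against 6 bytes of UTF-8, and none of it is readable without decoding the references.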
Nick