Re: [xml] Serialization of documents without encoding



On 27/09/2018 10:59, Roumen Petrov wrote:
Let's consider the "file" mode case.

Let's consider the "stream" mode case.

I'm not only talking about xmllint but the serialization API (xmlSave*, xmlNodeDump*) in general.

Now, about the test samples above: if the content is stored in a file, xmllint works fine with encoding (= codeset = charset).

$ cat test-noencoding.xml
<?xml version="1.0"?><doc>Käse</doc>

No, it doesn't work fine:

$ xmllint test-noencoding.xml
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>

(2) Next, the a-umlaut character is encoded as a hexadecimal character reference. This is a minor inconsistency between "stream" and "file" mode.

As shown above, "file" mode can also produce unwanted numeric character references.

(3) The problem is that in "stream" mode the xmllint application ignores the value of the --encode argument:
$ echo '<?xml version="1.0"?><doc>Käse</doc>' | xmllint - --encode UTF-8
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>

Right, there is an inconsistency in xmllint. But that's not my point.

From my point of view, (1) and (2) are minor, unimportant issues. Only (3) could be fixed, with low priority.

Unneeded numeric character references in UTF-8 output are not a minor issue. If you're working with non-Latin scripts, they make serialized XML files unreadable for humans and blow up the file size.
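
To put a rough number on the size increase, here is a minimal sketch in plain C. It is not libxml2 code: escape_non_ascii() and utf8_decode() are hypothetical helpers written only for this illustration, and they assume well-formed UTF-8 input. The sketch replaces every non-ASCII character with a hexadecimal character reference, which is roughly the form of output shown in the xmllint transcripts above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, store the
 * code point in *cp and return the number of bytes consumed.
 * Assumes well-formed input; no validation is done. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) |
              (unsigned long)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) |
              (unsigned long)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((unsigned long)(s[0] & 0x07) << 18) |
          ((unsigned long)(s[1] & 0x3F) << 12) |
          ((unsigned long)(s[2] & 0x3F) << 6) |
          (unsigned long)(s[3] & 0x3F);
    return 4;
}

/* Hypothetical helper (not a libxml2 function): copy the UTF-8 string `in`
 * to `out`, writing each non-ASCII character as "&#xHEX;".
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if `out` is too small. */
static size_t escape_non_ascii(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (outsz == 0)
        return (size_t)-1;

    while (*p != 0) {
        unsigned long cp;
        char tmp[16];
        int len;
        size_t used = utf8_decode(p, &cp);

        if (cp < 0x80) {
            tmp[0] = (char)cp;      /* ASCII is copied unchanged */
            len = 1;
        } else {
            len = snprintf(tmp, sizeof tmp, "&#x%lX;", cp);
        }
        if (n + (size_t)len + 1 > outsz)
            return (size_t)-1;      /* keep room for the NUL */
        memcpy(out + n, tmp, (size_t)len);
        n += (size_t)len;
        p += used;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, "Käse" (5 bytes in UTF-8) becomes "K&#xE4;se" (9 bytes), and a single CJK character such as U+4E2D grows from 3 bytes to the 8 bytes of "&#x4E2D;". For text written entirely in a non-Latin script, that is well over double the size, on top of the loss of readability.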

Nick


