Re: [xml] Serialization of documents without encoding



On Tue, Sep 25, 2018 at 01:19:51PM +0200, Nick Wellnhofer wrote:
> libxml2 serializes documents without an encoding declaration differently
> than documents with an explicit UTF-8 encoding:
>
> $ echo '<?xml version="1.0"?><doc>Käse</doc>' |xmllint -
> <?xml version="1.0"?>
> <doc>K&#xE4;se</doc>
>
> $ echo '<?xml version="1.0" encoding="utf-8"?><doc>Käse</doc>' |xmllint -
> <?xml version="1.0" encoding="utf-8"?>
> <doc>Käse</doc>
>
> Since the encoding should default to UTF-8, can anyone explain why this
> decision was made?

  Because numeric character references are part of the core XML spec, there
is no way this can be screwed up when people are doing manipulations like
cutting parts of an XML document and pasting them somewhere else where the
context (for example, the declared encoding) may be different. So if you
don't explicitly ask for an encoding, libxml2 will deliver the most resilient
serialization possible, and that means using character references for
non-ASCII characters, except where that is not possible (and then there are
specifics about attribute serialization, etc.).
  Please keep it that way. You have no idea what people may have done,
and unless a change really fixes an issue I would be very reluctant to change
this behaviour.
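
  The cut-and-paste argument can be illustrated with a small sketch. It
does not use libxml2; it assumes Python's standard xml.etree.ElementTree
parser as a stand-in, and the helper name paste_into_latin1 is made up for
this example. The same fragment is pasted, byte for byte, into a host
document declared as ISO-8859-1, once in the character-reference form and
once as raw UTF-8 bytes:

```python
import xml.etree.ElementTree as ET

# Two serializations of the same element, as raw bytes.
# "\u00e4" is the letter a-umlaut from the xmllint example.
escaped = b"<doc>K&#xE4;se</doc>"                    # ASCII only, character reference
raw_utf8 = "<doc>K\u00e4se</doc>".encode("utf-8")    # a-umlaut stored as bytes C3 A4


def paste_into_latin1(fragment: bytes) -> str:
    """Paste a fragment unchanged into a host document that declares
    ISO-8859-1 (hypothetical helper) and return the text the parser sees."""
    host = (
        b'<?xml version="1.0" encoding="ISO-8859-1"?><root>'
        + fragment
        + b"</root>"
    )
    return ET.fromstring(host).find("doc").text


# The character reference names the code point, so the host encoding
# does not matter and the original text comes back.
print(repr(paste_into_latin1(escaped)))

# The raw UTF-8 bytes are reinterpreted as two ISO-8859-1 characters,
# so the text is silently corrupted.
print(repr(paste_into_latin1(raw_utf8)))
```

  In this sketch the escaped fragment still parses to the original word,
while the raw UTF-8 fragment comes back with two wrong characters in place
of the umlaut. Which trade-off is right for a given application is a
separate question; the sketch only shows why the escaped form is the more
resilient default.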

 thanks,

Daniel

-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

