[xml] --html option breaks encoding



Hello,

I am trying to build an EPUB from some HTML documentation, after running a few XPath queries on it.
But XPath requires well-formed XML, so I tried to clean up the input with xmllint. Unless I'm mistaken,
the --html option then breaks the encoding.

Without --html:

$ echo '<title>Introduction — Vue.js</title>' |  xmllint --encode UTF-8 --format  -

[...]
<title>Introduction — Vue.js</title>



With --html (which seems to be required for entire documents), the "—" is transformed to "&acirc;&#128;&#148;":

$ echo '<title>Introduction — Vue.js</title>' |  xmllint --html --htmlout --encode UTF-8 --format -

[...]
<title>Introduction &acirc;&#128;&#148; Vue.js</title>
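
For what it's worth, the three entities look like the UTF-8 bytes of the dash (E2 80 94), each read as a separate ISO-8859-1 character. That is only my guess at what the HTML parser does when the input declares no charset; I have not confirmed it in the libxml2 source. A small Python check of the arithmetic (the variable names are mine, nothing here calls libxml2):

```python
from html.entities import codepoint2name

# The dash from the <title>, U+2014, as a UTF-8 terminal would send it.
dash = "\u2014"
utf8_bytes = dash.encode("utf-8")

# Assumption: each byte is treated as one ISO-8859-1 (Latin-1) character.
misread = utf8_bytes.decode("latin-1")
code_points = [ord(ch) for ch in misread]

# Serialize the way the output above does: a named entity where HTML 4
# defines one, a numeric character reference otherwise.
entities = "".join(
    f"&{codepoint2name[cp]};" if cp in codepoint2name else f"&#{cp};"
    for cp in code_points
)

print(utf8_bytes.hex(" "))  # e2 80 94
print(code_points)          # [226, 128, 148]
print(entities)             # &acirc;&#128;&#148;
```

The result matches the xmllint output exactly, which is why I suspect an input-encoding mix-up rather than a problem with --encode on the output side.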



$ xmllint --version 
xmllint: using libxml version 20904
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma


Thanks in advance.

