[xml] xmllint: Why does it convert UTF-8 to numeric entity refs?
- From: David Sewell <dsewell virginia edu>
- To: xml gnome org
- Subject: [xml] xmllint: Why does it convert UTF-8 to numeric entity refs?
- Date: Mon, 4 Aug 2003 12:22:03 -0400 (EDT)
Hi,
I'd like to use xmllint to do checking and formatting of XML files
prior to editing. However, I've found a minor issue in the way it
handles UTF-8 characters, and I'm wondering if there is a workaround or
if I just missed something in the docs.
Problem: xmllint converts native UTF-8 characters to numeric character
references when an XML document is in UTF-8 encoding.
For example: given the following file "test-utf8.xml":
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ELEMENT doc (test)+>
<!ELEMENT test (#PCDATA)>
<!ENTITY ccedil "ç">
<!ATTLIST test lang CDATA #IMPLIED>
]>
<doc>
<test lang="français">UTF-8 character: ç</test>
<test lang="français">numeric ref: &#231;</test>
<test lang="français">entity ref: &ccedil;</test>
</doc>
(where the 2-byte UTF-8 character is "ç"), running "xmllint
--noent test-utf8.xml" converts the <test> elements to:
<test lang="fran&#xE7;ais">UTF-8 character: &#xE7;</test>
<test lang="fran&#xE7;ais">numeric ref: &#xE7;</test>
<test lang="fran&#xE7;ais">entity ref: &#xE7;</test>
Is there any way to preserve the UTF-8 output?
I notice that by contrast, if I have an otherwise identical "test.xml"
file encoded in ISO-8859-1, then "xmllint --noent test.xml"
produces
<test lang="français">UTF-8 character: ç</test>
<test lang="français">numeric ref: ç</test>
<test lang="français">entity ref: ç</test>
preserving the Latin-1 character. Why this inconsistency?
(It would be nice to have an optional flag to xmllint allowing a choice
of output between characters and numeric refs.)
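To make the two output forms concrete, here is a minimal sketch in Python (not part of the original report, and using Python's own codecs rather than xmllint or libxml2). It assumes only that "ç" is U+00E7, and shows that the character is two bytes in UTF-8, one byte in ISO-8859-1, and is written as a numeric character reference when the output is restricted to ASCII. It illustrates the equivalence between the forms; it does not claim to reproduce what xmllint does internally.

```python
# Illustration only: Python's standard codecs, not xmllint or libxml2.
char = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA

utf8_bytes = char.encode("utf-8")         # two bytes in UTF-8
latin1_bytes = char.encode("iso-8859-1")  # one byte in ISO-8859-1

word = "fran\u00e7ais"
# When the target encoding cannot represent a character, the
# "xmlcharrefreplace" error handler writes a numeric character reference.
ascii_with_refs = word.encode("ascii", errors="xmlcharrefreplace")
# A UTF-8 target can represent the character, so it is written literally.
utf8_literal = word.encode("utf-8")

print(len(utf8_bytes), len(latin1_bytes))
print(ascii_with_refs.decode("ascii"))
# Decimal and hexadecimal references name the same code point.
print(f"&#{ord(char)}; == &#x{ord(char):X};")
```

Both reference spellings, decimal and hexadecimal, denote the same character, so the documents are equivalent to an XML parser; the difference only matters to someone reading or editing the file by hand.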
Thanks for any illumination,
David Sewell
--
David Sewell, Managing Editor
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell virginia edu Tel: +1 434 924 9973
Web: http://www.ei.virginia.edu/