Re: [xml] xmllint: Why does it convert UTF-8 to numeric entity refs?
- From: "Peter Jacobi" <pj walter-graphtek com>
- To: David Sewell <dsewell virginia edu>
- Cc: xml gnome org
- Subject: Re: [xml] xmllint: Why does it convert UTF-8 to numeric entity refs?
- Date: Tue, 05 Aug 2003 12:01:51 +0200
Hi David,
I assume your test input is:
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ELEMENT doc (test)+>
<!ELEMENT test (#PCDATA)>
<!ENTITY ccedil "ç">
<!ATTLIST test lang CDATA #IMPLIED>
]>
<doc>
<test lang="français">UTF-8 character: ç</test>
<test lang="français">numeric ref: &#xE7;</test>
<test lang="français">entity ref: &ccedil;</test>
</doc>
As you noted,
xmllint --noent test-utf8.xml
gives the output:
<test lang="fran&#xE7;ais">UTF-8 character: &#xE7;</test>
<test lang="fran&#xE7;ais">numeric ref: &#xE7;</test>
<test lang="fran&#xE7;ais">entity ref: &#xE7;</test>
You can solve half of your problem by giving a seemingly redundant option:
xmllint --noent --encode UTF-8 test-utf8.xml
gives the output:
<test lang="fran&#xE7;ais">UTF-8 character: ç</test>
<test lang="fran&#xE7;ais">numeric ref: ç</test>
<test lang="fran&#xE7;ais">entity ref: ç</test>
So, if most of your accented characters are in PCDATA, this will do it.
ATTN Daniel: Of course it's equivalent from an XML point of view, but
don't you find it somewhat disturbing that the occurrence of numeric
character references depends on the source charset and a seemingly
redundant option?
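For anyone who wants to convince themselves of the "equivalent from an XML point of view" part, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than libxml2, so it only illustrates the general principle, not xmllint's internals; the names LITERAL, NUMERIC, ENTITY and parsed are made up for this example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same element: a literal character, a numeric
# character reference, and a reference to an entity declared in the DTD.
LITERAL = '<test lang="français">ç</test>'
NUMERIC = '<test lang="fran&#xE7;ais">&#xE7;</test>'
ENTITY = (
    '<!DOCTYPE test [<!ENTITY ccedil "&#xE7;">]>'
    '<test lang="fran&ccedil;ais">&ccedil;</test>'
)


def parsed(xml_text):
    """Return (lang attribute, text content) as the parser reports them."""
    elem = ET.fromstring(xml_text)
    return elem.attrib["lang"], elem.text


# After parsing, all three spellings carry identical information.
assert parsed(LITERAL) == parsed(NUMERIC) == parsed(ENTITY)

# Serialization is where the difference reappears: the output encoding
# decides whether non-ASCII characters can be written literally.
elem = ET.fromstring(LITERAL)
ascii_bytes = ET.tostring(elem)                       # default: us-ascii
unicode_text = ET.tostring(elem, encoding="unicode")  # returns a str

print(ascii_bytes.decode("ascii"))
print(unicode_text)
```

With the ASCII default, ElementTree falls back to numeric character references; once the output encoding can represent the character, it writes it literally. The parsed data is the same either way, only the serialized spelling differs.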
Regards,
Peter Jacobi