[xml] Bug with encoding of external parameter entities?



Hi,

I had a look at libxslt and found a problem that seems to originate
in libxml.

When I have a document and an external DTD in ISO-8859-1 encoding and
declare an entity to be an 8-bit character, the instantiation of the
entity gives a wrong result.

E.g. let
----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!ENTITY test '¿'>
----
be the external DTD 'ent.dtd', and
----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE test SYSTEM "ent.dtd" [
<!ELEMENT test (#PCDATA)>
]>
<test>&test; ¿</test>
----
be some XML document (doc.xml). Then
xmllint --valid --noent --encode "ISO-8859-1" doc.xml | od -c
gives
0000000   <   ?   x   m   l       v   e   r   s   i   o   n   =   "   1
0000020   .   0   "       e   n   c   o   d   i   n   g   =   "   I   S
0000040   O   -   8   8   5   9   -   1   "   ?   >  \n   <   !   D   O
0000060   C   T   Y   P   E       t   e   s   t       S   Y   S   T   E
0000100   M       "   e   n   t   .   d   t   d   "       [  \n   <   !
0000120   E   L   E   M   E   N   T       t   e   s   t       (   #   P
0000140   C   D   A   T   A   )   >  \n   ]   >  \n   <   t   e   s   t
0000160   > 200     277   <   /   t   e   s   t   >  \n

(I made the octal dump to show that the &test; entity is replaced by
byte 200 (octal) rather than 277 (octal), which is ¿ in ISO-8859-1.)
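
For anyone who wants to check the byte values in the dump, here is a small
Python sketch. It only does the octal/ISO-8859-1 arithmetic and does not
call libxml at all; the variable names are mine. The round trip at the end
assumes that the parser holds text as UTF-8 internally and converts back to
ISO-8859-1 on output, which is what I would expect a correct run to do.

```python
# Octal values as printed by `od -c` in the dump above.
observed = 0o200   # byte written where &test; was expanded
expected = 0o277   # byte written for the literal character in doc.xml

# ISO-8859-1 maps every byte to the Unicode code point of the same value.
expected_char = bytes([expected]).decode("iso-8859-1")
observed_char = bytes([observed]).decode("iso-8859-1")

print(hex(expected), repr(expected_char))
print(hex(observed), repr(observed_char))

# What a lossless conversion should look like (assumption: UTF-8 is the
# internal form, ISO-8859-1 the requested output encoding).
utf8_form = expected_char.encode("utf-8")
roundtrip = utf8_form.decode("utf-8").encode("iso-8859-1")
print(utf8_form, roundtrip)
```

So the literal character survives the round trip as a single 0xBF byte,
while the byte that comes out of the entity (0x80) is not ¿ at all.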

The same problem shows up for other 8-bit characters used in the external
file entity.
So I suspect libxml handles the encoding declaration of the external entity
incorrectly.

Any comments?

I use libxml2 2.3.4 on Linux.

greetings
        Morus



