"Re: [xml] DTD - external subset - encoding"



Hi,

on 2/23/2004 1:44 PM Kasimier Buchcik wrote:

Hi,

I have an XML document that references an external subset. Both are 
encoded in UTF-16. xmllint seems to choke on the external subset file.
I haven't found anything about encoding problems with external subsets 
in the mail archives or the bug list.


C:\dev\libxml2\lib\xml-2-6-6-xslt-1-1-2>xmllint parament_test.xml --valid --noent
parament_test.dtd:1: parser error : internal error
 ?<
^
parament_test.dtd:1: parser error : DOCTYPE improperly terminated
 ?<
^
parament_test.dtd:1: parser error : Input is not proper UTF-8, indicate encoding !
 ?<
^
parament_test.dtd:1: error: Bytes: 0xFF 0xFE 0x3C 0x00
 ?<
^
parament_test.dtd:1: parser error : Start tag expected, '<' not found
 ?<
  ^

C:\dev\libxml2\lib\xml-2-6-6-xslt-1-1-2>xmllint --version
xmllint: using libxml version 20606
    compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas

I'm working on a Windows 2000 (w2k) machine.
I have attached the test I used.

After some debugging I ended up in "xmlParserHandlePEReference", where a 
switch of encoding could be done; but the test fails, since 
"entity->length" seems to be zero. I would also have expected the "input" 
to be tested for length >= 4 rather than the "entity", but that one is 
zero as well. Since I don't know which length to use or how to fix this, 
I can only point this out:


parser.c (xmlParserHandlePEReference)

/*
 * Get the 4 first bytes and decode the charset
 * if enc != XML_CHAR_ENCODING_NONE
 * plug some encoding conversion routines.
 */
GROW
if (entity->length >= 4) {   /* <<<----- HERE */
    start[0] = RAW;
    start[1] = NXT(1);
    start[2] = NXT(2);
    start[3] = NXT(3);
    enc = xmlDetectCharEncoding(start, 4);
    if (enc != XML_CHAR_ENCODING_NONE) {
        xmlSwitchEncoding(ctxt, enc);
    }
}
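
To illustrate the idea of checking the bytes actually available in the input rather than a stored entity length, here is a minimal standalone sketch. It is not libxml2 code and not a proposed patch: the type and function names (sketch_encoding, sketch_detect_encoding) are made up for this example, and it only covers UTF-8 and UTF-16, ignoring UTF-32 and other encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; these are not libxml2 identifiers. */
typedef enum {
    SKETCH_ENC_NONE,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_UTF16LE,
    SKETCH_ENC_UTF16BE
} sketch_encoding;

/*
 * Guess the encoding from the first bytes of an input buffer.
 * `avail` is the number of readable bytes in `buf`; the decision is
 * based only on that count, not on any separately stored length.
 * Simplification: UTF-32 is not considered.
 */
static sketch_encoding
sketch_detect_encoding(const unsigned char *buf, size_t avail)
{
    assert(buf != NULL || avail == 0);

    /* UTF-8 byte order mark: EF BB BF */
    if (avail >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return SKETCH_ENC_UTF8;

    /* UTF-16 byte order marks: FF FE (little endian), FE FF (big endian) */
    if (avail >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return SKETCH_ENC_UTF16LE;
    if (avail >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return SKETCH_ENC_UTF16BE;

    /* No BOM: look for "<?" encoded as UTF-16, which needs 4 bytes */
    if (avail >= 4) {
        if (buf[0] == 0x3C && buf[1] == 0x00 &&
            buf[2] == 0x3F && buf[3] == 0x00)
            return SKETCH_ENC_UTF16LE;
        if (buf[0] == 0x00 && buf[1] == 0x3C &&
            buf[2] == 0x00 && buf[3] == 0x3F)
            return SKETCH_ENC_UTF16BE;
    }

    return SKETCH_ENC_NONE;
}
```

Fed the four bytes from the error output above (0xFF 0xFE 0x3C 0x00), this sketch reports little-endian UTF-16; with zero available bytes it reports no encoding, which is roughly the situation the zero length leads to in the snippet above.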


Regards,

Kasimier



