Hi,

when I try to parse a document encoded in iso-8859-1, I get an error message saying that the input is not proper UTF-8, although the encoding is declared as iso-8859-1:

    (1194 ~/Drop-Box) xmllint 123103.xml
    123103.xml:3: error: Input is not proper UTF-8, indicate encoding !
    <title>Israel reagiert mit HÃärte auf Anschläge vom Wochenende</title>
    ^
    123103.xml:3: error: Bytes: 0xC3 0xC3 0xA4 0x72
    <title>Israel reagiert mit HÃärte auf Anschläge vom Wochenende</title>
    ^

The document contains a very long line (~1750 characters). However, the problem does not seem to be connected to this (at least not directly).

The problem disappears if
- I add a newline after the <title> and before the </title> tag
- I delete the <head> tag and everything after the title element (except the closing </nitf>).

So something seems to go wrong with the character conversion.

libxml2 version is 2.4.18, libc is 2.2.4, OS is Linux, kernel 2.4.10.

I attached the XML file.

Greetings
Morus

PS: I didn't enter the bug in Bugzilla, since I didn't see a way to add a file there, and I think the sample file is important.
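For anyone reading the byte dump: the following is a small Python sketch (not libxml2 code, and not taken from the attached file) of what the four reported bytes look like next to a correct conversion. It assumes the word in question is "Härte"; the helper name `is_valid_utf8` is made up for the illustration. It only shows that the sequence is a correctly converted "ä" preceded by one stray 0xC3 byte. It does not establish why libxml2 produced that byte.

```python
# Bytes reported by xmllint in the second error line.
reported = bytes([0xC3, 0xC3, 0xA4, 0x72])


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumed source word; in ISO-8859-1 the "ä" is the single byte 0xE4.
latin1_word = "H\u00e4rte".encode("iso-8859-1")

# A correct ISO-8859-1 -> UTF-8 conversion turns 0xE4 into 0xC3 0xA4.
utf8_word = latin1_word.decode("iso-8859-1").encode("utf-8")

print(latin1_word.hex(" "))         # 48 e4 72 74 65
print(utf8_word.hex(" "))           # 48 c3 a4 72 74 65

# 0xC3 is a UTF-8 lead byte and must be followed by a continuation byte,
# so 0xC3 0xC3 is invalid; dropping the first byte leaves valid UTF-8.
print(is_valid_utf8(reported))      # False
print(is_valid_utf8(reported[1:]))  # True
print(reported[1:].decode("utf-8")) # är
```

So the tail of the reported sequence (0xC3 0xA4 0x72) matches the expected UTF-8 for "är", and only the leading 0xC3 is out of place.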
Attachment:
123103.xml
Description: Binary data