[xml] encoding problem with iso-8859-1?



Hi,

when I try to parse a document encoded in iso-8859-1, I get an
error message saying that the input is not proper UTF-8, although
the encoding is declared as iso-8859-1:
(1194 ~/Drop-Box) xmllint 123103.xml
123103.xml:3: error: Input is not proper UTF-8, indicate encoding !
<title>Israel reagiert mit HÃärte auf Anschläge vom Wochenende</title>
                            ^
123103.xml:3: error: Bytes: 0xC3 0xC3 0xA4 0x72
<title>Israel reagiert mit HÃärte auf Anschläge vom Wochenende</title>
                            ^
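For reference, here is a small Python check of the four bytes quoted in the "Bytes:" line. It is an illustration added for this report, not part of the failing test. The bytes are not valid UTF-8: 0xC3 is the lead byte of a two-byte sequence and must be followed by a continuation byte (0x80 to 0xBF), but the next byte is another 0xC3. Read as ISO-8859-1, where every byte is exactly one character, the same bytes are perfectly legal. So the error suggests that the parser was decoding this part of the input as UTF-8 rather than with the declared encoding.

```python
# The four bytes quoted by xmllint in the "Bytes:" line of the error message.
quoted = bytes([0xC3, 0xC3, 0xA4, 0x72])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In ISO-8859-1 each byte maps directly to U+0000..U+00FF, so decoding never fails.
as_latin1 = quoted.decode("iso-8859-1")

print(is_valid_utf8(quoted))              # False: 0xC3 is followed by 0xC3, not 0x80..0xBF
print([hex(ord(c)) for c in as_latin1])   # one code point per byte
print(is_valid_utf8(bytes([0xC3, 0xA4]))) # True: 0xC3 0xA4 alone is UTF-8 for U+00E4 (ä)
```

The last line shows that the final two bytes of the sequence (0xC3 0xA4) form a valid UTF-8 "ä" on their own. That is why a UTF-8 terminal renders the quoted line as "HÃärte".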
The document contains a very long line (~1750 characters).
However, the problem does not seem to be connected to this (at least
not directly).
The problem disappears if
- I add a newline after the <title> tag and before the </title> tag, or
- I delete the <head> tag and everything after the title element (except
  the closing </nitf>).

So something seems to go wrong with the character conversion.
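To make the conversion point concrete, here is a hand-written sketch of what an ISO-8859-1 to UTF-8 conversion has to produce. It is an illustration only, not libxml2's code. Each ISO-8859-1 byte b is the code point U+00b. Bytes below 0x80 stay one byte, and bytes from 0x80 to 0xFF become two bytes, 110000xx 10xxxxxx. The input "Härte" with ä stored as the single byte 0xE4 is an assumption about how the title is encoded in the attached file.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by hand (illustrative sketch)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII: unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: 0xC2 or 0xC3
            out.append(0x80 | (b & 0x3F))  # continuation byte
    return bytes(out)


# Assumed input: "Härte" as stored in ISO-8859-1 (ä = 0xE4).
src = b"H\xe4rte"
converted = latin1_to_utf8(src)
print(converted.hex(" "))  # 48 c3 a4 72 74 65

# Cross-check against Python's built-in codecs.
assert converted == src.decode("iso-8859-1").encode("utf-8")
```

A correct conversion of "är" would be 0xC3 0xA4 0x72. The error message quotes 0xC3 0xC3 0xA4 0x72, which looks like that output with the lead byte 0xC3 emitted twice. That would fit a conversion problem, but it is only a guess from the error output.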

libxml2 version is 2.4.18, libc is 2.2.4, OS is Linux, kernel 2.4.10.
I attached the XML file.

greetings
        Morus

PS: I didn't enter the bug in Bugzilla, since I didn't see a way
to attach a file there, and I think the sample file is important.

Attachment: 123103.xml
Description: Binary data


