[xml] characters callback called twice (and UTF-8?)
- From: bagnacauda <bagnacauda gmail com>
- To: xml gnome org
- Subject: [xml] characters callback called twice (and UTF-8?)
- Date: Fri, 16 May 2008 12:37:12 +0200
Hello,
I need your help to understand what follows.
I have this XML file (attached) in which a tag may contain Western European, Russian or Greek characters, even mixed together.
I ran xmllint --debug --sax on the file to check that everything is OK with a mixed-character string, and I was surprised to see that the characters callback is invoked twice: once for the first four characters (which are Western European) and once for the rest of the string (Russian).
Output of xmllint is as follows:
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(tag1, NULL, NULL, 2, xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance', xmlns:xsd='http://www.w3.org/2001/XMLSchema', 5, 0, xsi:noNamespaceSchemaLocation='myxs...', 9, Version='1.2"...', 3, CreationDate='2007...', 10, CreationTime='17:0...', 8, CreationTimeOffset='+01"...', 3)
SAX.characters(
, 3)
SAX.startElementNs(tag2, NULL, NULL, 0, 0, 0)
SAX.characters(
, 5)
SAX.startElementNs(tag3, NULL, NULL, 0, 0, 0)
SAX.characters(AAAA, 4)
SAX.characters(закончилась, 22)
SAX.endElementNs(tag3, NULL, NULL)
SAX.characters(
, 3)
SAX.endElementNs(tag2, NULL, NULL)
SAX.characters(
, 1)
SAX.endElementNs(tag1, NULL, NULL)
SAX.endDocument()
This does not happen when I move the first four characters to the end of the string, nor when I move them to the middle.
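For reference, here is a minimal sketch of the usual way to cope with text that arrives in more than one characters call: buffer the chunks and join them when the element ends. It is not libxml2 code. It uses Python's standard xml.sax module, an assumption made only to keep the example self-contained, and the names TextCollector and collect_text are hypothetical. SAX-style parsers are generally allowed to split one run of text across several callbacks, so a handler should not rely on getting the whole string at once. The last line also checks the arithmetic behind the reported length: the Russian word has 11 letters, and 22 is consistent with a UTF-8 byte count at 2 bytes per Cyrillic letter.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Joins the chunks delivered for one element into a single string."""

    def __init__(self, target):
        super().__init__()
        self.target = target   # element name whose text we want (hypothetical parameter)
        self.inside = False    # True while we are between start and end of target
        self.chunks = []       # pieces exactly as the parser delivered them
        self.texts = []        # one joined string per target element

    def startElement(self, name, attrs):
        if name == self.target:
            self.inside = True
            self.chunks = []

    def characters(self, content):
        # May be called once or several times for the same text run.
        if self.inside:
            self.chunks.append(content)

    def endElement(self, name):
        if name == self.target:
            self.texts.append("".join(self.chunks))
            self.inside = False


def collect_text(xml_bytes, target):
    """Parse xml_bytes and return the full text of every target element."""
    handler = TextCollector(target)
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


# Simplified stand-in for the attached file (assumption: same nesting, no attributes).
doc = "<tag1><tag2><tag3>AAAAзакончилась</tag3></tag2></tag1>".encode("utf-8")
texts = collect_text(doc, "tag3")
print(texts)

# Characters versus UTF-8 bytes for the Russian part of the string.
print(len("закончилась"), len("закончилась".encode("utf-8")))
```

However many times the parser calls characters for tag3, the joined result is the same, which is why buffering until the end-element event is the common pattern.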
I have searched the mailing list for similar cases, as well as the xmlsoft website and other resources, but honestly I am still puzzled by the behaviour of the parser.
Am I overlooking something?
Best regards.
Massimo Comba
Attachment:
myfile.xml
Description: Text Data