[xml] characters callback called twice (and UTF-8?)



Hello,

 

I need your help understanding the behaviour described below.

 

I have an XML file (attached) whose tags may contain Western European, Russian or Greek characters, possibly mixed within the same string.

I ran xmllint --debug --sax on the file to check that everything is OK when a string mixes character sets. I was surprised to see that the characters callback is invoked twice for such a string: once for the first four characters (which are Western European) and once for the remaining part of the string (Russian).
Output of xmllint is as follows:
 
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(tag1, NULL, NULL, 2, xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance', xmlns:xsd='http://www.w3.org/2001/XMLSchema', 5, 0, xsi:noNamespaceSchemaLocation='myxs...', 9, Version='1.2"...', 3, CreationDate='2007...', 10, CreationTime='17:0...', 8, CreationTimeOffset='+01"...', 3)
SAX.characters(
  , 3)
SAX.startElementNs(tag2, NULL, NULL, 0, 0, 0)
SAX.characters(
    , 5)
SAX.startElementNs(tag3, NULL, NULL, 0, 0, 0)
SAX.characters(AAAA, 4)
SAX.characters(закончилась, 22)
SAX.endElementNs(tag3, NULL, NULL)
SAX.characters(
  , 3)
SAX.endElementNs(tag2, NULL, NULL)
SAX.characters(
, 1)
SAX.endElementNs(tag1, NULL, NULL)
SAX.endDocument()
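
Regarding the UTF-8 part of my question: the length 22 reported for the Russian word looks like a byte count rather than a character count. A quick check outside the parser seems to agree, assuming the file is encoded in UTF-8. This is only a Python sketch, and `utf8_len` is a helper name I made up for it:

```python
# Compare character count and UTF-8 byte count for the two chunks
# that xmllint reported in the characters callbacks above.
chunks = ["AAAA", "закончилась"]


def utf8_len(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


for chunk in chunks:
    # Prints the chunk, its length in characters, and its length in bytes.
    print(chunk, len(chunk), utf8_len(chunk))
```

So the lengths themselves look consistent to me; it is only the split into two calls that I do not understand.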

The callback is not split when I move the first four characters to the end of the string, nor when I move them to the middle.
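
As a workaround I could presumably concatenate the chunks myself until the element ends. Below is a minimal sketch of that idea, written in Python with `xml.sax` rather than libxml2 just to keep it short. The class name `TextCollector`, the function `collect_text` and the inline sample document are hypothetical; they are not taken from my application or from the attached file:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect the direct text of each element, however it is chunked."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one list of text chunks per open element
        self.texts = []   # (element name, joined text) pairs, in close order

    def startElement(self, name, attrs):
        # Start a fresh chunk list for the element that just opened.
        self._stack.append([])

    def characters(self, content):
        # May be called several times for one run of text: just buffer it.
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        # The element is complete, so its chunks can be joined safely.
        self.texts.append((name, "".join(self._stack.pop())))


def collect_text(xml_bytes):
    """Parse xml_bytes and return the list of (name, text) pairs."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


# Hypothetical sample that mimics the structure shown by xmllint.
sample = "<tag1><tag2><tag3>AAAAзакончилась</tag3></tag2></tag1>".encode("utf-8")
print(collect_text(sample))
```

That would hide the split from the rest of my code, but I would still like to know why it happens.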

 

I have searched the mailing list for similar cases, as well as the xmlsoft website and other resources, but honestly I am still puzzled by the behaviour of the parser.

Am I overlooking something?

 

Best regards.
Massimo Comba

Attachment: myfile.xml
Description: Text Data


