[xml] characters callback called twice (and UTF-8?)

From: bagnacauda <bagnacauda gmail com>
To: xml gnome org
Subject: [xml] characters callback called twice (and UTF-8?)
Date: Fri, 16 May 2008 12:37:12 +0200

Hello,

I need your help understanding the following.

I have an XML file (attached) in which the text of an element may contain Western European, Russian, or Greek characters, sometimes mixed in the same string.

I ran xmllint --debug --sax on the file to check that a string with mixed scripts is handled correctly, and I was surprised to see that the characters callback is invoked twice for the content of tag3: once for the first four characters (which are Western European) and once for the rest of the string (Russian).

The output of xmllint is as follows:

SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(tag1, NULL, NULL, 2, xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance', xmlns:xsd='http://www.w3.org/2001/XMLSchema', 5, 0, xsi:noNamespaceSchemaLocation='myxs...', 9, Version='1.2"...', 3, CreationDate='2007...', 10, CreationTime='17:0...', 8, CreationTimeOffset='+01"...', 3)
SAX.characters(
, 3)
SAX.startElementNs(tag2, NULL, NULL, 0, 0, 0)
SAX.characters(
, 5)
SAX.startElementNs(tag3, NULL, NULL, 0, 0, 0)
SAX.characters(AAAA, 4)
SAX.characters(закончилась, 22)
SAX.endElementNs(tag3, NULL, NULL)
SAX.characters(
, 3)
SAX.endElementNs(tag2, NULL, NULL)
SAX.characters(
, 1)
SAX.endElementNs(tag1, NULL, NULL)
SAX.endDocument()
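A note on the lengths in the trace, since the subject also asks about UTF-8: the len argument of the characters callback counts bytes of UTF-8 text (libxml2's xmlChar is a single byte), not characters. The 11-letter Russian word therefore appears with length 22, because each Cyrillic letter in it takes two bytes, while "AAAA" takes four. The quick check below uses Python purely as a convenient calculator; the word is copied from the trace.

```python
# Compare character count and UTF-8 byte count for the Russian part of tag3.
word = "закончилась"  # copied from the SAX.characters(...) line in the trace

n_chars = len(word)                  # Unicode code points
n_bytes = len(word.encode("utf-8"))  # bytes once encoded as UTF-8

print(n_chars, n_bytes)  # 11 22
```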

This does not happen when I move the first four characters either to the end of the string or to the middle.
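As far as I understand SAX in general, a parser is free to report one run of text through several characters calls, so an application should buffer the pieces and only use the text once the element ends. Below is a minimal sketch of that pattern under stated assumptions. It uses Python's standard xml.sax module, which is backed by expat rather than libxml2, so it will not necessarily split the string where xmllint does. The class name TextCollector and the sample document (the tag3 content from the trace, with attributes and whitespace dropped) are hypothetical.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: joins all characters() chunks of each element.

    The parser may deliver one text node in several characters() calls,
    so chunks are buffered per open element and joined at endElement().
    """

    def __init__(self):
        super().__init__()
        self._stack = []  # one list of text chunks per currently open element
        self.texts = {}   # element name -> its direct text (children's text excluded)

    def startElement(self, name, attrs):
        self._stack.append([])

    def characters(self, content):
        # Never assume a single call carries the whole string.
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        self.texts[name] = "".join(self._stack.pop())


# Hypothetical sample modelled on the trace (attributes and whitespace omitted).
doc = "<tag1><tag2><tag3>AAAAзакончилась</tag3></tag2></tag1>"

handler = TextCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)

text = handler.texts["tag3"]
print(len(text), len(text.encode("utf-8")))  # 15 26
```

The 26 bytes of the joined text match the 4 + 22 reported by the two separate callbacks, which is what one would expect if the split is only a matter of delivery and no data is lost.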

I have searched the mailing list archives for a similar case, as well as the xmlsoft website and other resources, but I am honestly still puzzled by the parser's behaviour.

Am I overlooking something?

Best regards.

Massimo Comba

Attachment: myfile.xml
Description: Text Data

Follow-Ups:
- Re: [xml] characters callback called twice (and UTF-8?)
  - From: Daniel Veillard
