On Mon, Jun 24, 2013 at 11:40:41PM +0200, Ludwig Weiss wrote:

I'm trying to parse an XML document with the SAX API. The document
contains some German umlauts. A short example:
<?xml version="1.0" encoding="UTF-8"?>
<X><n>hello from Köln</n><g></g></X>
<X><n>öhello from Köln</n><g></g></X>

The callback to my charactersSAXFunc tells me the length of the string
inside <n>...</n> on the first line is 12, so the string I save for later
use is just "hello from K".

For the second line, however, it returns the correct length of 18, so I
get the complete string. The difference is that it starts with a
non-ASCII character. The same happens, by the way, with French letters.

Did I perhaps forget to tell the parser something?

Thanks for your great effort :)

  I think in the first case you should get 2 consecutive characters
callbacks, not one. Make sure you don't miss events from the parser.

thinkpad:~/XML -> xmllint --sax --debug tst.xml
SAX.startElementNs(Mediathek, NULL, NULL, 0, 0, 0)
, 1)
SAX.startElementNs(X, NULL, NULL, 0, 0, 0)
SAX.startElementNs(n, NULL, NULL, 0, 0, 0)
SAX.characters(hello from K, 12)
SAX.characters(öln, 4)
SAX.endElementNs(n, NULL, NULL)
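
The usual way to cope with split characters events is to append every chunk to a buffer and only use the text once the end-element event arrives. Here is a minimal sketch of that pattern. Note that it uses Python's standard xml.sax module rather than libxml2's C charactersSAXFunc, purely to keep the example short and self-contained; the names TextCollector and collect_names are made up for illustration, and the Mediathek root element is taken from the xmllint output above.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect the full text of every <n> element, however it is chunked."""

    def __init__(self):
        super().__init__()
        self.names = []      # completed <n> strings
        self._parts = None   # chunks of the <n> currently open, else None

    def startElement(self, name, attrs):
        if name == "n":
            self._parts = []

    def characters(self, content):
        # The parser may call this several times for one text node,
        # so append each chunk instead of overwriting the previous one.
        if self._parts is not None:
            self._parts.append(content)

    def endElement(self, name):
        if name == "n" and self._parts is not None:
            # Only now is the text of the element known to be complete.
            self.names.append("".join(self._parts))
            self._parts = None


def collect_names(data):
    """Parse UTF-8 encoded XML bytes and return the text of all <n> elements."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return handler.names


doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Mediathek>"
    "<X><n>hello from Köln</n><g></g></X>"
    "<X><n>öhello from Köln</n><g></g></X>"
    "</Mediathek>"
).encode("utf-8")

print(collect_names(doc))
```

Whether a given parser actually splits the text at the "ö" is an implementation detail, so the handler should not rely on it either way. The same accumulate-then-flush approach applies to a libxml2 characters callback, where the buffer would be a byte buffer grown by len on each call.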


  SAX is not a good API for developers, it is just too easy to get
things wrong. I suggest using the Reader API instead!


