Re: [xml] SAX API returns wrong length for characters containing non-ASCII
- From: Daniel Veillard <veillard redhat com>
- To: Ludwig Weiss <ludwig weiss86 gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] SAX API returns wrong length for characters containing non-ASCII
- Date: Thu, 27 Jun 2013 22:20:58 +0800
On Mon, Jun 24, 2013 at 11:40:41PM +0200, Ludwig Weiss wrote:
Hello,
I'm trying to parse an XML document with the SAX API. The document
contains some German umlauts. A short example:
<?xml version="1.0" encoding="UTF-8"?>
<Mediathek>
<X><n>hello from Köln</n><g>http://www.koeln.de</g></X>
<X><n>öhello from Köln</n><g>http://www.koeln.de</g></X>
</Mediathek>
The callback to my charactersSAXFunc tells me that the length of the string
inside <n>...</n> on the first line is 12, so the string I save for later use
is just "hello from K".
For the second line, however, it returns the correct length of 18, so I
get the complete string. The difference is that it starts with a
non-ASCII character. The same happens, by the way, with French letters.
Did I perhaps forget to tell the parser something?
Thanks for your great effort :)
I think that in the first case you should get two consecutive characters
callbacks, not one. Make sure you don't miss events from the parser.
thinkpad:~/XML -> xmllint --sax --debug tst.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(Mediathek, NULL, NULL, 0, 0, 0)
SAX.characters(
, 1)
SAX.startElementNs(X, NULL, NULL, 0, 0, 0)
SAX.startElementNs(n, NULL, NULL, 0, 0, 0)
SAX.characters(hello from K, 12)
SAX.characters(öln, 4)
SAX.endElementNs(n, NULL, NULL)
...
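The trace above shows the text of the first <n> element arriving as two
chunks (12 bytes, then 4 bytes). A minimal sketch of a handler that appends
every chunk instead of keeping only the first one could look like the
following. TextBuf, on_characters, text_reset and text_free are hypothetical
names for illustration, and xmlChar is redefined locally only so the sketch
compiles without the libxml2 headers; in real code it comes from libxml2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for libxml2's xmlChar (an unsigned char holding UTF-8 bytes),
 * declared here only so this sketch builds without libxml2. */
typedef unsigned char xmlChar;

/* Hypothetical per-parse state, passed to the parser as the user data. */
typedef struct {
    char  *buf;  /* accumulated text, NUL-terminated once allocated */
    size_t len;  /* bytes used, not counting the NUL */
    size_t cap;  /* bytes allocated */
} TextBuf;

/* Same shape as a charactersSAXFunc: ch is not NUL-terminated and len is
 * a byte count, so "öln" arrives with len == 4. May be called several
 * times for one text node; each call appends to the buffer. */
void on_characters(void *ctx, const xmlChar *ch, int len)
{
    TextBuf *tb = (TextBuf *)ctx;
    assert(len >= 0);

    size_t need = tb->len + (size_t)len + 1;
    if (need > tb->cap) {
        size_t cap = tb->cap ? tb->cap : 32;
        while (cap < need)
            cap *= 2;
        char *p = (char *)realloc(tb->buf, cap);
        if (p == NULL)
            return;  /* out of memory: this sketch simply drops the chunk */
        tb->buf = p;
        tb->cap = cap;
    }
    memcpy(tb->buf + tb->len, ch, (size_t)len);
    tb->len += (size_t)len;
    tb->buf[tb->len] = '\0';
}

/* Call when the element of interest starts, to discard earlier text. */
void text_reset(TextBuf *tb)
{
    tb->len = 0;
    if (tb->buf != NULL)
        tb->buf[0] = '\0';
}

/* Call once parsing is finished. */
void text_free(TextBuf *tb)
{
    free(tb->buf);
    tb->buf = NULL;
    tb->len = 0;
    tb->cap = 0;
}
```

With this approach the buffer is reset in the start-element handler and read
in the end-element handler, so the complete string is available no matter how
many characters callbacks the parser issued in between.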
SAX is not a good API for developers; it is just too easy to get things
wrong. I suggest using the Reader API instead:
http://xmlsoft.org/xmlreader.html
Daniel
--
Daniel Veillard | Open Source and Standards, Red Hat
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | virtualization library http://libvirt.org/