[xml] Characters callback



Hello,


I'm using a third-party library that parses responses from Amazon AWS SDB
with libxml2. I've hit a bug in this library and, since I have all the
source code, I tried to find the problem myself. I didn't write this
code, though, so let me try to explain the problem from a high level.

I would like to store accented characters (French, for example) in the
Amazon database, so I'm sending UTF-8 strings. That works: the string is
stored correctly in the DB and I can retrieve it with any tool. However,
when I retrieve it through the library, the returned string is split in
the libxml2 characters callback. The split falls exactly on the first
accented character, and the characters callback is called twice. The
third-party library doesn't handle this correctly and believes there are
two different values.
If there are no non-ASCII (or special) characters, everything is fine and
the callback is called only once per value.

Now that I know what the problem is, I can "patch" the library by setting
a boolean that tells the parsing code it's still the same value. Before
doing so, I would just like to know whether this behaviour is normal and
whether there is a better approach than what I'm planning.

So for example, the characters callback is:
aCallBack->theSAXHandler.characters = &QueryCallBack::SAX_CharactersSAXFunc;

// Static trampoline: ctx is cast back to the QueryCallBack instance,
// which then handles the text chunk.
void QueryCallBack::SAX_CharactersSAXFunc(void* ctx,
                                          const xmlChar* value,
                                          int len) {
    QueryCallBack* lCallBack = static_cast<QueryCallBack*>(ctx);
    lCallBack->charactersSAXFunc(value, len);
}

and the text to parse is:
<?xml version="1.0"?>
<GetAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
<GetAttributesResult>
<Attribute>
<Name>localFilename</Name>
<Value>C:/Users/login/Pictures/Aquarium/tést.r</Value>
...

The characters callback is then called first with:
C:/Users/login/Pictures/Aquarium/t

and then

ést.r

instead of only one call with the whole string:
C:/Users/login/Pictures/Aquarium/tést.r
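
For comparison, a more general form of the workaround I have in mind
would be to buffer the text and treat a value as complete only when its
element closes, instead of tracking a boolean per split. Below is a
minimal sketch, assuming the split is expected behaviour. ValueCollector
and its method names are hypothetical stand-ins, not the third-party
library's real code, and in a real handler the element name would arrive
as an xmlChar* rather than a std::string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the third-party handler (not its real code).
// libxml2 declares xmlChar as unsigned char, so the chunk type matches.
class ValueCollector {
public:
    // Call from the startElement callback.
    void startElement(const std::string& name) {
        if (name == "Value") {
            inValue_ = true;
            buffer_.clear();
        }
    }

    // Call from the characters callback. It may run several times for
    // one text node, so the chunk is appended rather than stored as-is.
    void characters(const unsigned char* chunk, int len) {
        assert(len >= 0);
        if (inValue_) {
            buffer_.append(reinterpret_cast<const char*>(chunk),
                           static_cast<std::size_t>(len));
        }
    }

    // Call from the endElement callback. Only here is the value complete.
    void endElement(const std::string& name) {
        if (name == "Value" && inValue_) {
            values_.push_back(buffer_);
            inValue_ = false;
        }
    }

    const std::vector<std::string>& values() const { return values_; }

private:
    bool inValue_ = false;
    std::string buffer_;
    std::vector<std::string> values_;
};
```

With this shape, the two calls shown above would end up in the same
buffer and produce a single value once </Value> is reached.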

So, is this normal? Is there a flag to set somewhere?
Any help would be greatly appreciated before I go the "dirty" way :)

Thanks a lot,

Dlp


