[xml] Problem with UTF8Toisolat1()



Hello.

I started writing a program to extract the wiki text from MediaWiki XML
dumps and convert it to ISO-8859-1 (in order to run it through
aspell). However, some calls to UTF8Toisolat1() fail unexpectedly.
I've narrowed it down to the attached example program, in which
UTF8Toisolat1() returns -2 (transcoding error) when run on the
document at:

http://sv.wikiquote.org/wiki/Special:Export/MediaWiki:Doubleredirectsarrow

I can't see why, though. Can anyone spot my (probably simple) mistake here?

I've tried running this program with the full dump of the Swedish
edition of the Wikiquote website, available compressed at:

http://download.wikimedia.org/wikiquote/sv/pages_current.xml.bz2

With that document, I get 1722 successful UTF8Toisolat1() calls and 65
calls that return -2.
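
One way to narrow this down further is sketched below. It assumes (this
is not confirmed) that the -2 results come from text containing
characters above U+00FF, which have no ISO-8859-1 equivalent. The helper
first_non_latin1() is a hypothetical name, not part of libxml2, and it
does not reproduce what UTF8Toisolat1() does internally. It only reports
the byte offset of the first UTF-8 sequence that cannot fit in one
ISO-8859-1 byte.

```c
#include <stddef.h>

/*
 * Hypothetical diagnostic helper (not a libxml2 function).
 *
 * Returns the byte offset of the first sequence in buf[0..len) that
 * cannot be represented in ISO-8859-1, or -1 if everything fits.
 *
 * U+0000..U+007F are single ASCII bytes; U+0080..U+00FF are encoded as
 * a 0xC2 or 0xC3 lead byte plus one continuation byte. Any other byte
 * (longer sequences, stray continuation bytes, truncated input) is
 * reported as not convertible.
 */
long first_non_latin1(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];

        if (c < 0x80) {
            /* Plain ASCII, always representable. */
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) &&
                   i + 1 < len &&
                   (buf[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF. */
            i += 2;
        } else {
            /* Code point above U+00FF, or malformed UTF-8. */
            return (long)i;
        }
    }
    return -1;
}
```

Calling this on each buffer just before UTF8Toisolat1() and printing the
offset for the failing ones should show whether the 65 failures line up
with such characters.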

Thanks in advance.

Best regards,
Aron Stansvik

Attachment: convtest.c
Description: Binary data


