Re: [xml] How do I read German Umlaute - entities from an XML-File using libxml?
- From: Daniel Veillard <veillard redhat com>
- To: Bill Moseley <moseley hank org>
- Cc: xml gnome org
- Date: Fri, 5 Oct 2001 13:11:31 -0400
On Fri, Oct 05, 2001 at 08:42:29AM -0700, Bill Moseley wrote:
At 11:11 AM 10/05/01 -0400, Daniel Veillard wrote:
The libxml callback gives UTF8 encoded strings. It seems that you expect
ISO-8859-1 encoded ones. You need to convert between both. There is a routine
called isolat1ToUTF8, use it or change your program to use UTF8 encoding.
BTW- Is there a way to get UTF8Toisolat1 to replace *non* 8859-1 chars with
something else (e.g. a space)?
No, hardcoding a given behaviour in case of error makes no sense.
Say I have a long UTF-8 string with a number of non Latin-1 chars that *do*
convert to Latin-1, but one character that doesn't. UTF8Toisolat1 returns
an error and I'm forced to use the UTF-8 string which means I lost the
characters that would have been converted (and worse, using them as if they
were Latin-1).
No, UTF8Toisolat1 will convert everything it can. It may stop:
- and return -2 if there is a conversion error; the inlen and outlen
return values let you handle the next UTF8 char however you like
and then continue.
- if the out buffer is full.
This is explained in the documentation:
http://xmlsoft.org/html/libxml-encoding.html#UTF8TOISOLAT1
That convention is the same as all iconv() filters.
Most of the time the characters that don't convert are an entity for some
symbol that I wouldn't care about anyway.
Does that make any sense?
For you, maybe.
Would that be helpful to anyone else?
For some maybe, and would make it useless for everybody else.
Daniel
--
Daniel Veillard | Red Hat Network http://redhat.com/products/network/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/