Re: [xml] How do I read German Umlaute - entities from an XML-File using libxml?

On Fri, Oct 05, 2001 at 08:42:29AM -0700, Bill Moseley wrote:
At 11:11 AM 10/05/01 -0400, Daniel Veillard wrote:
 The libxml callback gives UTF8 encoded strings. It seems that you expect
ISO-8859-1 encoded ones. You need to convert between the two. There is a routine
called isolat1ToUTF8; use it, or change your program to use UTF8 encoding.

BTW, is there a way to get UTF8Toisolat1 to replace *non* 8859-1 chars with
something else (e.g. a space)?

  No, hardcoding a given behaviour in case of error makes no sense.

Say I have a long UTF-8 string with a number of non-ASCII chars that *do*
convert to Latin-1, but one character that doesn't.  UTF8Toisolat1 returns
an error and I'm forced to use the UTF-8 string, which means I lose the
characters that would have been converted (and worse, I end up using the
UTF-8 bytes as if they were Latin-1).

  No, UTF8Toisolat1 will convert everything it can. It may stop:
     - and return -2 if there is a conversion error; the inlen and outlen
       return values allow you to process the next UTF8 char however you
       like and continue.
     - if the out buffer is full.
 This is explained in the documentation:

 That convention is the same as all iconv() filters.
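
 To make the "replace what does not convert" policy concrete, here is a
 minimal sketch in plain C. It is a hand-rolled stand-in and does not call
 libxml at all: the function name utf8_to_latin1_lossy and its parameters
 are hypothetical, chosen only for illustration. It handles the three cases
 discussed above (ASCII, characters in U+0080..U+00FF, everything else) and
 writes a caller-chosen substitute byte for anything that has no Latin-1
 equivalent. With libxml the same policy would live in the caller's loop
 around UTF8Toisolat1, using the updated inlen and outlen to resume.

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT part of libxml: convert UTF-8 to ISO-8859-1,
 * writing `subst` for every character with no Latin-1 equivalent (and for
 * malformed sequences). Returns the number of bytes written to `out`.
 * The output is not NUL-terminated. Since every input character yields
 * exactly one output byte, an `out` buffer of `inlen` bytes always suffices.
 */
size_t utf8_to_latin1_lossy(const unsigned char *in, size_t inlen,
                            unsigned char *out, size_t outlen,
                            unsigned char subst)
{
    size_t i = 0, o = 0;

    while (i < inlen && o < outlen) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* Plain ASCII: copy through unchanged. */
            out[o++] = c;
            i++;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < inlen &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF: fits in Latin-1. */
            out[o++] = (unsigned char)(((c & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Not representable (or malformed): emit the substitute and
             * skip the lead byte plus any continuation bytes after it. */
            out[o++] = subst;
            i++;
            while (i < inlen && (in[i] & 0xC0) == 0x80)
                i++;
        }
    }
    return o;
}
```

 Passing ' ' as subst gives the "replace with a space" behaviour asked
 about; passing '?' or skipping the write entirely are equally easy
 variations, which is why the choice is better left to the caller.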

Most of the time the characters that don't convert are an entity for some
symbol that I wouldn't care about anyway.

Does that make any sense? 

   For you, maybe

Would that be helpful to anyone else?

   For some maybe, and would make it useless for everybody else.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml Gnome XML XSLT toolkit | Rpmfind RPM search engine
