Re: libxml - utf8 / 8bit charsets.



On Mon, Mar 26, 2001 at 05:09:13AM -0500, Michael Meeks wrote:
> 
> Hi Daniel,
> 
>         When I looked into this issue, it seemed to me that libxml was   
> being too clever for its own good :-) but first - let me assume that the
> only significant user of libxml1 is now the GNOME project - is that fair?

  Yes, I ask everybody else to switch to libxml2.

>         So - there are a lot of possible char-sets that we could support,   
> however looking at parser.c (xmlSwitchEncoding), it seems that we flag
> errors on all encodings except ENCODING_NONE and ENCODING_UTF8.

  I think it also accepts ISO Latin 1; maybe I forgot to grab that part.
  Libxml2 uses iconv to support a far larger set, but it looked like
backporting the iconv support was not useful for Gnome in its current
I18N state.
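
  To make the Latin 1 case concrete, here is a minimal sketch of what
accepting ISO-8859-1 input amounts to for a parser that works on UTF-8
internally: every byte below 0x80 is copied unchanged, and every byte
from 0x80 upward becomes a two-byte UTF-8 sequence. The function name
latin1_to_utf8 is hypothetical and is not libxml's API; this only
illustrates the byte-level mapping, which needs no iconv because Latin 1
maps one-to-one onto the first 256 Unicode code points.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libxml): convert ISO-8859-1 bytes to
 * UTF-8. Returns the number of bytes written to `out`, or (size_t)-1 if
 * `out` is too small to hold the result. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;

    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* Plain ASCII: identical in Latin 1 and UTF-8. */
            if (o + 1 > outcap)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF: code point U+0080..U+00FF, two UTF-8 bytes. */
            if (o + 2 > outcap)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

  For example, the four Latin 1 bytes "caf\xE9" come out as the five
UTF-8 bytes "caf\xC3\xA9". Any other 8-bit charset needs a real
conversion table, which is what iconv provides.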

>         So - given that mixed charset xml files exist, why can we not get
> libxml to simply return an exact representation of what was in the input
> string - regardless of encoding. And similarly on write, we just assume
> the application is going to get it correct.

  That's the behaviour of the old library, and it is still available.

>         I think what screwed me up using 8 bit, was code that started
> examining a byte stream as chars assuming that it was utf-8 and trying to
> do validation of it to ensure that no chars in a certain range were
> present. If this is the breakage, it is of very limited use to us.

  First, who is "us"? Please explain who you are speaking for.
  Second, the new parser tries to find errors in the input stream; sure,
it's a parser! Whether those errors are encoding-related or structural
makes no difference, because you can't find the second kind properly
without checking for the first.
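
  As an illustration of the kind of encoding check involved, here is a
minimal sketch of a UTF-8 well-formedness test. It is not the code in
parser.c; the name is_valid_utf8 is hypothetical, and the sketch assumes
the current definition of UTF-8 (sequences of at most four bytes, no
overlong forms, no surrogates). It shows why raw 8-bit data trips the
check: a Latin 1 byte such as 0xE9 looks like the lead byte of a
multi-byte sequence, and the bytes that follow are not valid
continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not libxml's actual code): return 1 if the bytes
 * buf[0..len) form well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80)                { need = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;       /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return 0;       /* sequence truncated at end of input */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;       /* out of range or surrogate */

        i += need + 1;
    }
    return 1;
}
```

  With this check, "caf\xC3\xA9" (UTF-8) passes, while "caf\xE9 au"
(Latin 1) fails at the 0xE9 byte, because the space that follows is not
a continuation byte.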

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



