Re: my worry about the recent libxml change



On Fri, Mar 23, 2001 at 10:31:03PM +0400, Vlad Harchev wrote:
> On Fri, 23 Mar 2001, Colm Smyth wrote:
> 
> > I don't think the real issue is with codeset detection when parsing
> > XML files; this can be handled in functions like xmlCreateDocParserCtxt(),
> > xmlSAXParseDTD(); look for calls to xmlSwitchEncoding().
> > 
> > For me, the main questions are
> > 
> > 1. how to decide the appropriate "default" codeset of the application
> > 2. where to call the routines for conversion to that codeset
> 
>  As I understand it, the question is: should we change libxml1 to be conformant
> and possibly break all apps that use it, or not?

  I don't know why I got your Friday mail on Monday; it is rather obsolete
by now. Check my more recent mails.

>  As for detecting charsets in which to load or save - I think it's better to
> stick that logic into libxml1. So, apps shouldn't call conversion routines at
> all (none do now, and provided we are not going to fix every app, we shouldn't
> add such calls).
> 
>  As for loading xml files: I want to stress that it's much better, and not
> error-prone at all, to test whether the string loaded from the file is valid
> UTF-8 and treat it as UTF-8, and if it's invalid, treat it as if it
> were in the locale's charset. There is no need to stick to exactly one encoding
> for xml files - we are lucky that we have the option to guess the encoding of
> the string!

   At the xml file level, sure. But the new parser will only provide
UTF-8 characters, i.e. the first level of indeterminacy, the one related
to the encoding of the XML serialization, is removed. Apps only have to
deal with UTF-8 input.
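
For illustration, the guessing heuristic Vlad describes above (accept the bytes if they are valid UTF-8, otherwise assume the locale's charset) could be sketched in C roughly as follows. This is not libxml code; all function names are hypothetical, and ISO-8859-1 is an assumed stand-in for "the locale's charset", since a real implementation would have to convert from whatever charset the locale actually names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Rejects overlong forms, UTF-16 surrogates (U+D800..U+DFFF)
 * and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80) {            /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }

        if (len - i <= n)
            return 0;  /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;  /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;  /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;  /* surrogate or beyond the Unicode range */
        i += n + 1;
    }
    return 1;
}

/* Fallback: re-encode ISO-8859-1 bytes as UTF-8. Latin-1 is only an
 * assumed stand-in for "the locale's charset" in this sketch.
 * out must hold at least 2 * len + 1 bytes; returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}

/* The heuristic itself: keep valid UTF-8 unchanged, otherwise assume
 * the (hypothetical Latin-1) locale charset and convert to UTF-8.
 * out must hold at least 2 * len + 1 bytes; returns bytes written. */
size_t guess_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    if (is_valid_utf8(in, len)) {
        memcpy(out, in, len);
        out[len] = '\0';
        return len;
    }
    return latin1_to_utf8(in, len, out);
}
```

Fed the Latin-1 bytes `caf\xE9`, guess_to_utf8() produces the UTF-8 string `caf\xC3\xA9`. The weakness Daniel points at shows up even here: the byte pair 0xC3 0xA9 is valid UTF-8 for "é" but is also a perfectly legal Latin-1 string "Ã©", so a Latin-1 file that happens to look like UTF-8 is silently misread. The check can only ever guess.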

   The scheme saying "we will use whatever the current locale is" is IMHO
broken: you won't be able to keep your configuration files if you change
your locale. It's also completely incoherent in the sense that it kind of
works only for the ISO-Latin families; I doubt it will ever work for
SHIFT-JIS, EUC-xxx or, even worse, UTF-16 (I assume all the existing APIs
in Gnome-1.4 would completely crash if I were to deliver 16-bit character
strings when that was the actual serialization used, and remember, a number
of Windows applications use it when serializing XML!). In a nutshell,
it's a kludge and you people need to change your mindset: assuming the
'locale' encoding for serialization doesn't work.
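
To make the UTF-16 point concrete, here is a small C sketch (again not libxml code; the array and function names are hypothetical). It shows that a UTF-16LE serialization of `<a/>` contains NUL bytes, so any API that treats the raw bytes as a C string truncates it, and it includes a minimal UTF-16LE to UTF-8 conversion of the kind a parser can do once so that applications only see UTF-8. For brevity, the converter handles only the Basic Multilingual Plane and rejects surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "<a/>" serialized as UTF-16LE with a byte order mark. Every ASCII
 * character becomes two bytes, the second one being 0x00. */
const unsigned char utf16le_doc[] = {
    0xFF, 0xFE,                                  /* byte order mark */
    '<', 0x00, 'a', 0x00, '/', 0x00, '>', 0x00
};

/* What a char*-based API "sees" when handed the raw bytes:
 * strlen() stops at the first NUL byte. */
size_t naive_length(void)
{
    return strlen((const char *)utf16le_doc);
}

/* Convert UTF-16LE to UTF-8, BMP only (surrogate pairs are rejected to
 * keep the sketch short). Skips a leading BOM. out must hold at least
 * 3 * (len / 2) + 1 bytes. Returns bytes written, or (size_t)-1 on an
 * odd input length or a surrogate code unit. */
size_t utf16le_bmp_to_utf8(const unsigned char *in, size_t len,
                           unsigned char *out)
{
    size_t i = 0, o = 0;
    assert(in != NULL && out != NULL);
    if (len % 2 != 0)
        return (size_t)-1;
    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                                   /* skip the BOM */
    for (; i < len; i += 2) {
        unsigned int u = (unsigned int)in[i] | ((unsigned int)in[i + 1] << 8);
        if (u >= 0xD800 && u <= 0xDFFF)
            return (size_t)-1;                   /* surrogate: not handled */
        if (u < 0x80) {
            out[o++] = (unsigned char)u;
        } else if (u < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (u >> 6));
            out[o++] = (unsigned char)(0x80 | (u & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (u >> 12));
            out[o++] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (u & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

The 10-byte document looks 3 bytes long to naive_length() (the BOM plus `<`), while utf16le_bmp_to_utf8() yields the 4-byte UTF-8 string `<a/>`. Doing this conversion once, inside the parser, is what lets applications deal only with UTF-8.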

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

_______________________________________________
gnome-hackers mailing list
gnome-hackers gnome org
http://mail.gnome.org/mailman/listinfo/gnome-hackers