Re: my worry about the recent libxml change



On Mon, 26 Mar 2001, Daniel Veillard wrote:

> On Fri, Mar 23, 2001 at 10:31:03PM +0400, Vlad Harchev wrote:
> > On Fri, 23 Mar 2001, Colm Smyth wrote:
> > 
> > > I don't think the real issue is with codeset detection when parsing
> > > XML files; this can be handled in functions like xmlCreateDocParserCtxt(),
> > > xmlSAXParseDTD(); look for calls to xmlSwitchEncoding().
> > > 
> > > For me, the main questions are
> > > 
> > > 1. how to decide the appropriate "default" codeset of the application
> > > 2. where to call the routines for conversion to that codeset
> > 
> >  As I understand it, the question is whether we should change libxml1 to be
> > conformant and possibly break all apps that use it, or not.
> 
>   I don't know why I get your Friday mail on Monday, it is kind of obsolete
> by now. Check my more recent mails.

 You should have received my cc to you on Friday. As for the message to
gnome-hackers@, the gnome-hackers admin approved it too late. I have been
asking for a write-enabled subscription to gnome-hackers@ for a long time, but
haven't got one yet :(

> >  As for detecting charsets in which to load or save - I think it's better to
> > stick that logic into libxml1. So, apps shouldn't call conversion routines at
> > all (none do now, and provided we are not going to fix every app, we shouldn't
> > add such calls).
> > 
> >  As for loading xml files: I want to stress that it's much better and not
> > error prone at all to test whether the string loaded from file is a valid utf8
> > string, and treat it as utf8 string, and if it's invalid - treat it as if it
> > was in locale's charset. No need to stick to exactly one encoding of xml
> > files - we are lucky that we have an option to guess encoding of the string!
> 
>    At the xml file level, sure. But the new parser will only provide
> UTF8 characters, i.e. the first level of indetermination related to the
> encoding of the XML serialization is removed. Apps only have to deal with
> UTF8 input.
> 
>    The scheme saying "we will use whatever is the current locale" is IMHO
> broken: you won't be able to keep your configuration files if you change
> your locale. It's also completely incoherent in the sense that it kind of
> works only for the ISO-Latin families. I doubt it will ever work for
> SHIFT-JIS, EUC-xxx or, even worse, UTF-16 (I assume all the existing APIs
> in Gnome-1.4 would completely crash if I were to deliver 16-bit character
> strings, if that were the actual serialization used, and remember, a number
> of Windows applications use them when serializing XML!). In a nutshell,
> it's a kludge and you people need to change your mindset: assuming
> 'locale' encoding for serialization doesn't work.

 If libxml saved the charset name in the XML header (and later used it when
opening the file), the approach I proposed would not be a kludge and would be
very consistent (yes, files written under one locale would be correctly
readable under any other locale, since the charset of the XML data would be
recorded in the XML header).
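
 A minimal sketch of both ideas, written in Python with its standard
xml.etree.ElementTree module rather than against the libxml C API. The helper
names save_with_declared_charset and decode_guess are hypothetical and are not
libxml functions. The first helper writes the charset name into the XML
declaration, so a parser can decode the file whatever the current locale is.
The second is the guessing fallback for raw strings that carry no reliable
declaration: try UTF-8 first, and only if that fails treat the bytes as being
in the locale's charset.

```python
import xml.etree.ElementTree as ET


def save_with_declared_charset(root, charset):
    """Serialize a tree so that the XML declaration names the charset used.

    Hypothetical helper: the point is only that the charset travels with
    the file instead of being implied by the writer's locale.
    """
    return ET.tostring(root, encoding=charset, xml_declaration=True)


def decode_guess(raw, locale_charset):
    """Decode bytes as UTF-8 if they are valid UTF-8, else as locale_charset.

    Hypothetical helper illustrating the "test for valid UTF-8 first"
    heuristic; locale_charset is passed in explicitly to keep it testable.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(locale_charset)


# Written "under" a Latin-1 locale, with the charset recorded in the header.
root = ET.Element("name")
root.text = "caf\u00e9"
data = save_with_declared_charset(root, "iso-8859-1")

# Read back without any knowledge of the writer's locale.
reloaded = ET.fromstring(data)
```

 Here reloaded.text is the same string that was stored, because the parser
took the charset from the declaration rather than from the environment. Note
that decode_guess is only a heuristic: some byte sequences in a legacy charset
also happen to be valid UTF-8, so a declared charset is the more reliable of
the two.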

 Also, I think it would be nice to provide functions for getting text from
the XML tree in the local charset, and for setting strings given in the local
charset into the XML tree. These functions should be provided by libxml, of
course. Such wrappers would hide all the gory details of which charset the
data in the XML tree is in, and would be very convenient for application
programmers anyway.
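
 As a rough illustration of what such wrappers could look like, here is a
sketch in Python using xml.etree.ElementTree. The names get_text_local and
set_text_local are hypothetical and do not exist in libxml. It assumes the
tree itself always holds Unicode text and that conversion happens only at the
boundary with the application.

```python
import locale
import xml.etree.ElementTree as ET


def _local_charset(charset=None):
    # Fall back to the charset of the current locale when none is given.
    return charset or locale.getpreferredencoding(False)


def get_text_local(node, charset=None):
    """Return the node's text encoded in the local charset (hypothetical).

    Characters the local charset cannot represent are replaced with '?'
    instead of raising an error.
    """
    text = node.text or ""
    return text.encode(_local_charset(charset), errors="replace")


def set_text_local(node, raw, charset=None):
    """Store bytes given in the local charset as the node's text (hypothetical)."""
    node.text = raw.decode(_local_charset(charset))


# The application only ever sees bytes in its own charset.
node = ET.Element("name")
set_text_local(node, b"caf\xe9", "iso-8859-1")
local_bytes = get_text_local(node, "iso-8859-1")
```

 The replacement on encoding is a design choice made for this sketch: a real
API would have to decide whether unrepresentable characters are an error,
which is exactly the lossy case that a locale-based interface cannot avoid.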

> Daniel
> 

 Best regards,
  -Vlad

