Re: my worry about the recent libxml change



On Thu, 22 Mar 2001, Darin Adler wrote:

> The current code in xml-i18n-tool, OAF, and Nautilus depends on the
> following property: Text in XML files, localized strings that come from
> gettext, file names, GTK widget labels, and other strings all use the same
> character set (the local one for each locale, often Latin-1).
> 
> Adding code to libxml to properly handle character sets when reading and
> writing XML will retroactively decide that existing files are in some
> particular character set, when they are actually in a mix of character sets.
> 
> Making libxml DOM trees in memory always use UTF-8 will break all the code
> that puts strings in and takes them out without doing any translation.
> 
> I don't see how to make a program that works compatibly with both the old
> and new versions of libxml. I have no idea how to address this issue in the
> code for the various packages.
> 
> I hope someone can prove with testing or coding or both that I am wrong, and
> this change can be done compatibly.

 Here is what I propose:
* The DOM tree should use the locale's encoding in memory. When we switch to
  gtk-2.0, this will automatically mean that the tree is in UTF-8.
* When saving XML files, the locale's charset name should be written in the
  XML header.

 When loading files:
1) If the XML file has a charset name in its header, convert from that
  charset to the locale's charset.
2) If the XML file doesn't have a charset name in its header, try to
  interpret each string as UTF-8. If the string is malformed UTF-8, treat it
  as already being in the locale's charset; if it is well-formed UTF-8,
  convert it from UTF-8 to the locale's charset.
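
 To make the loading rule concrete, here is a rough sketch in Python (chosen
only for brevity; libxml itself is C). The function name and its parameters
are made up for illustration and are not libxml API.

```python
from typing import Optional


def to_locale_bytes(raw: bytes, declared: Optional[str], locale_charset: str) -> bytes:
    """Return `raw` re-encoded in the locale's charset (hypothetical helper).

    declared       -- charset named in the XML header, or None if absent
    locale_charset -- charset of the current locale, e.g. "latin-1"
    """
    if declared is not None:
        # Rule 1: the header names a charset, so convert from it.
        return raw.decode(declared).encode(locale_charset)
    try:
        # Rule 2: no header charset, so try the string as UTF-8 first.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Malformed UTF-8: assume the bytes are already in the locale's charset.
        return raw
    # Well-formed UTF-8: convert to the locale's charset.
    return text.encode(locale_charset)
```

 Pure ASCII strings are well-formed UTF-8, so they pass through unchanged in
any ASCII-compatible locale. This sketch does not handle a well-formed UTF-8
string containing characters that the locale's charset cannot represent; the
encode step would simply raise an error there.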

 This way, it seems we won't have any problems (except that we lose the
ability to decide whether an XML file is broken when it contains strings that
are malformed UTF-8 sequences), and no modifications to existing "broken"
software are needed.
 Of course, we can change the behaviour of libxml depending on some global
variable. Non-broken software could set this variable (perhaps by calling
another initialization function for libxml), so it seems there is a way to
solve all problems at once without the need to fix all the broken programs.
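
 The global switch could look roughly like the following Python sketch. All
names here are hypothetical, and what the "non-broken" mode should do exactly
is my assumption: here it keeps strings in UTF-8 and rejects malformed input.

```python
# Hypothetical global switch; the legacy (compatible) behaviour is the default.
_strict_utf8 = False


def init_strict_utf8() -> None:
    """Opt-in call for software that already handles UTF-8 correctly."""
    global _strict_utf8
    _strict_utf8 = True


def load_string(raw: bytes, locale_charset: str) -> bytes:
    """Return the in-memory form of a string read from a file with no charset header."""
    if _strict_utf8:
        # Assumed strict mode: keep UTF-8 and fail on malformed sequences.
        raw.decode("utf-8")  # raises UnicodeDecodeError if malformed
        return raw
    try:
        # Legacy mode: well-formed UTF-8 is converted to the locale's charset.
        return raw.decode("utf-8").encode(locale_charset)
    except UnicodeDecodeError:
        # Legacy mode: malformed UTF-8 is taken to be in the locale's charset.
        return raw
```

 Existing programs never call the init function and so keep the old
behaviour, while fixed programs opt in once at startup.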


>     -- Darin
> 
> 

 Best regards,
  -Vlad




