Re: [xml] Problems with file names in UTF-8 on Windows

On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:

First of all, I would like to thank you for libxml. It has turned out to
be a useful and convenient thing.

Now on business.

First, the implementation in version 2.6.24 of file name processing in the
UTF-8 encoding for Windows has led to the following:

1. Updating the library to the new version breaks programs that use file
   names in the native encoding; all such programs are now compelled to
   convert file names to UTF-8.
2. The library became incompatible with Windows 95/98/ME, as functions 
   and _wstat use features not realized by default in these versions of 
OS (bug #346367).

It seems reasonable to process file names in native encoding by default, 
and establish
transformation mode from UTF-8 obviously.

  I don't think it's obvious. Roland Schwingel who provided that patch 
argued differently. I don't use Windows, I have no way to test or check,
I have to rely on the expertise of people on the mailing-list in that area.

Attached is a corrected variant of xmlIO.c. The name transformation mode
is set by the function xmlSetFileNameMode.

  I'm sorry, send contextual patches, not new files, even worse a bunch of
files. You must send a patch, which shows exactly what you modified.
Also you should send a clear explanation of the modifications, why you changed
things. "a corrected variant" is not acceptable for review, sorry.
  Moreover I expect all those changes/diff to be guarded by #ifdef WIN32
or something similar at the code level, because obviously this should not
affect non-Windows code in any way.
  Last but not least xmlSetFileNameMode() is not acceptable, this means 
having to introduce a global variable in the library, and I'm trying to
get rid of them. If you want a different mode of operation for older Windows
versions, find a way to detect that version at compile time or runtime, but
adding a new API which makes no sense on other platforms introducing a
global variable is definitely not okay.
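
As an illustration of the two requirements above (Windows-only code behind a
preprocessor guard, and no global mode variable), here is a minimal sketch.
It is not the code in xmlIO.c: the names utf8_to_wide and open_file_portable
are hypothetical, and the use of GetVersion() to recognise the Windows
9x/ME family at runtime is an assumption about how such detection could be
done (the call is marked deprecated in recent SDKs).

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a UTF-8 string to a newly allocated
 * wide string. Returns NULL if the conversion or allocation fails. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
    wchar_t *w;

    if (n <= 0)
        return NULL;
    w = (wchar_t *) malloc((size_t) n * sizeof(wchar_t));
    if (w != NULL && MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n) == 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical helper: on NT-based Windows, first try the name as UTF-8
 * through the wide-character API; in every other case use plain fopen()
 * with the name as given. No global state is involved. */
static FILE *open_file_portable(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
#ifdef _WIN32
    /* Assumption: the high bit of GetVersion() is set on Windows 9x/ME,
     * where the wide-character file functions are not usable by default. */
    if ((GetVersion() & 0x80000000UL) == 0) {
        wchar_t *wpath = utf8_to_wide(path);
        wchar_t *wmode = utf8_to_wide(mode);
        FILE *fd = NULL;

        if (wpath != NULL && wmode != NULL)
            fd = _wfopen(wpath, wmode);
        free(wpath);
        free(wmode);
        if (fd != NULL)
            return fd;
    }
#endif
    /* Non-Windows builds compile only this line. It is also the fallback
     * on Windows when the UTF-8 attempt did not succeed. */
    return fopen(path, mode);
}
```

A caller would use open_file_portable("doc.xml", "r") exactly like fopen().
On non-Windows builds the preprocessor removes everything except the final
fopen() call, so behaviour there is unchanged.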

However, using names in UTF-8 with the proposed implementation is possible
only on Windows NT/2000/XP/... For Windows 9x a reverse transformation from
Unicode to the native encoding should be added.

  I do not understand clearly what you mean here. Is that what you suggest
to do, what your changes should do, or something else?

Second, it would be good to add to the library a group of simple exported
functions for read access to the fields of structures.
This would simplify describing the API in other languages and would avoid
recompiling programs after possible changes to the library structures.

An example implementation of such functions is in the same archive
(files wrappers.*).
When string fields are read, no copy is made, to reduce call

  Okay, that's not acceptable. Adding a new header involves a lot of work,
not merely adding a file to the subdir. I think it's frivolous to
add one for the reason given. Moreover I disagree with adding accessors
on technical grounds:
   - libxml2 exports a lot of existing structures, containing a lot of fields
   - if we start adding accessors, this means a lot of new functions
   - this won't help the API, since the existing API uses those structures
   - adding new functions is costly *at runtime*

 To be clear about the last point, libxml2 already has more than 1500 exported
entry points. For position-independent code in shared libraries there is a
runtime cost of relocating all exported symbols; if we start adding accessors
that's that much more work to be done, so I'm against it unless it's for new
functionality and it's clear that the number of entry points is low.
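
To make the trade-off concrete, here is a minimal sketch of what read-only
accessors of the proposed kind look like. The demoNode structure and the
demoNodeGetName / demoNodeGetType functions are hypothetical stand-ins, not
libxml2 types or API. Each accessor is one more exported symbol, and the
string accessor returns the internal pointer without copying, as the
proposal describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical public structure, standing in for an exported library type. */
typedef struct demoNode {
    int type;
    const char *name;
    struct demoNode *next;
} demoNode;

/* Accessor for the type field: one extra exported entry point. */
int demoNodeGetType(const demoNode *node)
{
    assert(node != NULL);
    return node->type;
}

/* Accessor for the name field: returns the internal pointer, no copy,
 * so the caller must not free or modify the result. */
const char *demoNodeGetName(const demoNode *node)
{
    assert(node != NULL);
    return node->name;
}
```

Code that already reads node->name directly keeps depending on the structure
layout, so the accessors only add symbols next to the existing fields rather
than replacing them.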

 So overall, I'm sorry, I cannot work on your code submission. It's really
too far off from the normal review process and not in line with libxml2 development
rules. I suggest you revisit the issue based on my feedback.


Daniel Veillard      | Red Hat
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
