RE: [xml] final output filtering?



Right now libxml writes \r\n files to the disc on Windows. 
Libxml serialises

  err, really ? I was suspecting the gzWrite layer got involved instead,
and that one actually does "wb".

The binaries I make do not use zlib, so no gzWrite here.
 
\n, and the runtime adds a \r. Doing a xmlParseFile and then xmlSaveFile
effectively converts \n files to \r\n ones. Using "wb" instead of "w" would
reverse the situation, namely a \r\n file would end up converted to \n after
you use the function pair above. I believe quite a few would wish me harm
after such a change.
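
The effect can be checked with a small C sketch. This is only an illustration, not libxml code: the helper names write_text and file_size_binary are made up for this example. On Windows the "w" file is expected to come out one byte longer per \n, while on Unix-like systems both modes should give the same size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `text` to `path` using the given fopen mode.
   Returns 0 on success, -1 on failure. */
static int write_text(const char *path, const char *mode, const char *text)
{
    FILE *f;
    size_t len;

    assert(path != NULL && mode != NULL && text != NULL);
    f = fopen(path, mode);
    if (f == NULL)
        return -1;
    len = strlen(text);
    if (fwrite(text, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: count the bytes actually stored on disk.
   The file is read in binary mode so that nothing is translated back. */
static long file_size_binary(const char *path)
{
    FILE *f = fopen(path, "rb");
    long n = 0;

    if (f == NULL)
        return -1;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Writing "a\nb\n" (4 bytes) with "wb" should always store 4 bytes. With "w", the Windows runtime is expected to store 6, while a Unix-like system stores 4.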

  Hum, can you confirm it's really the existing behaviour ?

Yes, it is. Every \n gets a \r prepended to it.
 
  you mean fopen(..., "w", ...) is locale dependent on Windows ?
/me screams !!!
  Or I misunderstood something ... quite possible.

No, that was just a thought, for I have no idea what all the 300 encodings
in this world see as a valid character. If there are no non-Unicode
encodings where the byte 0x0a can be part of a multibyte character, then
it shouldn't be locale dependent. If such encodings exist, then I hope it is
locale dependent, otherwise every "w" instead of "wb" breaks the output
stream.
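
One concrete illustration of the concern, though it uses a Unicode encoding and so does not answer the question about non-Unicode ones: in UTF-16LE the code point U+010A is stored as the bytes 0x0A 0x01, so a byte-level \n translation would insert 0x0D in the middle of a character. The sketch below handles only BMP code points, and its helper names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one BMP code point (below U+10000)
   as two UTF-16LE bytes. */
static void utf16le_encode_bmp(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x10000u);
    out[0] = (unsigned char)(cp & 0xFF);        /* low byte first */
    out[1] = (unsigned char)((cp >> 8) & 0xFF); /* then high byte */
}

/* Count the bytes equal to 0x0A in a buffer, i.e. the bytes that a
   byte-level newline translation would act on. */
static size_t count_lf_bytes(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] == 0x0A)
            n++;
    return n;
}
```

Encoding U+010A this way gives one 0x0A byte that is not a line feed at all, whereas U+0041 gives none.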

Anyway by default XML serialization should not be locale dependent,
should save \n\r in a text node as character references, and not turn \n in
a text node to anything else on output.

If you save it as a reference, then it remains that way: a reference such as
&#10; makes 5 bytes. If you however convert this to one 0x0a byte in the output
stream, you'll get another 0x0d prepended. I didn't test this, but I believe
it is so.
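
The byte counts can be sketched without a Windows machine by imitating the translation in plain C. text_mode_length is a made-up helper that assumes the runtime simply emits 0x0D before every 0x0A, and &#10; is used here only as an example reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a text-mode stream: every 0x0A byte is assumed
   to be written as 0x0D 0x0A. Returns the resulting length in bytes. */
static size_t text_mode_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        n += (*s == '\n') ? 2 : 1;
    return n;
}
```

Under this model the reference "&#10;" stays at 5 bytes, while a literal "\n" grows from 1 byte to 2.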

  I can't believe all Windows installations are broken by default
and nobody pointed this out before !!!

Actually they are not that broken. The only effect so far is that the
serialised file has \r\n line ends, which most users were happy about when
they opened the file with the notepad editor.

The only case I had with such characters in a text node was an entity
reference, &nl;, which per the DTD resolved to character references. The only
purpose for these was pretty-formatting the output files and most of it I had
to do on Unix, so I noticed nothing strange :-). Later, using the same on
Windows, you simply don't notice such things when you edit XML files with the
libxml API, not with a text editor, and completely ignore all nodes for which
xmlIsBlankNode returns 1 :-)

Ciao,
Igor


