[xml] Line-end Normalisation
- From: "Igor Zlatkovic" <izlatkovic daenet de>
- To: <xml gnome org>
- Subject: [xml] Line-end Normalisation
- Date: Wed, 18 Jul 2001 14:13:46 +0200
Hi there.
I have a feeling that this topic has come up a thousand times on this
mailing list. Maybe I have overlooked it, but I haven't found
the answer to the following.
There is an article on the web,
http://www.xml.com/axml/target.html#sec-line-ends, which describes how
line ends should be handled on input. libxml2 does exactly what is stated
there: it converts every two-character sequence #xD#xA and every
solitary #xD to #xA.
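
The input rule above can be sketched in a few lines of C. This is only an
illustration of the rule, not libxml2's own code; the function name
normalize_line_ends is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: rewrites s in place so that
 * every CR LF pair and every solitary CR becomes a single LF.
 * Returns the length of the normalised string. */
size_t normalize_line_ends(char *s)
{
    assert(s != NULL);
    char *in = s;
    char *out = s;
    while (*in != '\0') {
        if (*in == '\r') {
            *out++ = '\n';
            if (in[1] == '\n')
                in++;           /* skip the LF of a CR LF pair */
        } else {
            *out++ = *in;
        }
        in++;
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

The string can only shrink, so the rewrite is safe to do in place.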
Now, let's assume I generate the following DOM in memory:
<doctag>
This is the text, <-- #xD#xA here
the text this is. <-- #xD#xA here
</doctag>
This <doctag/> element has a text child which contains more than one
line of text. The string in memory has line delimiters represented as
#xD#xA. If I now save this thing to a file, examining the file on the
disk reveals #xD#xD#xA line ends. Now I parse this file back into memory
and, following the input conversion rules, I get line-ends represented
as #xA#xA. Saving it again produces #xD#xA#xD#xA line-ends on the disk.
At this point line ends are duplicated and the phenomenon continues from
the beginning: each #xD#xA pair becomes #xD#xD#xA... and so on.
All this happens on a Win32 machine, when libxml2 is built on top of
the MS C runtime. We know that text files have CRLF line ends on disk
under Win32, and the MS C runtime converts those to LF and back in
functions such as getc, putc, getchar, putchar and friends. I don't
have a UNIX machine handy at the moment to try it out, so I would
appreciate some feedback on whether this happens there as well.
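
For comparison, standard C lets a program bypass this translation by
opening the stream in binary mode ("wb" / "rb"). On POSIX systems text
and binary streams behave identically, so the same code writes the same
bytes there. The sketch below uses only stdio; the helper names are made
up, and whether libxml2's own save routines can be pointed at such a
stream is not something this example settles.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes to path with no line-end
 * translation. Returns 0 on success, -1 on failure. */
int write_bytes_untranslated(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");    /* "b" disables CRT text translation */
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_failed = fclose(f);
    return (written == len && close_failed == 0) ? 0 : -1;
}

/* Hypothetical helper: reads up to cap bytes from path with no line-end
 * translation. Returns the number of bytes read (0 on failure). */
size_t read_bytes_untranslated(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}
```

With these, a buffer containing #xD#xA comes back from disk byte for byte,
on Win32 as well as on UNIX.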
Now, the question is: what should be done? Does libxml2 require all
strings in memory to have #xA line delimiters? If so, that is good,
because then I must change my program, and that is easy for me. If not,
then perhaps we have a problem between libxml2 and the MS C runtime,
given that they both convert?
Ciao
Igor