Re: [xml] Problem with xmlReadFile on Windows and 0x10 characters



James Ytterstene wrote on 23.06.2010 at 14:41 (+0200):

> If the file has not been touched by any Windows editor, the line ending
> is CR only, but if someone edits the file it is changed to CRLF
> (stupid Windows editors, but we must use them). If I then read the
> file back with libxml2, I get an extra text node at each line
> containing only a line feed (character 10, i.e. 0x0A).

Most serious editors let you choose DOS (CRLF), Unix (LF), or classic
Mac (CR) line endings. Maybe yours do, too.

> If I change the xmlReadFile call and add the option XML_PARSE_NOBLANKS,
> I can read the file back OK. But when reading about that option I find
> many posts advising against it, so I'm confused here.

The question you have to answer: Are whitespace-only text nodes in your
XML significant or not? If they're not significant, nothing wrong with
stripping them. Unless, of course, your output is intended for human
consumption. In that case, you have to keep them, or apply automatic
output indenting.
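To illustrate what "automatic output indenting" means, here is a minimal sketch on a hypothetical element-only toy tree. The names elem and serialize_indented and the two-space indent are assumptions for illustration, not libxml2 API. The serializer itself supplies one line per tag and an indent per nesting level, so the tree does not need to carry whitespace-only text nodes at all. In libxml2 the corresponding switch is the format argument of xmlSaveFormatFile, whose documentation says indenting only happens if xmlIndentTreeOutput is set or xmlKeepBlanksDefault(0) was called.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy element: a name and child elements only; no text,
 * attributes, or escaping, since the point here is the indentation. */
typedef struct elem {
    const char *name;
    const struct elem *first_child;
    const struct elem *next;        /* next sibling */
} elem;

/* Append one line "<indent><pre name post>\n" at out[*len], never writing
 * more than cap bytes. *len tracks the full length snprintf-style, so the
 * caller can detect truncation. */
static void emit(char *out, size_t cap, size_t *len, int depth,
                 const char *pre, const char *name, const char *post)
{
    size_t room = *len < cap ? cap - *len : 0;
    int w = snprintf(room > 0 ? out + *len : NULL, room,
                     "%*s<%s%s%s>\n", depth * 2, "", pre, name, post);
    if (w > 0)
        *len += (size_t)w;
}

static void write_indented(const elem *e, int depth,
                           char *out, size_t cap, size_t *len)
{
    if (e->first_child == NULL) {
        emit(out, cap, len, depth, "", e->name, "/");   /* <name/>  */
        return;
    }
    emit(out, cap, len, depth, "", e->name, "");        /* <name>   */
    for (const elem *c = e->first_child; c != NULL; c = c->next)
        write_indented(c, depth + 1, out, cap, len);    /* one level deeper */
    emit(out, cap, len, depth, "/", e->name, "");       /* </name>  */
}

/* Serialize root with indentation into out. Returns the length the full
 * output needs; if that is >= cap, the text in out was truncated. */
size_t serialize_indented(const elem *root, char *out, size_t cap)
{
    size_t len = 0;
    if (cap > 0)
        out[0] = '\0';
    write_indented(root, 0, out, cap, &len);
    return len;
}
```

For a root a with empty children b and c, this produces "<a>", "  <b/>", "  <c/>", "</a>" on four LF-terminated lines.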

> When I read about libxml2 and how files should be parsed, I get the
> feeling that the parser should handle CRLF when reading files and
> always turn it into a single LF. So the CRLF line endings shouldn't be
> an issue, but I could be wrong here.

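It's a requirement of the XML spec (section 2.11, End-of-Line Handling):
the parser must translate every CR LF pair, and every CR not followed by
LF, into a single LF.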

http://www.w3.org/TR/REC-xml/#sec-line-ends
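To make the rule concrete, here is a minimal sketch in plain C of the translation section 2.11 describes. normalize_line_ends is a hypothetical name used only for illustration; a conforming parser such as libxml2 performs the equivalent step internally, so application code does not need to call anything like it.

```c
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): applies XML 1.0
 * section 2.11 end-of-line handling in place. Each CR LF pair and each
 * lone CR becomes a single LF. buf must have room for len + 1 bytes;
 * the result is NUL-terminated and its new length is returned. */
size_t normalize_line_ends(char *buf, size_t len)
{
    size_t in = 0, out = 0;
    while (in < len) {
        if (buf[in] == '\r') {
            buf[out++] = '\n';          /* CR, alone or paired, -> LF */
            in++;
            if (in < len && buf[in] == '\n')
                in++;                   /* skip the LF of a CR LF pair */
        } else {
            buf[out++] = buf[in++];     /* copy everything else as-is */
        }
    }
    buf[out] = '\0';
    return out;
}
```

Note what this implies for the original question: the line breaks between elements do not vanish, they just become LF, and those LF-only runs are exactly the whitespace-only text nodes that still show up in the tree.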

> Is there a general way to parse files so that CR or CRLF line endings
> don't add any extra nodes?

Well, yes: the one you already found. Strip whitespace-only text nodes
on parsing, using the appropriate parser or processor option, in this
case XML_PARSE_NOBLANKS.
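To show what "strip whitespace-only text nodes" means, here is a sketch on a hypothetical toy tree. It is not libxml2 code and not how XML_PARSE_NOBLANKS is implemented: libxml2 does the stripping during parsing and applies further checks before dropping a node (for instance, it keeps blanks under xml:space="preserve"). The names node, is_xml_blank, and strip_blank_text are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal tree; NOT libxml2's xmlNode. */
typedef enum { NODE_ELEMENT, NODE_TEXT } node_kind;

typedef struct node {
    node_kind kind;
    const char *text;           /* content of a NODE_TEXT node, else NULL */
    struct node *first_child;
    struct node *next;          /* next sibling */
} node;

/* True if s holds only XML whitespace (the S production: space, tab,
 * CR, LF). An empty string also counts as blank. */
bool is_xml_blank(const char *s)
{
    for (; *s != '\0'; s++)
        if (*s != ' ' && *s != '\t' && *s != '\r' && *s != '\n')
            return false;
    return true;
}

/* Recursively unlink whitespace-only text children of parent. Nodes are
 * unlinked, not freed, to keep memory ownership out of the sketch. */
void strip_blank_text(node *parent)
{
    node **link = &parent->first_child;
    while (*link != NULL) {
        node *child = *link;
        if (child->kind == NODE_TEXT && is_xml_blank(child->text)) {
            *link = child->next;            /* splice the blank node out */
        } else {
            if (child->kind == NODE_ELEMENT)
                strip_blank_text(child);    /* descend into elements */
            link = &child->next;
        }
    }
}
```

This naive version would also drop the single space between the two inline elements in mixed content such as `<p><b>a</b> <i>b</i></p>`, which is why stripping blanks is often advised against for document-style XML and is harmless for data-style XML where such whitespace carries no meaning.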

-- 
Michael Ludwig


