Re: [xml] Problem with xmlReadFile on Windows and 0x10 characters

Thanks for all the replies. I feel quite safe now that I know how it works.
All our important data is held within <xxx>...</xxx>, so the extra line breaks
and spaces added for readability will have no impact.
Just one follow-up question. XML_PARSE_NOBLANKS fixed everything,
and I could mix CR and CRLF in the file. But when I read about the option, it says
it handles whitespace and does not mention CRLF. Is that part missing from the
documentation, or am I just misreading it? Is 0x10 handled as a blank, or are there
any more characters I might miss?
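
For reference, a small sketch of which characters count as whitespace, assuming the XML 1.0 definition (space 0x20, tab 0x09, line feed 0x0A, carriage return 0x0D). Note that line feed is 0x0A (decimal 10); 0x10 is a different control character and is not in that set. The function names are made up for illustration and are not libxml2 API.

```c
#include <stddef.h>

/* Hypothetical helper: non-zero if c is one of the four characters that
 * XML 1.0 treats as whitespace (space, tab, line feed, carriage return). */
int is_xml_space(unsigned char c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

/* Hypothetical helper: non-zero if the first len bytes of s consist
 * only of XML whitespace, i.e. the text would be "whitespace-only". */
int is_blank_only(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!is_xml_space((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

So a text node holding only CR, LF, spaces and tabs would be blank under this definition, which is why an option described as handling "whitespace" also covers line endings.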

2010/6/23 Michael Ludwig <milu71 gmx de>
James Ytterstene schrieb am 23.06.2010 um 14:41 (+0200):

> If I have the file unchanged from any Windows editor, the line ending
> is CR only, but if someone edits the file it will be changed to CRLF
> (stupid Windows editors, but we must use them). If I now try to read the
> file back in libxml2, I will get an extra node at each line containing
> only 0x10.

Most serious editors have an option to go with DOS or UNIX or Mac line
endings. Maybe yours does, too.

> If I change the xmlReadFile call and add the option XML_PARSE_NOBLANKS, I
> can read the file back OK. But when reading about that option I find
> many posts advising against it, so I'm confused here.

The question you have to answer: Are whitespace-only text nodes in your
XML significant or not? If they're not significant, nothing wrong with
stripping them. Unless, of course, your output is intended for human
consumption. In that case, you have to keep them, or apply automatic
output indenting.

> When I read about libxml2 and how files should be parsed, I get the
> feeling that the parser should handle the CRLF when reading files and
> always save the new files with CR only. So the extra CRLF shouldn't be
> an issue, but I could be wrong here.

It's a requirement of the XML spec:
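
A rough sketch of that rule, assuming I read the end-of-line handling section of XML 1.0 correctly: before parsing, every CRLF pair and every lone CR is translated to a single LF, so the application only ever sees LF. The function normalize_eol below is a made-up illustration of the idea, not something libxml2 exposes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: rewrites buf in place so that
 * each CRLF pair and each lone CR becomes a single LF.
 * Returns the new length, which is never larger than len. */
size_t normalize_eol(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in++];
        if (c == '\r') {
            if (in < len && buf[in] == '\n')
                in++;        /* swallow the LF of a CRLF pair */
            c = '\n';        /* CR (alone or in CRLF) becomes LF */
        }
        buf[out++] = c;
    }
    return out;
}
```

The parser does this for you on input, so the line-ending style of the file should not change what text you get; what remains is the LF itself, which still shows up as a whitespace-only text node unless you strip it.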

> Is there any general solution for parsing files so that CR/CRLF
> doesn't add any extra nodes?

Well yes, the one you already found. Strip whitespace-only text nodes on
parsing, using the appropriate parser or processor option, like in this

Michael Ludwig