Re: [xml] Problem with xmlReadFile on Windows and 0x10 characters

From: James Ytterstene <james mensa se>
To: xml gnome org
Subject: Re: [xml] Problem with xmlReadFile on Windows and 0x10 characters
Date: Wed, 23 Jun 2010 22:02:21 +0200

Hello

Thanks for all the replies. I feel quite safe now that I know how it works.

All our important data is held within <xxx>...</xxx>, so the extra line breaks
and spaces added for better visualisation will have no impact.

One follow-up question only. XML_PARSE_NOBLANKS fixed everything, and I could
mix CR and CRLF in the file. But when I read about the option, it talks about
whitespace, not CRLF. Is that part missing from the documentation, or am I just
reading it wrong? Is 0x10 handled as a blank, or are there other characters I
might miss?

/James

2010/6/23 Michael Ludwig <milu71 gmx de>

James Ytterstene schrieb am 23.06.2010 um 14:41 (+0200):

> If I leave the file unchanged by any Windows editor, the line ending
> is CR only, but if someone edits the file it is changed to CRLF
> (stupid Windows editors, but we must use them). If I then read the
> file back in libxml2, I get an extra node at each line containing
> only 0x10.

Most serious editors have an option to go with DOS or UNIX or Mac line
endings. Maybe yours do, too.

> If I change the xmlReadFile call and add the option XML_PARSE_NOBLANKS, I
> can read the file back OK. But when reading about that option I find
> many posts saying not to use it, so I'm confused here.

The question you have to answer: Are whitespace-only text nodes in your
XML significant or not? If they're not significant, nothing wrong with
stripping them. Unless, of course, your output is intended for human
consumption. In that case, you have to keep them, or apply automatic
output indenting.

> When I read about libxml2 and how files should be parsed, I get the
> feeling that the parser should handle the CRLF when reading files and
> always save new files with CR only. So the extra CRLF shouldn't be
> an issue, but I could be wrong here.

It's a requirement of the XML spec:

http://www.w3.org/TR/REC-xml/#sec-line-ends

> Is there any general solution for parsing files so that CR/CRLF
> doesn't add any extra nodes?

Well yes, the one you already found. Strip whitespace-only text nodes on
parsing, using the appropriate parser or processor option, like in this
case XML_PARSE_NOBLANKS.

--
Michael Ludwig

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml

