Re: RE : RE : RE : [xml] Parsing invalid characters or entities ref



On Thu, Nov 06, 2003 at 12:36:59PM +0100, GARNIER Pierre wrote:

I did not want to force you to make such a change.
I just wanted to be enlightened about this question (and maybe, I admit, to
philosophize about a question that it is unseemly to philosophize about).

  Well, to me the philosophical aspect is that silent data corruption
is the worst thing that can happen in software data processing. From
that viewpoint the rule set by XML-1.0 is very important.
  The most common case of XML corruption is data extracted from
databases which did not impose encoding or range constraints on
"strings". Such a string is just a sequence of 0s and 1s, not something
fit for text analysis, and XML is about textual analysis. The way to
get this fixed is to make sure of what the database actually contains,
then determine the encoding and use it when exporting. Characters with
code points below 32 (except tab, newline and carriage return) are
forbidden in XML-1.0. Those characters will be allowed in XML-1.1
(except 0, that's another story) but only as numeric character
references, assuming that XML-1.1 becomes a W3C recommendation. On the
other hand XML-1.1 will have some other constraints on code points
in the C1 control range (0x7F to 0x9F, apart from 0x85), where
character references will be required too, again for the same reason:
making sure that characters which are actually binary data are not
passed transparently as ISO 8859 data.
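
  As a rough illustration of the "check the encoding, then export"
advice, here is a minimal Python sketch, not anything taken from libxml2
itself. The helper names find_forbidden_xml10_chars and
decode_for_export are made up for this example; the character class is
written from the Char production of XML-1.0, and the encoding passed in
is assumed to be something you have established about your database
beforehand.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_FORBIDDEN = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_forbidden_xml10_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 rejects.

    Hypothetical helper for this sketch, not a libxml2 API.
    """
    return [(m.start(), ord(m.group())) for m in _XML10_FORBIDDEN.finditer(text)]


def decode_for_export(raw, encoding):
    """Decode database bytes with a known encoding, failing loudly.

    Hypothetical helper: raises UnicodeDecodeError if the bytes do not
    match the claimed encoding, and ValueError if the decoded text holds
    characters that XML 1.0 forbids, rather than passing them on silently.
    """
    text = raw.decode(encoding, errors="strict")
    bad = find_forbidden_xml10_chars(text)
    if bad:
        raise ValueError(f"characters not allowed in XML 1.0: {bad}")
    return text
```

  The point of raising instead of stripping or replacing the offending
characters is the one made above: a loud failure at export time is
preferable to silently corrupted data further down the chain.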

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


