[xml] Bug in encoding detection with document()
- From: Chuck Bearden <cbearden rice edu>
- To: xml gnome org
- Subject: [xml] Bug in encoding detection with document()
- Date: Mon, 23 Mar 2009 15:31:58 -0500
It appears that libxslt1.1 pays attention to the charset declaration in the
Content-Type HTTP header when retrieving XML files with MIME types of
application/xml or text/xml via the document() function. If a misconfigured
web server sends "Content-Type: text/xml; charset=iso-8859-15" but the XML
file itself has no encoding declaration in the XML prolog (and is thus to be
taken as UTF-8), libxslt treats the incoming file as ISO-8859-15 and so
mangles the multi-byte UTF-8 sequences for, e.g., many common vowels with diacritics.
libxslt does not exhibit the behavior when the MIME type is 'text/html'.
Saxon 6.5.5 does not exhibit the same behavior with any MIME type/charset
combination.
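To make the reported selection rule concrete, here is a minimal Python model of it. This is a sketch of the behavior described above under simplifying assumptions, not libxslt's actual code; the function names `declared_encoding`, `parse_content_type` and `reported_encoding` are hypothetical. The last lines show what decoding UTF-8 bytes as ISO-8859-15 does to accented vowels.

```python
# Hypothetical model of the behavior reported above; NOT libxslt source code.
import re

# Matches an encoding pseudo-attribute in an XML declaration at the start of the body.
XML_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(body: bytes) -> str:
    """Encoding named in the XML declaration, else UTF-8 (BOM/UTF-16 sniffing omitted)."""
    m = XML_DECL.match(body)
    return m.group(1).decode("ascii") if m else "utf-8"

def parse_content_type(value: str):
    """Split a Content-Type header value into (mime_type, charset or None)."""
    mime, *params = [p.strip() for p in value.split(";")]
    charset = None
    for param in params:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset":
            charset = val.strip().strip('"')
    return mime.lower(), charset

def reported_encoding(content_type: str, body: bytes) -> str:
    """As reported: for XML MIME types the HTTP charset overrides the document."""
    mime, charset = parse_content_type(content_type)
    if mime in ("application/xml", "text/xml") and charset:
        return charset
    return declared_encoding(body)  # e.g. text/html: header charset not used

# UTF-8 document with no encoding declaration, served with a wrong charset.
body = "<p>café über</p>".encode("utf-8")
enc = reported_encoding("text/xml; charset=iso-8859-15", body)
print(enc)               # iso-8859-15
print(body.decode(enc))  # <p>cafÃ© ÃŒber</p>
print(reported_encoding("text/html; charset=iso-8859-15", body))  # utf-8
```

The "ü" case is specific to Latin-9: its second UTF-8 byte, 0xBC, is "Œ" in ISO-8859-15 (it would be "¼" in ISO-8859-1).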
I am attaching a test stylesheet that takes itself as input, retrieves a
simple file in UTF-8 and Latin-9 encodings from a web server, and outputs the
results with their MIME types and charsets noted. I have confirmed the bug in
libxslt 1.1.24. Would anyone care to check it in more recent versions before I
log the bug?
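For anyone checking without access to a suitably misconfigured web server, a small local server can stand in for one. The sketch below assumes Python's standard `http.server`; the path `/sample.xml`, the sample body and the helper `start_server` are illustrative and not taken from the attached test.xsl. It serves UTF-8 bytes with no encoding declaration while claiming `charset=iso-8859-15`.

```python
# Hypothetical test fixture: a deliberately misconfigured local web server.
import http.server
import threading
import urllib.request

# UTF-8 bytes, no encoding declaration in the prolog (so UTF-8 per XML).
SAMPLE = '<?xml version="1.0"?>\n<doc>café über</doc>\n'.encode("utf-8")

class MisconfiguredHandler(http.server.BaseHTTPRequestHandler):
    """Serves SAMPLE for any path while declaring the wrong charset."""
    content_type = "text/xml; charset=iso-8859-15"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", self.content_type)
        self.send_header("Content-Length", str(len(SAMPLE)))
        self.end_headers()
        self.wfile.write(SAMPLE)

    def log_message(self, format, *args):
        pass  # keep the console quiet

def start_server():
    """Start the server on a free localhost port; return (server, base_url)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), MisconfiguredHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"

if __name__ == "__main__":
    server, base = start_server()
    with urllib.request.urlopen(base + "/sample.xml") as resp:
        print(resp.headers["Content-Type"])  # the misleading header
        print(resp.read() == SAMPLE)         # the bytes are untouched UTF-8
    server.shutdown()
    server.server_close()
```

While the server runs, a document() call in the stylesheet can be pointed at the base URL plus `/sample.xml` and the stylesheet run with `xsltproc test.xsl test.xsl`. Changing `content_type` to `text/html` should, per the behavior described above, make the mangling disappear.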
Thanks,
Chuck
--
Chuck Bearden (cbearden rice edu ; 713.348.3661)
XML Engineer, Connexions
http://cnx.org/
Attachment:
test.xsl
Description: application/xml