[xml] Bug in encoding detection with document()

From: Chuck Bearden <cbearden rice edu>
To: xml gnome org
Subject: [xml] Bug in encoding detection with document()
Date: Mon, 23 Mar 2009 15:31:58 -0500

It appears that libxslt1.1 pays attention to the charset declaration in the Content-Type HTTP header when retrieving XML files with MIME types of application/xml or text/xml via the document() function. If a misconfigured web server sends "Content-Type: text/xml; charset=iso-8859-15" but the XML file itself has no encoding declaration in the XML prolog (and is thus to be taken as UTF-8), libxslt treats the incoming file as ISO-8859-15 and so mangles byte sequences that express e.g. many common vowels with diacritics. libxslt does not exhibit the behavior when the MIME type is 'text/html'. Saxon 6.5.5 does not exhibit the same behavior with any MIME type/charset combination.
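To make the failure mode concrete, here is a minimal Python sketch (not libxslt's or Saxon's actual code) of two decoding policies applied to the same bytes: one that trusts the charset parameter from the HTTP header, and one that follows a simplified form of the XML 1.0 autodetection rules (UTF-8 byte-order mark, then the encoding pseudo-attribute in the XML declaration, else UTF-8). The sample document and the function names decode_header_wins and decode_prolog_wins are hypothetical, and the sniffing deliberately ignores UTF-16 and other cases a real parser handles.

```python
import re

# Hypothetical sample: "café" saved as UTF-8 with no XML declaration,
# so by the XML rules it should be read as UTF-8.
DOC = "<p>café</p>".encode("utf-8")


def decode_header_wins(data: bytes, header_charset: str) -> str:
    """Policy 1 (hypothetical): trust the charset parameter from Content-Type."""
    return data.decode(header_charset)


# Matches an encoding pseudo-attribute in an XML declaration at the very start.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def decode_prolog_wins(data: bytes) -> str:
    """Policy 2 (hypothetical, simplified): BOM, then XML declaration, else UTF-8."""
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        return data[3:].decode("utf-8")
    m = _DECL.match(data)
    if m:  # explicit encoding declared in the prolog
        return data.decode(m.group(1).decode("ascii"))
    return data.decode("utf-8")  # no BOM, no declaration: default to UTF-8


print(decode_header_wins(DOC, "iso-8859-15"))  # <p>cafÃ©</p>
print(decode_prolog_wins(DOC))                 # <p>café</p>
```

The first output line shows the kind of mangling described above: the two UTF-8 bytes for "é" (0xC3 0xA9) are read as the two ISO-8859-15 characters "Ã" and "©".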

I am attaching a test stylesheet that takes itself as input, retrieves a simple file in UTF-8 and Latin-9 encodings from a web server, and outputs the results with MIME types and charsets noted. I have confirmed the bug in libxslt 1.1.24; would anyone care to check it in more recent versions before I log the bug?


Thanks,
Chuck
--
Chuck Bearden (cbearden rice edu ; 713.348.3661)
XML Engineer, Connexions
http://cnx.org/

Attachment: test.xsl
Description: application/xml

Follow-Ups:
- Re: [xml] Bug in encoding detection with document()
  - From: Chuck Bearden
- Re: [xml] Bug in encoding detection with document()
  - From: Bjoern Hoehrmann
