Re: [xml] Bug in encoding detection with document()



* Chuck Bearden wrote:
> It appears that libxslt1.1 pays attention to the charset declaration in the
> Content-Type HTTP header when retrieving XML files with MIME types of
> application/xml or text/xml via the document() function.  If a misconfigured
> web server sends "Content-Type: text/xml; charset=iso-8859-15" but the XML
> file itself has no encoding declaration in the XML prolog (and is thus to be
> taken as UTF-8), libxslt treats the incoming file as ISO-8859-15 and so
> mangles byte sequences that express e.g. many common vowels with diacritics.

The charset parameter of the Content-Type header takes precedence over
the document's internal labels and defaults, so it is the misconfigured
server that mangles those sequences. See e.g. RFC 3023 for a discussion.
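
To make the precedence concrete, here is a rough Python sketch of how a
consumer might pick the encoding under the RFC 3023 rules. It is not
libxslt's or libxml2's actual code; effective_encoding and
declared_encoding are hypothetical names made up for this illustration,
and BOM detection is left out for brevity.

```python
import re
from email.message import Message


def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None (hypothetical helper)."""
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        xml_bytes,
    )
    return m.group(1).decode("ascii").lower() if m else None


def effective_encoding(content_type, xml_bytes):
    """Pick an encoding for an XML entity received over HTTP (simplified sketch)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased charset parameter, or None
    if charset:
        # An explicit charset parameter overrides anything inside the document.
        return charset
    if msg.get_content_type() == "text/xml":
        # RFC 3023: text/xml without a charset parameter defaults to us-ascii.
        return "us-ascii"
    # application/xml without charset: fall back to the document's own
    # declaration, then to the XML default of UTF-8 (BOM handling omitted).
    return declared_encoding(xml_bytes) or "utf-8"


# A UTF-8 document without an encoding declaration, served with the wrong charset.
body = "<doc>\u00e9</doc>".encode("utf-8")
enc = effective_encoding("text/xml; charset=iso-8859-15", body)
print(enc, ascii(body.decode(enc)))
```

With the header from the report, the two UTF-8 bytes of U+00E9 are decoded
as two separate ISO-8859-15 characters, which is exactly the mangling
described above. Served as application/xml with no charset parameter, the
same bytes would be read as UTF-8.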
-- 
Björn Höhrmann · mailto:bjoern hoehrmann de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


