[xslt] XHTML serialization - implicit creation of CDATA sections for specific elements



Hi,

This issue concerns the serialization of the content of
"script" and "style" elements to XHTML.
Currently the content of those elements is implicitly
wrapped in a CDATA section during serialization.
I would like to advocate disabling this implicit
mechanism.

Related reports:
http://bugzilla.gnome.org/show_bug.cgi?id=302529
http://bugzilla.gnome.org/show_bug.cgi?id=345147

I see the following reasons to disable this mechanism:

1) The addition is not performed by the Saxon, Xalan, Sablotron and .NET
  XSLT processors (maybe others as well).

2) It causes problems for some people, since it comes unexpectedly.

3) The XHTML spec does not give a hint that CDATA sections
  should be generated automatically by the serialization
  mechanism; it only makes XHTML authors aware that one can use CDATA
  sections to make the text more readable in the XML document.
  (http://www.w3.org/TR/xhtml1/#h-4.8)

Normally, wrapping text in a CDATA section should be a purely
textual matter. However, if disable-output-escaping is used, it can
result in an ugly mass of interleaved CDATA sections. This is
demonstrated in the XSLT 2.0 spec.

Taken from http://www.w3.org/TR/xslt20/#d5e29391:
-----
Example: Interaction of Output Escaping and CDATA
For example, if <xsl:output cdata-section-elements="title"/> is
specified, then the following instructions:

<title>
  <xsl:text disable-output-escaping="yes">This is not &lt;hr/&gt; good coding practice</xsl:text>
</title>

should generate the output:

<title><![CDATA[This is not ]]><hr/><![CDATA[ good coding practice]]></title>
-----
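
As a side note on why CDATA wrapping is normally only a textual matter,
here is a minimal sketch in Python. It is not related to libxslt; it only
assumes the standard xml.etree.ElementTree parser, and the sample strings
are made up for illustration. An escaped text node and the same text in a
CDATA section should parse to identical character data:

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data (illustrative strings):
# one uses an entity reference, the other a CDATA section.
escaped = '<style>a &lt; b</style>'
cdata = '<style><![CDATA[a < b]]></style>'

text_escaped = ET.fromstring(escaped).text
text_cdata = ET.fromstring(cdata).text

# A parser reports the same text for both forms.
print(text_escaped == text_cdata)  # prints True
```

The difference only becomes significant once markup produced via
disable-output-escaping ends up inside the CDATA section.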

Due to a bug in the serialization of text with output escaping
disabled, in conjunction with CDATA sections, the current state can
lead to incorrect results.
The following example demonstrates that the generated content
of the "style" element differs from the expected one: the current
behaviour generates the text "<hr/>", while the correct result would
be the element "hr".

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml"
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
    indent="yes"/>

  <xsl:template match="/">
    <html>
      <head>      
        <style type="text/css">
          <xsl:text disable-output-escaping="yes">This is not &lt;hr/&gt; good coding practice</xsl:text>
        </style>
      </head>      
    </html>
  </xsl:template>
</xsl:stylesheet>

Current relevant result
-----------------------
The serializer fails to analyse the XML-specific semantics of the text
to be output, and puts the <hr/> inside the CDATA section:
...
<style type="text/css"><![CDATA[This is not <hr/> good coding practice]]></style>
...

Expected relevant results
-------------------------
If we keep the generation of CDATA sections, then we should get:
...
<style type="text/css"><![CDATA[This is not ]]><hr/><![CDATA[ good coding practice]]></style>
...

If we remove the generation of CDATA sections:
...
<style type="text/css">This is not <hr/> good coding practice</style>
...
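
To make the difference between these results concrete, here is a small
sketch in Python that parses the three "style" fragments from above with
the standard xml.etree.ElementTree parser. The helper name summarize() is
hypothetical, and the fragments are written on one line each without the
XHTML namespace for brevity. The current result should contain no child
element at all, while both expected results should contain an "hr" child:

```python
import xml.etree.ElementTree as ET

# The three serializations discussed above (single-line, no namespace).
current = ('<style type="text/css">'
           '<![CDATA[This is not <hr/> good coding practice]]></style>')
expected_with_cdata = ('<style type="text/css"><![CDATA[This is not ]]>'
                       '<hr/><![CDATA[ good coding practice]]></style>')
expected_without_cdata = ('<style type="text/css">'
                          'This is not <hr/> good coding practice</style>')


def summarize(xml_text):
    """Return (leading text, child tags, text after the first child)."""
    root = ET.fromstring(xml_text)
    children = list(root)
    tail = children[0].tail if children else None
    return root.text, [child.tag for child in children], tail


for name, doc in [("current", current),
                  ("expected, with CDATA", expected_with_cdata),
                  ("expected, without CDATA", expected_without_cdata)]:
    print(name, summarize(doc))
```

Both expected variants yield the same parsed structure; only the current
output turns the "hr" element into plain text.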

So actually there are 2 issues here:
1) Removing the unexpected generation of CDATA sections
2) Fixing the serialization bug

Regards,

Kasimier






