Re: [xslt] LibXSLT adding annoying whitespace



On Wed, Jun 13, 2001 at 03:36:37AM +0200, Thomas Broyer wrote:
> And in the HTML spec, B.3.1 Line Breaks
> <http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1> :
>   «SGML (see [ISO8879], section 7.6.1) specifies that a line break
> immediately following a start tag must be ignored, as must a line break
> immediately before an end tag. This applies to all HTML elements without
> exception.»

This does not cover a newline after an end tag, nor whitespace on
either side of an empty tag (which is neither a start tag nor an end
tag in SGML).

Were it not so,
<p><b>hello</b> <i>world</i></p>
would be rendered as <p>helloworld</p> without a space, which is clearly
incorrect.
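
To make the quoted rule concrete, here is a minimal Python sketch. It
models only the sentence quoted from B.3.1 (one line break right after a
start tag, one right before an end tag) and is not a full implementation
of SGML record-end handling. The function name is hypothetical, and the
XML-style empty tag <br/> is treated as neither a start nor an end tag.

```python
import re

def drop_ignorable_line_breaks(markup: str) -> str:
    """Hypothetical helper: apply only the rule quoted from HTML 4 B.3.1."""
    # Drop one line break immediately after a start tag.
    # The lookbehind (?<!/) excludes XML-style empty tags such as <br/>,
    # and requiring a letter after "<" excludes end tags such as </b>.
    markup = re.sub(r"(<[A-Za-z][^<>]*(?<!/)>)\r?\n", r"\1", markup)
    # Drop one line break immediately before an end tag.
    markup = re.sub(r"\r?\n(</[A-Za-z][^<>]*>)", r"\1", markup)
    return markup

# Line breaks just inside the element are dropped.
print(drop_ignorable_line_breaks("<p>\nhello\n</p>"))
# A line break after an end tag is left alone, so the words stay separated.
print(drop_ignorable_line_breaks("<p><b>hello</b>\n<i>world</i></p>"))
```

Under these assumptions the first call gives <p>hello</p>, while the
second input comes back unchanged.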

> This means:
> <html><body><img/><br/><a><img/></a></body></html>
> is strictly equivalent to
> <html><body>
> <img>
> <br>
> <a><img></a>
> </body></html>
> for any conforming user agent, so it isn't a bug of libxslt/libxml.

This happens to be true, but not for the reasons you mention...
The specs are hard to understand, and not at all clearly written.  The
actual SGML rules for whitespace involve distinguishing elements that
can contain text (#PCDATA) from ones that cannot, as well as rules about
the start and end of their content.

In the example, <body> isn't allowed to contain text:
    <html><body><p><img/><br/><a><img/></a></p></body></html>
would be a very different document.  Inserting "ignorable" spaces leads to:
    <html>
    <body>
    <p>
    <img/><br/><a>
    <img/>
    </a>
    </p>
    </body>
    </html>
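
The distinction between elements that can contain #PCDATA and those that
cannot can be sketched in Python. This is only an illustration in the
XML style ("ignorable whitespace in element content"), not the full SGML
rules. The ELEMENT_CONTENT set is an assumption made for the example; a
real tool would read the content models from the DTD.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration only: these elements cannot contain #PCDATA.
ELEMENT_CONTENT = {"html", "body"}

def strip_element_content_whitespace(elem):
    """Remove whitespace-only text from elements that cannot hold text."""
    if elem.tag in ELEMENT_CONTENT:
        # Whitespace before the first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Whitespace after each child (stored as the child's tail).
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_element_content_whitespace(child)

def normalize(markup):
    """Hypothetical helper: parse, strip ignorable whitespace, serialize."""
    root = ET.fromstring(markup)
    strip_element_content_whitespace(root)
    return ET.tostring(root, encoding="unicode")

# Whitespace between <html>, <body> and <p> is dropped; the space inside
# <p>, which can contain text, is kept.
print(normalize("<html>\n<body>\n<p>a <b>b</b></p>\n</body>\n</html>"))
```

With the assumed set, the call prints
<html><body><p>a <b>b</b></p></body></html>.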

Best,

Lee

-- 
Liam Quin - Barefoot in Toronto - liam@holoweb.net - http://www.holoweb.net/
Ankh: irc.sorcery.net www.valinor.sorcery.net irc.gnome.org www.advogato.org
Author, Open Source XML Database Toolkit, Wiley August 2000
Co-author: The XML Specification Guide, Wiley 1999; Mastering XML, Sybex 2001



