Re: [xslt] output escaping (again!)
- From: Daniel Veillard <veillard redhat com>
- To: xslt gnome org
- Subject: Re: [xslt] output escaping (again!)
- Date: Sun, 23 Mar 2003 13:27:06 -0500
On Thu, Mar 20, 2003 at 12:43:13AM -0500, Bruce Miller wrote:
> Hi all;
> I've found the discussion from November, but I guess I still don't
> get the justification for the way output escaping is being done.
The URI handling is painful in general.
> As a concrete example, given the xml:
> <foo data="data:text/plain,Hello"/>
>
> and template:
> <xsl:template match="foo">
> <a href="{@data}">Ha</a>
> </xsl:template>
>
> I get:
> <a href="data:text/plain%2CHello">Ha</a>
>
> How can I get:
> <a href="data:text/plain,Hello">Ha</a>
> ?
> Is there some trick I'm missing?
No, it's just the way the libxml2 HTML serializer operates.
> It would seem that disable-output-escaping="yes" doesn't
> apply to attributes.
Rather, its purpose is to turn off XML escaping, which is a totally
different layer from URI escaping.
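To illustrate the two layers, here is a minimal sketch in Python rather than libxml2 itself. It assumes the standard-library functions html.escape and urllib.parse.quote as stand-ins for the XML-escaping and URI-escaping steps; the sample value is made up.

```python
from html import escape
from urllib.parse import quote

# Hypothetical attribute value containing a markup-significant character (&)
# and a URI-reserved character (,).
value = "a&b,c"

# XML/HTML escaping layer: protects markup characters, leaves ',' alone.
xml_layer = escape(value, quote=True)

# URI escaping layer: percent-encodes characters outside the unreserved set.
uri_layer = quote(value, safe="")

print(xml_layer)  # a&amp;b,c
print(uri_layer)  # a%26b%2Cc
```

Switching off the first layer has no effect on the second, which is why disable-output-escaping does not change the %2C in the href.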
> I'll grant that it is convenient 95% of the time to have the escaping
> done, but then how do you handle the other 5% ? Besides, it seems to
> me that it violates the specs.
>
> The XSL spec, "section 16.2 HTML Output Method" says
>
> The html output method should escape non-ASCII characters in URI
> attribute values using the method recommended in Section B.2.1 of
> the HTML 4.0 Recommendation.
okay
> Obviously, we're not talking about non-ASCII here. In the previous
We are talking about reserved chars though.
> discussion it was noted that "," is a reserved character.
> However, the XSL spec doesn't say that reserved characters should be
> escaped. Indeed, from rfc2396,
>
> 2.4.2. When to Escape and Unescape
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics. Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.
>
> which suggests that processors like xslt should keep their paws off!
>
> Indeed, ":" is a reserved character too! Why didn't it get escaped in
> the above example? Hmm, just for fun, I tried:
Because it's part of the list of chars explicitly exempted from escaping
in href at the HTML serialization layer. I added ',' to the list to avoid
your problem.
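As a rough sketch of the idea, not the actual libxml2 code: the function escape_href and its keep lists below are hypothetical, and Python's urllib.parse.quote stands in for the serializer's escaping routine. The point is only that adding ',' to the list of characters left alone changes the output for the example above.

```python
from urllib.parse import quote

def escape_href(value, keep=":/"):
    """Percent-encode value, leaving the characters in `keep` untouched.

    `keep` is an illustrative list, not the one libxml2 really uses.
    """
    return quote(value, safe=keep)

# With ',' missing from the list, the comma is percent-encoded.
before = escape_href("data:text/plain,Hello")

# With ',' added to the list, the URI passes through unchanged.
after = escape_href("data:text/plain,Hello", keep=":/,")

print(before)  # data:text/plain%2CHello
print(after)   # data:text/plain,Hello
```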
Anyway, the whole URI handling stinks: there are up to 3 layers of
encoding/decoding needed, and in general the rules are not followed
in data or applications. Big bad mess... Why do you use ',' and ':'
unescaped in a URI when it's clearly stated that it's not a good idea?
Your specific problem should be fixed in the next release. For the
general case I don't think it's doable.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/