Re: [xslt] output escaping (again!)



On Thu, Mar 20, 2003 at 12:43:13AM -0500, Bruce Miller wrote:
> Hi all;
>   I've found the discussion from November, but I guess I still don't
> get the justification for the way output escaping is being done.

  URI handling is painful in general.

> As a concrete example, given the xml:
>  <foo data="data:text/plain,Hello"/> 
> 
> and template:
>  <xsl:template match="foo">
>     <a href="{@data}">Ha</a>
>  </xsl:template>
> 
> I get:
>  <a href="data:text/plain%2CHello">Ha</a>
> 
> How can I get:
>   <a href="data:text/plain,Hello">Ha</a>
> ?  
> Is there some trick I'm missing?

  No, it's just the way the libxml2 HTML serializer operates.

> It would seem that disable-output-escaping="yes" doesn't
> apply to attributes.

  Rather, its purpose is to disable XML escaping, which is an entirely
different layer from URI escaping.
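To make the two layers concrete, here is a minimal Python sketch (standard library only, not libxml2 code) applied to a made-up value "a,b&c". URI escaping (urllib.parse.quote) changes the URI itself. XML attribute escaping (xml.sax.saxutils.quoteattr) only changes how the value is written, and disable-output-escaping would only ever concern this second kind.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr, unescape

# Two independent escaping layers for an href value (illustrative only):
#   1. URI layer: percent-encoding, chosen by whoever builds the URI.
#   2. XML layer: entity escaping, chosen by the serializer.

raw = "a,b&c"

uri_escaped = quote(raw, safe="")   # layer 1: the URI itself changes
xml_escaped = quoteattr(raw)        # layer 2: only the serialized form changes

print(uri_escaped)   # a%2Cb%26c
print(xml_escaped)   # "a,b&amp;c"

# An XML parser undoes layer 2 exactly, so the reader gets the original
# string back; the %2C produced by layer 1 is what the browser receives.
assert unescape(xml_escaped[1:-1]) == raw
```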

> I'll grant that it is convenient 95% of the time to have the escaping 
> done, but then how do you handle the other 5% ?  Besides, it seems to 
> me that it violates the specs.
> 
> The XSL spec, "section 16.2 HTML Output Method" says
> 
>    The html output method should escape non-ASCII characters in URI 
>    attribute values using the method recommended in Section B.2.1 of 
>    the HTML 4.0 Recommendation.

  Okay.
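A minimal Python sketch of the B.2.1 method as read literally: convert each non-ASCII character to UTF-8 and write every resulting byte as %HH. It assumes that nothing else is escaped. The function name escape_non_ascii is hypothetical. Under this reading, the ',' and ':' in the example above would pass through untouched.

```python
def escape_non_ascii(uri: str) -> str:
    """Sketch of HTML 4.0 B.2.1: UTF-8-encode non-ASCII characters and
    percent-escape each resulting byte; ASCII characters (including
    reserved ones such as ',' and ':') are left as they are."""
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(escape_non_ascii("data:text/plain,Héllo"))  # data:text/plain,H%C3%A9llo
```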

> Obviously, we're not talking about non-ASCII here.  In the previous

  We are talking about reserved characters, though.

> discussion it was noted that "," is a reserved character.
> However, the XSL spec doesn't say that reserved characters should be
> escaped.  Indeed, from RFC 2396:
> 
>    2.4.2. When to Escape and Unescape
>    A URI is always in an "escaped" form, since escaping or unescaping a
>    completed URI might change its semantics.  Normally, the only time
>    escape encodings can safely be made is when the URI is being created
>    from its component parts; each component may have its own set of
>    characters that are reserved, so only the mechanism responsible for
>    generating or interpreting that component can determine whether or
>    not escaping a character will change its semantics. Likewise, a URI
>    must be separated into its components before the escaped characters
>    within those components can be safely decoded.
> 
> which suggests that processors like xslt should keep their paws off!
> 
> Indeed, ":" is a reserved character too! Why didn't it get escaped in
> the above example? Hmm, just for fun, I tried:

  Because it is in the list of characters that the HTML serialization layer
explicitly leaves unescaped in href values. I have added ',' to that list to
fix your problem.
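A rough Python model of that safe-list mechanism. SAFE_BEFORE, SAFE_AFTER and escape_href are hypothetical stand-ins, not libxml2's actual table or API. They only show how adding ',' to such a list changes the output for the href from the example.

```python
from urllib.parse import quote

# HYPOTHETICAL safe lists, not libxml2's real table: they only model
# "characters the HTML serializer leaves unescaped in href".
SAFE_BEFORE = ":/"              # ':' kept, ',' escaped (the reported output)
SAFE_AFTER = SAFE_BEFORE + ","  # ',' added to the list

def escape_href(value: str, safe: str) -> str:
    """Percent-encode every character except unreserved ones and `safe`."""
    return quote(value, safe=safe)

href = "data:text/plain,Hello"
print(escape_href(href, SAFE_BEFORE))  # data:text/plain%2CHello
print(escape_href(href, SAFE_AFTER))   # data:text/plain,Hello
```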
  Anyway, URI handling as a whole stinks: up to three layers of
encoding/decoding may be needed, and in general neither data nor
applications follow the rules. A big, bad mess... Why do you use ',' and
':' unescaped in a URI when it's clearly stated that this is not a good idea?

  Your specific problem should be fixed in the next release. For the
general case I don't think it's doable.
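The general case is hard because, as the RFC 2396 passage quoted above says, only the code that builds a URI from its components knows which ',' or '&' is data and which is a delimiter. A minimal Python sketch of component-wise escaping, using a hypothetical helper build_query_uri and an illustrative example.org URL:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_uri(base: str, params: dict[str, str]) -> str:
    """Escape each query component separately, then join them with the
    delimiters '?', '=' and '&', which must stay unescaped."""
    return base + "?" + urlencode(params)

uri = build_query_uri("http://example.org/search", {"q": "a,b&c", "lang": "en"})
print(uri)  # http://example.org/search?q=a%2Cb%26c&lang=en

# The consumer splits first, then decodes each component (RFC 2396 order).
print(parse_qs(urlsplit(uri).query))  # {'q': ['a,b&c'], 'lang': ['en']}

# Joining the raw value instead makes the '&' inside the data look like a
# delimiter, so q comes back as just 'a,b'.
naive = "http://example.org/search?q=a,b&c&lang=en"
naive_q = parse_qs(urlsplit(naive).query)["q"]
```

A serializer that only sees the finished string is in the position of the `naive` case: it cannot tell data from delimiters.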

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


