Re: [xslt] output escaping (again!)



Daniel Veillard wrote:
> On Thu, Mar 20, 2003 at 12:43:13AM -0500, Bruce Miller wrote:
>
>>Indeed, ":" is a reserved character too! Why didn't it get escaped in
>>the above example? Hmm, just for fun, I tried:
> 
>   Because it's part of the list of chars explicitly provided to not escape
> in href at the HTML serialization layer. I added ',' to the list to avoid
> your problem.

Thanks! But, I still think we'd all be better off if it didn't even try :>

>   Anyway the whole URI handling stinks, there are up to 3 layers of
> encoding/decoding needed, and in general the rules are not followed
> in data or application. Big bad mess... Why do you use ',' and ':'
> unescaped in URI while it's clearly stated it's not a good idea?

But no! It's not that it's a bad idea to use them in general --- in fact,
it's mandatory! But it _is_ a bad idea to use them in a way that
doesn't conform to the URL scheme in question.
From RFC2396:
   The "reserved" syntax class above refers to those characters that are
   allowed within a URI, but which may not be allowed within a
   particular component of the generic URI syntax; they are used as
   delimiters of the components described in Section 3.


Now, the data scheme may turn out to be a moot example since it doesn't
seem to be as well supported as I'd like, but it's a simple one, so...
(see ftp://ftp.isi.edu/in-notes/rfc2397.txt)

Consider the following URI:
  (1)  data:text/plain,Hello%2C%20World
This represents an `in-line' plain text document.
If you click an HTML link such as
  <a href="data:text/plain,Hello%2C%20World">Hello, World</a>
you should get a document, as plain text, containing only
the string "Hello, World". (Presumably it could also be used
for inline images, etc.)

The following two similar URIs are _different_ (and maybe invalid):
  (2)  data:text/plain%2CHello%2C%20World   (No <data> part)
  (3)  data:text/plain,Hello, World     (space & 2nd comma not allowed)
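
To make the difference concrete, here is a rough Python sketch (my own
illustration, not libxml's or any browser's code) of how a consumer might
take a data: URI apart. The helper name split_data_uri is made up, and it
ignores the ";base64" flag and media type parameters that RFC2397 also
allows. The point is only that a literal ',' acts as the delimiter while
'%2C' is ordinary data:

```python
from urllib.parse import unquote


def split_data_uri(uri):
    """Return (mediatype, data) for a data: URI, or None if it has no <data> part.

    Simplified sketch: no ";base64" handling, no media type parameters.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "data":
        raise ValueError("not a data: URI")
    # Only a literal ',' separates the media type from the data.
    # An escaped '%2C' is not a delimiter, so it does not split anything.
    mediatype, comma, data = rest.partition(",")
    if not comma:
        return None
    # Unescape each component only after the delimiter has been found.
    return unquote(mediatype), unquote(data)


print(split_data_uri("data:text/plain,Hello%2C%20World"))    # case (1)
print(split_data_uri("data:text/plain%2CHello%2C%20World"))  # case (2)
```

Under these assumptions, case (1) splits into 'text/plain' and
'Hello, World', while case (2) has no delimiter at all and so yields None.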

So, it would appear that by preventing me from producing (3),
libxml is forcing me to get (2)  ---- and I can't get (1)!!!

>   Your specific problem should be fixed in the next release. For the
> general case I don't think it's doable.

I agree!! Since libxml can't read my mind, I'm responsible for telling
it what I want.  If libxml 
  (a) _only_ converted non-ASCII and
  (b) supported str:encode-uri, (or fn:escape-uri from XPath2.0) 
it would be simple (but verbose!):

  <a href="{concat('data:text/plain,', str:encode-uri(@data, true()))}"> ...
Then <foo data="Hello, World"/> would give me case (1).
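
The same idea outside XSLT, as a hedged Python sketch: escape only the
component I know to be data, and leave the delimiters I wrote myself alone.
Both function names here are hypothetical; escape_whole is just a stand-in
for a serializer that escapes the entire attribute value without knowing
the scheme, and is not a description of what libxml actually does
internally.

```python
from urllib.parse import quote


def data_href(text):
    """Build a data: URI by escaping only the payload.

    safe="" asks quote() to escape reserved characters as well, which is
    roughly the effect I would want from str:encode-uri(@data, true()).
    """
    return "data:text/plain," + quote(text, safe="")


def escape_whole(uri):
    """Stand-in for a scheme-unaware serializer: escape the whole string,
    leaving only ':' and '/' alone."""
    return quote(uri, safe=":/")


print(data_href("Hello, World"))
# data:text/plain,Hello%2C%20World     -> case (1)
print(escape_whole("data:text/plain,Hello, World"))
# data:text/plain%2CHello%2C%20World   -> case (2)
```

Once the string has been glued together, nothing downstream can tell
which ',' was a delimiter and which was data, so the escaping has to
happen per component, before concatenation.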

Alas, str:encode-uri isn't implemented in libxml, nor anywhere else, according
to http://www.exslt.org/.

My guess is that this started as an oversight in XPath 1.0.  When people
noticed that something was missing, some proposed an extension function, while the
library writers assumed that it was supposed to be handled in the HTML serialization.
Since the latter got done, there was no apparent need to implement the former.
But now we're wedged with something that _can't_ work.
> 
> Daniel
> 




