Re: [xslt] UTF-8 escaping



On Mon, Aug 19, 2002 at 10:28:11AM -0400, Daniel Veillard wrote:
>   1/ there is already a heavily tested function for URI-Escaping in libxml2
>      in the uri.c module, and it does it properly after having parsed the
>      provided URI-Reference to do the escaping where possible (reread
>      RFC 2396 the escaping algorithm you used can generate erroneous
>      URI-References I'm afraid by escaping blindly independently of the
>      position of the character in the string)
>        see xmlURIEscapeStr() in uri.c

Cool. I did not see this escaping algorithm when I went looking.

What is the method to call this function from xpath? I don't see a
.*Register.* in the file. Also:

terpstra@maul:~/apt/libxml2-2.4.23$ grep URI *.c | grep Register
-> nada

re: erroneous escaping:

Avoiding that is not the point of escape-uri as the w3c draft defines it.

It is not intended to _guess_ what kind of URI it is escaping (relative,
absolute, fragment, etc.); the caller tells it what he wants escaped, and
then gets back the string. There is no magic detection, by design.

The most useful use of this is not in escaping the entire uri, but in
escaping components like the variables to a CGI GET.

For my purposes, I need it to get the utf-8 hex string for email headers.

Also, this is not my algorithm; it is a direct transcription from the w3c
description which I copied from the url at the bottom.

As far as I can see, you are doing exactly the same thing I am doing... You
have an IS_UNRESERVED instead of the if, and check an exclusion list, and
allow raw @s, but aside from that---all is the same (when it finally comes
down to escaping).

Again, what I am looking for is not a post-processing function on uris, but
a function which allows me to get the hex-stream of the utf-8 from within
xslt (please reread my last email). It just so happens that the only
function I have seen which allows me to get the result is the escape-uri as
seen on w3c xfunctions.

If you find the "if" distasteful, then using IS_UNRESERVED or IS_RESERVED
according to $escape-reserved is fine by me.
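
To make the comparison concrete, here is a minimal sketch of the kind of
byte-wise escaping I mean. It is not the patch itself and not the code in
uri.c; the names escape_uri_sketch, is_unreserved and is_reserved are made up
for illustration, and the character sets are taken from RFC 2396 only, so it
may differ in detail from the w3c draft. It assumes the input is already
UTF-8 and escapes each byte as %HH.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* RFC 2396 "unreserved": alphanumerics plus the mark characters. */
static int is_unreserved(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("-_.!~*'()", c) != NULL;
}

/* RFC 2396 "reserved" characters. */
static int is_reserved(unsigned char c)
{
    return c != '\0' && strchr(";/?:@&=+$,", c) != NULL;
}

/*
 * Hypothetical helper: percent-escape a NUL-terminated UTF-8 string.
 * Unreserved bytes are copied through. Reserved bytes are copied through
 * only when escape_reserved is 0. Every other byte, including each byte
 * of a multi-byte UTF-8 sequence, becomes %HH.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *escape_uri_sketch(const char *utf8, int escape_reserved)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(utf8);
    char *out = malloc(3 * len + 1);   /* worst case: every byte escaped */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)utf8[i];

        if (is_unreserved(c) || (!escape_reserved && is_reserved(c))) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With escape_reserved set, "a/b?c=d" comes back as "a%2Fb%3Fc%3Dd", and the
two UTF-8 bytes of e-acute (0xC3 0xA9) come back as "%C3%A9", which is the
hex stream I am after. With escape_reserved clear, "a/b?c=d" is returned
unchanged.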

> > +     xmlXPathRegisterFunc(ctxt, (const xmlChar *)"escape-uri",
> > +                          xmlXPathEscapeUriFunction);
> 
>   2/ registering this extension function directly in the XPath library
>      without a URI associated to the extension is plain wrong and would
>      make libxml2 XPath implementation non-conformant. The extension function
>      would have to be registered for example as an EXSLT function or under
>      another specific namespace, but definitely not without prefix.

The function I implemented *will probably be* part of XPath.
It is listed for xpath 2.0:
	http://www.w3.org/TR/xquery-operators/#func-escape-uri

I just thought it would be nicer to use the new function in XPath 2 than to
create yet another engine-specific extension.

PS. Is there any existing way in libxslt to do:
	str->utf-8->hex
and get the answer back inside xsl?

-- 
Wesley W. Terpstra <wesley@terpstra.ca>
