Re: [xslt] UTF-8 escaping



On Mon, Aug 19, 2002 at 04:09:40PM +0200, Wesley W. Terpstra wrote:
> Hello all!
> 
> Earlier this week I ran into a fairly simple problem: escaping utf-8. 
> It seems there is no way to do it (properly) in libxslt!
> 
> Sure, one can handle all of ASCII using string-length(substring-before(...))
> tricks, but what about the rest of unicode?
> 
> According to the w3c, non-ascii (and only non-ascii; "=?&" etc are left
> untouched) unicode characters in uri attributes should be transformed into
> utf-8 and then %hexified by default when the output mode is html. libxslt1
> may or may not do this, but it will not help me since it is a
> post-processing step.
> 
> Apparently, web browsers are supposed to do this as well, so that even if
> the xslt engine does not, you are in the green.
> 
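A minimal sketch of the HTML-output behaviour described above, assuming the input string is already UTF-8 (as libxml2's internal xmlChar strings are). Only bytes >= 0x80 are %-encoded. All ASCII, including reserved characters such as '=', '?' and '&', passes through untouched. The name escape_non_ascii() is hypothetical, not a libxml2 or libxslt API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2/libxslt API).
 * Percent-encode every byte >= 0x80 of a UTF-8 string, leaving all ASCII
 * bytes (including reserved ones such as '=', '?', '&') unchanged.
 * Returns a malloc'd string, or NULL on allocation failure. */
char *escape_non_ascii(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, n = 0;
    char *out;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(3 * len + 1);   /* worst case: every byte becomes %XX */
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c >= 0x80) {
            /* one byte of a multi-byte UTF-8 sequence */
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        } else {
            out[n++] = (char) c;
        }
    }
    out[n] = '\0';
    return out;
}
```

For example, escape_non_ascii("http://example.org/M\xC3\xBCller?a=1&b=2") gives "http://example.org/M%C3%BCller?a=1&b=2".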
> However, not everyone is outputting html. :-)
> For RFC822 mail headers one can use "=?utf-8?Q?M=C3=BCller?="
> to encode unicode characters in subjects, from lines, etc.
> 
> If a uri-escape function such as the one proposed at
> 	http://www.w3.org/TR/xquery-operators/#func-escape-uri
> existed, then one could use it and translate() to achieve formatting like
> above.
> 
> e.g.:
> 	<xsl:text>=?utf-8?Q?</xsl:text>
> 	<xsl:value-of select="translate(escape-uri($str, true()), '%', '=')"/>
> 	<xsl:text>?=</xsl:text>
> 
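For comparison, here is a minimal sketch that builds the same RFC 2047 "Q" encoded-word directly in C. It is deliberately conservative: only ASCII letters and digits are copied literally. That matters because in Q encoding '_' stands for a space. If escape-uri() leaves '_' unescaped, as the RFC 2396 unreserved set does, the translate() trick above would turn a literal underscore into a space on decoding. q_encode_word() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2/libxslt API).
 * Encode UTF-8 text as one RFC 2047 "Q" encoded-word, "=?utf-8?Q?...?=".
 * Conservative on purpose: only ASCII letters and digits are copied as-is;
 * every other byte (space, '_', '=', '?', non-ASCII ...) becomes =XX.
 * Does NOT split at the 75-character encoded-word limit.
 * Returns a malloc'd string, or NULL on allocation failure. */
char *q_encode_word(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "=?utf-8?Q?";
    static const char suffix[] = "?=";
    size_t len, i, n;
    char *out;

    assert(in != NULL);
    len = strlen(in);
    /* prefix + worst case 3 bytes per input byte + suffix incl. NUL */
    out = malloc((sizeof prefix - 1) + 3 * len + sizeof suffix);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, sizeof prefix - 1);
    n = sizeof prefix - 1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9')) {
            out[n++] = (char) c;
        } else {
            out[n++] = '=';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    memcpy(out + n, suffix, sizeof suffix);   /* copies the NUL too */
    return out;
}
```

q_encode_word("M\xC3\xBCller") yields "=?utf-8?Q?M=C3=BCller?=", and q_encode_word("a_b c") yields "=?utf-8?Q?a=5Fb=20c?=". The sketch does not enforce the RFC 2047 limit of 75 characters per encoded-word. Longer text would have to be split across several encoded-words.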
> This is also useful/important for "mailto:" URLs in HTML output.
> Not to mention good for controlling parameters to a CGI GET.
> 
> I have implemented the escape-uri function as described on the above w3c URL.
> However, it is a patch to libxml2; I am not sure whether this belongs in
> libxml2 or in libxslt or neither?
> 
> I know that there are similar functions provided as extensions in other xslt
> engines, but I am using libxslt1, so that is no help. :-)
> 
> Again, to re-emphasize, I know all about the xsl tricks to handle ascii, but
> we are talking about utf-8. Japanese names in mail headers are very common.
> 
> Attached is the patch for libxml2.
> 
> Comments?

  1/ there is already a heavily tested function for URI escaping in libxml2,
     in the uri.c module. It does the job properly: it first parses the
     provided URI-Reference and escapes only where escaping is allowed.
     (Reread RFC 2396: I'm afraid the escaping algorithm you used can
     generate erroneous URI-References, because it escapes blindly,
     regardless of the position of the character in the string.)
       see xmlURIEscapeStr() in uri.c
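To illustrate why position matters, here is a minimal sketch of escaping a single URI component. It is loosely modelled on the idea behind xmlURIEscapeStr(): escape everything except the RFC 2396 unreserved characters and an explicit keep-list. It is not libxml2's code, and its kept set may differ in detail. escape_component() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* RFC 2396 "unreserved" = alphanum | mark, mark = - _ . ! ~ * ' ( ) */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           (c != '\0' && strchr("-_.!~*'()", c) != NULL);
}

/* Hypothetical helper (not libxml2's implementation).
 * Percent-escape ONE URI component (a query value, a path segment...):
 * every byte that is neither unreserved nor listed in `keep` (may be NULL)
 * becomes %XX.  Not meant for a whole URI-Reference.
 * Returns a malloc'd string, or NULL on allocation failure. */
char *escape_component(const char *in, const char *keep)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, n = 0;
    char *out;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(3 * len + 1);   /* worst case: every byte becomes %XX */
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        /* c is never '\0' here, so strchr() cannot match the terminator */
        if (is_unreserved(c) || (keep != NULL && strchr(keep, c) != NULL)) {
            out[n++] = (char) c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
    return out;
}
```

Escaping a whole reference blindly, e.g. escape_component("mailto:a@b?x=1", NULL), yields "mailto%3Aa%40b%3Fx%3D1", which is no longer a mailto: URI. Escaping only the value keeps the structure intact: write "mailto:a@b?subject=" followed by escape_component(subject, NULL). For a path, passing "/" as the keep-list preserves the segment separators.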

> +     xmlXPathRegisterFunc(ctxt, (const xmlChar *)"escape-uri",
> +                          xmlXPathEscapeUriFunction);

  2/ registering this extension function directly in the XPath library
     without an associated namespace URI is plain wrong and would make the
     libxml2 XPath implementation non-conformant. The extension function
     would have to be registered, for example, as an EXSLT function or under
     another specific namespace, but definitely not without a prefix.
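Both libraries already provide namespace-aware registration: xmlXPathRegisterFuncNS() in libxml2 and xsltRegisterExtModuleFunction() in libxslt take a namespace URI alongside the local name. The self-contained sketch below only illustrates the underlying principle. A function is identified by its expanded name (namespace URI, local name), so an extension registered under its own namespace can never be confused with a core, unprefixed XPath function. The namespace URI, the registry and the placeholder function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace URI for the extension. */
#define HYPOTHETICAL_NS "http://example.org/hypothetical/uri-ext"

/* Signature of a toy extension function: one string in, one string out. */
typedef const char *(*ext_func)(const char *arg);

/* Functions are keyed by their expanded name.  A NULL ns_uri stands for
 * the core XPath function library (unprefixed names). */
struct func_entry {
    const char *ns_uri;
    const char *name;
    ext_func fn;
};

/* Hypothetical placeholder body: returns its argument unchanged. */
static const char *placeholder_escape(const char *arg)
{
    return arg;
}

static const struct func_entry registry[] = {
    { HYPOTHETICAL_NS, "escape-uri", placeholder_escape },
};

/* Compare two namespace URIs; NULL (no namespace) only matches NULL. */
static int same_ns(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Look a function up by (namespace URI, local name); NULL if not found. */
ext_func lookup_function(const char *ns_uri, const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (same_ns(registry[i].ns_uri, ns_uri) &&
            strcmp(registry[i].name, name) == 0)
            return registry[i].fn;
    }
    return NULL;
}
```

Here lookup_function(NULL, "escape-uri") finds nothing, because the unprefixed name was never claimed. Only lookup_function(HYPOTHETICAL_NS, "escape-uri") resolves to the extension.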

  Sorry, but 1/ and 2/ make it clearly impossible to integrate your patch
into libxml2. You can still use the API to register the function that way
in your own code, but this will make your stylesheets unportable.

Daniel


-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


