Re: [xslt] UTF-8 escaping



On Mon, Aug 19, 2002 at 12:06:03PM -0400, Daniel Veillard wrote:
>   xmlXPathRegisterFuncNS
>   in include/libxml/xpathInternals.h

Eek! I was sort of hoping to just use xsltproc (that way users could use any
xslt engine by specifying an alternate command). :-)
Well, I suppose I should suck in my gut...

Since this option would allow me to work with older libxslt/libxml I may
pursue it. Along these lines, is there an equivalent of <xsl:fallback>
for XPath? I know there is a function-available() but is that ok?

eg:

<xsl:if test="function-available('crazy-extension')">
 <xsl:value-of select="crazy-extension('blah')"/>
</xsl:if>

I am concerned that some compiling xslt engine out there will see my
crazy-extension and complain that it is unavailable.

> <pedantic>this is not a Recommendation, XQuery is a Working Draft
> and at the moment I would say that IETF rules the URI infrastructure
> not W3C, the RFCs are far more normative in this respect :-)</pedantic>

You are, of course, correct. ;-)

> > The function I implemented *will probably be* part of XPath.
> > It is listed for xpath 2.0:
> > 	http://www.w3.org/TR/xquery-operators/#func-escape-uri
> 
>   Hum, I don't claim  to get XPath 2.0 compliance, and even the 
> XQuery draft suggest to have it registered in their function namespace
> 
>    http://www.w3.org/2002/08/xquery-functions

I know you don't have XPath 2.0 compliance (since it isn't finished! :-); I
just wish you had that particular function... I simply cannot figure out any
other way to generate utf-8 in email headers.

> > I just thought it would be nicer to use the new function in XPath 2 than to
> > create yet another engine-specific extension.
> 
>   Well if you put it directly in the XPath core while it's not standardized
> then you make it an engine-specific extension, precisely :-)

True... But, when it becomes standardized, then it isn't anymore. From a
user's point of view (mine) this is slightly better: I use a function only
available on Y, but in a few years it will be available on *.

>   Maybe XQuery extensions could be added to libxml2 but then I would really
> require them to be listed as XPath extensions anyway, seems to me it would
> fit your needs, right ?

Yes. OTOH, if you know of a pure xslt method (below) tell me!
This would be far preferred since I could have even IE6 render my xml.

> > PS. Is there any existing way in libxslt to do:
> > 	str->utf-8->hex
> > and get the answer back inside xsl?
> 
>   libxslt only manipulates strings in UTF8 internally, so the first step
> is somewhat trivial but the second part might be more challenging, I don't
> think there is an ad-hoc XSLT entry point for this, maybe EXSLT or this 
> could be done on pure XSLT ...

I know you use utf-8 inside. Isn't it wonderful?  :-)
	(except for the variable length thing... )

The thing I don't know how to do though is to get the utf-8 exposed to me
within xsl. There doesn't seem to be a
pop-single-byte-numeric-value-from-string function. :-)

Even a convert-to-unicode-number-from-single-char-string function would
suffice. Then I could re-encode the unicode number to utf-8 myself, and
hexify it. 
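
To make the re-encode-and-hexify step concrete, here is a minimal sketch of
it outside XSLT, in Python. The names utf8_bytes and hexify are made up for
this illustration, and it assumes the input code point is a valid Unicode
scalar value; it is not something libxslt or XPath provides.

```python
def utf8_bytes(cp):
    """Encode one Unicode code point as a list of UTF-8 byte values."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % cp)
    if cp < 0x80:                      # 1 byte:  0xxxxxxx
        return [cp]
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),         # 4 bytes: 11110xxx + 3 continuation
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


def hexify(text, prefix="%"):
    """Write every UTF-8 byte of text as prefix plus two hex digits."""
    return "".join("%s%02X" % (prefix, b)
                   for ch in text
                   for b in utf8_bytes(ord(ch)))
```

With prefix="%" this gives URI-style escapes, and with prefix="=" it gives
the =XX style mentioned elsewhere in this message. Note that hexify escapes
every byte, including plain ASCII; a real escaper would pass safe characters
through unchanged.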

However, xslt seems bent on keeping char->integer functions out of
programmers' hands! I understand the philosophy; there has already been so
much pain and suffering caused by charsets, but ... unicode! It's not the
be-all-and-end-all, but xml can only express values in it anyway.

I didn't see anything useful in EXSLT last time I looked. There is an
encode-uri function, but libxslt1 doesn't have it. (It is marked as 'other')

http://www.exslt.org/str/functions/encode-uri/index.html

... it also sucks for uri fragments since it doesn't escape ":/;?".
OTOH, it would work just fine for email since these are ok! :-)

Plus, this would permanently spoil any hope of my xsl working on IE6 since I
doubt they will ever adopt exslt, but xpath 2 is another matter.
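
To illustrate the reserved-character point about ":/;?" above, here is a
small Python sketch that contrasts leaving those four characters alone with
escaping them too. It uses the standard urllib.parse.quote, not the EXSLT
function; the helper names escape_keep_reserved and escape_all are made up
for this example.

```python
from urllib.parse import quote


def escape_keep_reserved(text):
    """Percent-escape UTF-8 bytes but leave ':', '/', ';' and '?' as-is."""
    return quote(text, safe=":/;?")


def escape_all(text):
    """Percent-escape everything except unreserved characters."""
    return quote(text, safe="")
```

The first form is enough when those characters are harmless where the
string ends up; the second is the one to use when the result is embedded
inside a larger URI.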

PS. example where I need this utf-8 ability:
http://www.terpstra.ca/lurker/message/20020819092938.GA22015%40dat.etsit.upm.es.html
the "(reply)" link should be using utf-8, but for now I just give up on
these chars and encode =3F (?).

Thanks a lot for taking the time to respond to my concerns! 

-- 
Wesley W. Terpstra <wesley@terpstra.ca>
