Michael Ludwig wrote, On 08/10/08 15:27:
Sam Liddicott wrote:
I need to do some limited text-node parsing
in libxslt.
I find parsing text very difficult in libxslt [...]
[...] libxslt doesn't seem to have regexp support, or at least none
that is widely distributed or packaged for most platforms (including
mine).
str:tokenize etc. are not good enough because they are destructive:
they throw away the separating tokens.
The simplest expression [...] one character at a time.
Clearly that is nuts.
I'll probably have to go with a combination of contains,
substring-before, substring-after, substring and string-length; and
maybe str:tokenize just to get the lengths of substrings up to
multiple delimiters.
Clearly that is nuts too.
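For what it's worth, here is roughly what that substring-based approach looks like: a recursive named template that peels off the text before the first delimiter, emits it, and recurses on the remainder, keeping the separators rather than discarding them. This is only a sketch; the template, parameter and result-element names are my own invention:

```xml
<!-- Hypothetical sketch: split a string at a single delimiter while
     keeping both the tokens and the separators in document order.
     Names (split-keep, token, sep) are invented for illustration. -->
<xsl:template name="split-keep">
  <xsl:param name="text"/>
  <xsl:param name="delim" select="' '"/>
  <xsl:choose>
    <xsl:when test="contains($text, $delim)">
      <token><xsl:value-of select="substring-before($text, $delim)"/></token>
      <sep><xsl:value-of select="$delim"/></sep>
      <!-- recurse on the remainder after the first delimiter -->
      <xsl:call-template name="split-keep">
        <xsl:with-param name="text" select="substring-after($text, $delim)"/>
        <xsl:with-param name="delim" select="$delim"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <token><xsl:value-of select="$text"/></token>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Tail recursion like this is still linear in the number of delimiters rather than in characters, so it avoids the character-at-a-time degenerate case, but deep recursion can still blow the template stack on very long inputs.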
Have I missed anything obvious?
As you mention Perl, you may find it beneficial to use Perl for string
manipulation from within XSLT, Perl being far superior to XSLT 1.0 in
this respect.
sub ts_to_w3cdtf { ... }
XML::LibXSLT->register_function(
'urn:perl', 'ts-to-w3cdtf', \&ts_to_w3cdtf);
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:perl="urn:perl">
<xsl:value-of select="perl:ts-to-w3cdtf( @timestamp )"/>
Good tip; I don't think I'll get away with Perl, though. This was for
a lightweight webmail application.
I'll have enough trouble persuading them to go with libxslt for
HTML-message fixups instead of the existing C parser....
I've been looking at the exslt functions:
http://www.exslt.org/str/functions/split/str.split.function.xsl
http://www.exslt.org/str/functions/tokenize/str.tokenize.function.xsl
and they seem to do something like character-at-a-time recursion :-(
So I'll probably modify str:tokenize so it also returns the tokens
split-on, and in the webmail server I'll also implement this in C and
register the function so it's fast when called from the server.
Then I can just tokenize the text at white space, for-each over each
word (and each run of white space) and match the URLs that interest me.
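Assuming the plain EXSLT tokenizer for a moment, the matching side might look something like this untested sketch, which splits a text node at whitespace and wraps anything starting with http:// in a link. Note that stock str:tokenize drops the separators, which is exactly the complaint above, so this version has to re-insert a single space between tokens rather than preserving the original spacing; the modified tokenizer would fix that:

```xml
<!-- Sketch only: tokenize at whitespace characters and pick out
     http URLs. Relies on EXSLT str:tokenize as published at
     exslt.org; the second argument is a set of delimiter chars. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">
  <xsl:template match="text()">
    <xsl:for-each select="str:tokenize(., ' &#9;&#10;&#13;')">
      <xsl:choose>
        <xsl:when test="starts-with(., 'http://')">
          <a href="{.}"><xsl:value-of select="."/></a>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- crude: original inter-word spacing is lost -->
      <xsl:text> </xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```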
Sam