Re: [xslt] indentation problem and some related questions



Hi,

These are mostly basic XSL questions (except, perhaps, the last one) and would probably be better asked on Mulberry Technologies' XSL-List.

But here are some possible answers, inline below.


On Nov 15, 2008, at 8:14 AM, Viktor Štujber wrote:

Hello. Recently I've been trying to rewrite one of my projects to use
xml+xslt instead of pulling webpages from an SQL database. During this
process I encountered some undesirable behavior, and I'd like to ask
you people who have more experience with XSLT to give any advice on
how to deal with this with as much finesse as possible.

The setup: a small xml (xhtml) file containing some xhtml data, and a
small xsl file to display the contents of the xhtml's <body> element.
For the transformation itself, we'll be using
    <xsl:copy-of select="document('test.xml')/*[1]/*[2]/node()" />


Issue 1: As long as there are any newlines in the xml file's data,
libxslt will not auto-indent it.
Having any text nodes in between elements at any depth of the result
node-set seems to trigger this.
<body>\n<p>test</p>\n</body> is not the same as <body><p>test</p></body>.
This makes it impossible to get a nicely indented page unless I cram
all data onto a single line (making it unreadable and hindering version
control).
Side-note: libxslt has a processing lag of 1 xml tag before resuming
auto-indentation. This uglifies the output a bit more.
Ref: http://mail.gnome.org/archives/xslt/2005-February/msg00029.html

You could add the top-level instruction:

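<xsl:strip-space elements="*"/>

But this can cost some performance, since the processor has to examine every whitespace-only text node in the source and drop it.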
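
To show where that instruction goes, here is a minimal sketch of a complete stylesheet. xsl:strip-space takes its name tests in the 'elements' attribute, and it only helps indentation if the output is also asked to indent. The wrapping <div> and the match="/" template are my assumptions, not something from your setup; 'test.xml' is the file name from your question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from source documents. As far as I
       know this also covers documents loaded through document(). -->
  <xsl:strip-space elements="*"/>

  <!-- Ask the serializer to indent the result tree. -->
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper element, only so the result has one root. -->
    <div>
      <xsl:copy-of select="document('test.xml')/*[1]/*[2]/node()"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: elements="*" also removes a whitespace-only text node that sits between two inline elements, so a space separating, say, a <b> and an <i> would be lost. If that matters, xsl:preserve-space can exempt specific elements.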



Issue 2: unable to query document using element names.
Notice that in the xslt fragment above, I used document()/*[1]/*[2]/*
to get the desired outcome.
For some reason beyond my understanding, if I use
"document()/html/body/*", or any prefix of that, all I seem to get is
an empty node set.

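First, if you are actually calling document('') with an empty string argument, you are getting the XSL document, not the source XML.

But I would bet the problem is that your XHTML document is in the XHTML namespace. In XPath 1.0 an unprefixed name such as 'html' only matches elements in no namespace, so you need to declare the namespace with a prefix (in your root xsl:stylesheet element, most likely), like:

xmlns:x="http://www.w3.org/1999/xhtml"

then you can do: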

<xsl:apply-templates
  select="document('foo.xml')/x:html/x:body/*"/>
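
Putting the pieces together, a minimal sketch of a whole stylesheet might look like this. The prefix 'x' is arbitrary, 'foo.xml' is a placeholder file name, and the x:* template that copies each matched element is my assumption about what you want done with the body's children.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

  <xsl:template match="/">
    <!-- x:html and x:body match the XHTML elements because the prefix x
         is bound to the XHTML namespace URI above. -->
    <xsl:apply-templates
        select="document('foo.xml')/x:html/x:body/*"/>
  </xsl:template>

  <!-- Without a template like this, the built-in rules would output only
       the text content of the selected elements. -->
  <xsl:template match="x:*">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The prefix used in the stylesheet does not have to match anything in the XML file; only the namespace URI has to be the same.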



Issue 3: namespace declarations in the xml document leak into the output.
My test case here uses a pure xhtml file for simplicity, but the
script I'm working on has a custom structure. It's fairly trivial:
entry -> title, date, content. All of these carry a custom namespace
prefix, with its namespace declared on the 'entry' element. There is
also a default namespace declaration, for the xhtml data inside
'content'.
Now the issue is, once I retrieve the contents of the 'content'
element, the custom namespace declaration will propagate to the
output.
It is fairly harmless and doesn't affect validity, but it looks really
messy. I don't want to leak any nonstandard namespaces that are only
used during processing.
Is there a cleaner solution to this than writing a copying template
that strips namespaces?

No, xsl:copy-of copies everything, namespace nodes included. I am guessing, but using modified identity templates like so:

<xsl:template match="@*|text()">
  <xsl:copy-of select="."/>
</xsl:template>

<xsl:template match="*">
  <xsl:element name="{local-name()}" namespace="http://www.w3.org/1999/xhtml">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:element>
</xsl:template>

might even be faster because the processor is being told explicitly what to do, rather than having to backtrack to figure out the namespace for the element. But, like I said, I am guessing and Daniel would know more about this.

You might also want to convert your custom elements (like 'content'):

<xsl:template match="myns:*">
  <div class="{local-name()}" xmlns="http://www.w3.org/1999/xhtml">
    <xsl:apply-templates/>
  </div>
</xsl:template>
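
If not everything inside 'content' is XHTML, a variant of the same idea rebuilds each element under its own name and namespace rather than forcing the XHTML namespace. This is only a sketch: the namespace URI http://example.org/myns, the file name 'entry.xml' and the match="/" entry point are all made up for illustration, so substitute your own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:myns="http://example.org/myns"
    exclude-result-prefixes="myns">

  <!-- Hypothetical entry point: process what is inside myns:content. -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="document('entry.xml')/myns:entry/myns:content/node()"/>
  </xsl:template>

  <!-- Recreate each element with its own name and namespace. Unlike
       xsl:copy and xsl:copy-of, xsl:element does not copy the source
       element's namespace nodes, so declarations the element does not
       use are left behind. -->
  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- These node types carry no namespace nodes of their own, so a
       plain copy is enough. -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The processor still emits whatever declarations the copied elements and attributes actually need, so the XHTML default namespace survives while the custom one should drop out of the output.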


best,
-Rob

I don't want to add unnecessary processing
since that only slows things down.



