[xslt] indentation problem and some related questions



Hello. Recently I've been trying to rewrite one of my projects to use
xml+xslt instead of pulling webpages from an SQL database. During this
process I encountered some undesirable behavior, and I'd like to ask
those of you with more XSLT experience for advice on how to deal with
it as cleanly as possible.

The setup: a small xml (xhtml) file containing some xhtml data, and a
small xsl file to display the contents of the xhtml's <body> element.
For the transformation itself, we'll be using
     <xsl:copy-of select="document('test.xml')/*[1]/*[2]/node()" />
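
For reference, a minimal sketch of what such a stylesheet could look
like. The file name test.xsl, the wrapper page and the xsl:output
settings are assumptions made for illustration; only the copy-of line
comes from the actual setup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test.xsl: file name and output settings are assumptions -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- indent="yes" asks the processor to pretty-print the result -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <html>
      <head><title>test</title></head>
      <body>
        <!-- root element, then its second child element (body),
             then every child node of that element -->
        <xsl:copy-of select="document('test.xml')/*[1]/*[2]/node()"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc this would be run as "xsltproc test.xsl test.xml", with
test.xml in the same directory as the stylesheet.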

Issue 1: As long as the xml file's data contains any newlines,
libxslt will not auto-indent it.
Having any text nodes in between elements, at any depth of the result
node-set, seems to trigger this.
<body>\n<p>test</p>\n</body> is not treated the same as <body><p>test</p></body>.
This makes it impossible to get a nicely indented page unless I cram
all data onto a single line (making it unreadable and hindering version
control).
Side-note: libxslt seems to lag by one xml tag before it resumes
auto-indentation, which uglifies the output a bit more.
Ref: http://mail.gnome.org/archives/xslt/2005-February/msg00029.html
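
One possible workaround, given here only as an untested sketch:
XSLT 1.0 has xsl:strip-space, which removes whitespace-only text nodes
from source documents before they are processed. Assuming libxslt also
applies it to documents loaded through document(), the copied nodes
should no longer contain the newline text nodes that seem to block
indentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from all elements of the source documents -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Same selection as before; only the whitespace handling differs -->
    <xsl:copy-of select="document('test.xml')/*[1]/*[2]/node()"/>
  </xsl:template>
</xsl:stylesheet>
```

The obvious cost is that whitespace-only text in mixed content is
dropped too, so <b>a</b> <i>b</i> would lose the space between the two
elements, and any <pre> content made up purely of whitespace would be
affected unless it is listed in xsl:preserve-space.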

Issue 2: Unable to query the document using element names.
Notice that in the xslt fragment above, I used
document()/*[1]/*[2]/node() to get the desired outcome.
For some reason beyond my understanding, if I use
"document()/html/body/*", or any prefix of that path, all I seem to
get is an empty node set.
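
A possible explanation, assuming test.xml declares the XHTML namespace
as its default namespace: in XPath 1.0 an unprefixed name such as
"html" only matches elements in no namespace, so the path would select
nothing. The sketch below binds a prefix to the XHTML namespace and
uses it in the path; the prefix "x" is an arbitrary choice made for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <!-- x:html and x:body match by namespace URI, whatever prefix
           (or default declaration) the source document uses -->
      <xsl:copy-of select="document('test.xml')/x:html/x:body/node()"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

exclude-result-prefixes only keeps the "x" declaration off the literal
result elements written in the stylesheet; it has no effect on nodes
copied from the source document.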

Issue 3: namespace declarations in the xml document leak into the output.
My test case here uses a pure xhtml file for simplicity, but the
script I'm working on has a custom structure. It's fairly trivial:
entry -> title, date, content. All of these carry a custom namespace
prefix, with its namespace declared on the 'entry' element. There is
also a default namespace declaration, for the xhtml data inside
'content'.
Now the issue is, once I retrieve the contents of the 'content'
element, the custom namespace declaration will propagate to the
output.
It is fairly harmless and doesn't affect validity, but it looks really
messy. I don't want to leak any nonstandard namespaces that are only
used during processing.
Is there a cleaner solution to this than writing a copying template
that strips namespaces? I don't want to add unnecessary processing
since that only slows things down.
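
For clarity, this is the kind of namespace-stripping copy I mean, as a
minimal sketch. The file name entry.xml, the prefix "e" and the
namespace URI http://example.org/entry are placeholders, not the real
ones from my script.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed input (entry.xml), names and URI are placeholders:
     <e:entry xmlns:e="http://example.org/entry"
              xmlns="http://www.w3.org/1999/xhtml">
       <e:title>...</e:title>
       <e:date>...</e:date>
       <e:content><p>xhtml data</p></e:content>
     </e:entry>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:e="http://example.org/entry"
    exclude-result-prefixes="e">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates mode="strip"
        select="document('entry.xml')/e:entry/e:content/node()"/>
  </xsl:template>

  <!-- Rebuild each element by name and namespace URI instead of copying
       it, so in-scope namespace nodes that are not needed for the
       element's own name are not carried over -->
  <xsl:template match="*" mode="strip">
    <xsl:element name="{local-name()}" namespace="{namespace-uri()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="strip"/>
    </xsl:element>
  </xsl:template>

  <!-- Text, comments and processing instructions are copied unchanged -->
  <xsl:template match="text()|comment()|processing-instruction()" mode="strip">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```

This walks every node of the content instead of doing a single
copy-of, which is exactly the extra processing I would like to avoid.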

Thank you for reading this.

