To further illustrate the issue, I have created a small test case: five XML files chained together with XInclude. Each XML file references an external file with the same base name (1.xml references 1.svg, 2.xml references 2.svg, and so on). Per the xml:base specification, these references should resolve to files in the same directory as the XML file that contains them. The directory structure is:

  1.xml            [includes dir/2.xml]
  dir/2.xml        [includes ../3.xml, 4.xml and dir2/5.xml]
  3.xml
  dir/4.xml
  dir/dir2/5.xml
Then, there is a stylesheet listrefs.xsl which outputs the resolved entities. To resolve references using xml:base, I copied the templates used by DocBook XSL stylesheets (1.78.0).
Here is the output:
[xsltproc 1.1.27 using libxml2 TOT without the fix]
1.svg
dir/2.svg
dir/3.svg
dir/4.svg
dir/dir2/5.svg

[xsltproc 1.1.27 using libxml2 TOT with the fix]
1.svg
dir/2.svg
3.svg
dir/4.svg
dir/dir2/5.svg
So, with the fix 3.svg is now correctly resolved to point to "3.svg" rather than "dir/3.svg".
Now, the weird thing: I also tried the same stylesheet with Saxon:
[saxon HE 9.4.0.6J]
Warning: at xsl:stylesheet on line 2 column 80 of listrefs.xsl:
  Running an XSLT 1 stylesheet with an XSLT 2 processor
1.svg
dir/2.svg
dir/3.svg
dir/dir/4.svg
dir/dir/dir2/5.svg
It looks like Saxon outputs xml:base on each element relative to the top-level document, not to the containing included document. I wonder if Saxon's handling of xml:base fixup with XIncludes is completely broken, or libxml2 and DocBook XSL developers both misunderstood the xml:base specification.
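One way to check the guess above is a small Python sketch. It assumes that each xml:base value is written relative to the top-level 1.xml, while the stylesheet templates compose xml:base values along the ancestor chain. The helper name `resolve`, the `chains` table, and the use of `posixpath` as a simplified stand-in for real URI resolution are all assumptions made for illustration; nothing here was verified against Saxon itself.

```python
import posixpath

def resolve(base_chain, ref, top="1.xml"):
    """Resolve ref against a chain of xml:base values, outermost first.

    Each link is resolved relative to the directory of the previous result,
    which mimics composing xml:base values along the ancestor axis.
    """
    current = top
    for link in list(base_chain) + [ref]:
        current = posixpath.normpath(
            posixpath.join(posixpath.dirname(current), link))
    return current

# Hypothetical xml:base values, each relative to the top-level 1.xml
# (an assumption about the Saxon output, not an observed result).
chains = {
    "1.svg": [],
    "2.svg": ["dir/2.xml"],
    "3.svg": ["dir/2.xml", "3.xml"],
    "4.svg": ["dir/2.xml", "dir/4.xml"],
    "5.svg": ["dir/2.xml", "dir/dir2/5.xml"],
}

resolved = [resolve(chain, ref) for ref, chain in chains.items()]
print("\n".join(resolved))
```

Under that assumption the sketch yields the same five paths as the Saxon run, which is consistent with the guess but does not prove it.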
Regards, Alexey.
On Monday, April 08, 2013 12:49:45 PM Alexey Neyman wrote:

I think I know what is causing the issue. The code in xmlXIncludeLoadDoc() looks at the url argument to see whether it is a relative path; to do so, it looks for slashes in the path. The problem is that xmlXIncludeLoadNode() passes down URIs that are relative to the top-level document, not to the most recent inclusion. Therefore, in the example below the url in xmlXIncludeLoadDoc() is just '3.xml', not '../3.xml', and thus the code wrongly considers it to be based in the same directory as the current included file.

The attached patch solves this problem. It removes a premature check on the 'url' argument: even if it does not contain slashes, it may still be a relative URI. Instead, the code proceeds to build a URI relative to the current node's base, and only skips the xml:base insertion if that relative URI does not contain slashes.
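The difference between the two checks can be modelled in a few lines of Python. This is only a sketch of the decision described above, not the libxml2 C code: the function names `old_check` and `new_base` are hypothetical, paths are taken relative to the top-level document, and `posixpath.relpath` stands in for building a relative URI.

```python
import posixpath

def old_check(url):
    """Premature check: a url without a slash is treated as needing no xml:base."""
    return "/" in url

def new_base(url, current_base):
    """Build url relative to the directory of current_base.

    Returns the relative URI to use as xml:base, or None when that relative
    URI has no slash (same directory, so no xml:base is needed).
    """
    start = posixpath.dirname(current_base) or "."
    rel = posixpath.relpath(url, start)
    return rel if "/" in rel else None

# 3.xml is included from dir/2.xml; relative to the top-level it is '3.xml'.
print(old_check("3.xml"))                # False: xml:base is skipped
print(new_base("3.xml", "dir/2.xml"))    # ../3.xml: xml:base is inserted
print(new_base("dir/4.xml", "dir/2.xml"))  # None: same directory as 2.xml
```

In this model the slash test is applied after the URI has been made relative to the including document, which is why '3.xml' is no longer mistaken for a sibling of dir/2.xml.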
With this patch, the output from the test is:
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="dir/2.xml">
    <elem2 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="../3.xml">
      <a fileref="x.svg"/>
    </elem2>
  </elem1>
</top>
Regards, Alexey.
On Monday, April 08, 2013 12:36:08 AM Alexey Neyman wrote:

Hi all,
I am encountering the following strange behavior with regard to xml:base. Here is an example:
----[ 1.xml ]----
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="">
</top>
------------------

----[ dir/2.xml ]----
<?xml version="1.0"?>
<elem1 xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="">
</elem1>
---------------------

----[ 3.xml ]----
<?xml version="1.0"?>
<elem2>
  <a fileref="x.svg"/>
</elem2>
------------------
Now, if I process 1.xml with 'xmllint --xinclude', I get:
$ xmllint --xinclude 1.xml
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="dir/2.xml">
    <elem2>
      <a fileref="x.svg"/>
    </elem2>
  </elem1>
</top>
The question is, why is xml:base missing on elem2? It is included from a different location than its ancestor, elem1. Is it a bug in libxml2, or am I missing something in the XInclude specification? As far as I can see, XInclude says:
"Each element information item in the top-level included items which has a different base URI than its include parent has an attribute information item added to its attributes property."
In this case the base URI for elem2 is different from that for elem1, so I think xml:base should be present.

This affects the DocBook stylesheets: when they insert references to external graphics (e.g. fo:external-graphic for XSL-FO output), they analyze xml:base on all of the element's ancestors. With an inclusion like this, they incorrectly resolve a file reference such as a/@fileref above to dir/x.svg. Any workarounds?
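To make the effect concrete, here is a minimal Python sketch of resolving a/@fileref against the xml:base values found on its ancestors. It is not the DocBook XSL code; the function name `resolve_fileref` is hypothetical, and `posixpath` is used as a simplified stand-in for URI resolution.

```python
import posixpath

def resolve_fileref(xml_bases, fileref):
    """Resolve fileref against ancestor xml:base values, outermost first."""
    base = ""
    for xml_base in xml_bases:
        # Each xml:base is relative to the directory of the enclosing base.
        base = posixpath.join(posixpath.dirname(base), xml_base)
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(base), fileref))

# Only elem1 carries xml:base, as in the xmllint output above.
without_elem2_base = resolve_fileref(["dir/2.xml"], "x.svg")

# What one would expect if elem2 also carried xml:base="../3.xml".
with_elem2_base = resolve_fileref(["dir/2.xml", "../3.xml"], "x.svg")

print(without_elem2_base)
print(with_elem2_base)
```

The first call gives dir/x.svg, the wrong location described above; the second gives x.svg, next to 3.xml.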
I am using 2.9.0 now, haven't tried with the "bleeding edge" yet.
Regards, Alexey.
Attachment:
testcase.tgz
Description: application/compressed-tar