Re: [xml] xml:base missing on result from XInclude?



To further illustrate the issue, I have created a small test case: five XML files that include one another. Each XML file references an external file with the same base name (1.xml references 1.svg, 2.xml references 2.svg, and so on). Under the xml:base specification, each of these references should resolve to a file in the same directory as the XML file that contains it. The directory structure is:

 

1.xml [includes dir/2.xml]
dir/2.xml [includes ../3.xml, 4.xml and dir2/5.xml]
3.xml
dir/4.xml
dir/dir2/5.xml

 

Then, there is a stylesheet listrefs.xsl which outputs the resolved entities. To resolve references using xml:base, I copied the templates used by DocBook XSL stylesheets (1.78.0).

 

Here is the output:

 

[xsltproc 1.1.27 using libxml2 TOT without the fix]
1.svg
dir/2.svg
dir/3.svg
dir/4.svg
dir/dir2/5.svg

 

[xsltproc 1.1.27 using libxml2 TOT with the fix]
1.svg
dir/2.svg
3.svg
dir/4.svg
dir/dir2/5.svg

 

So, with the fix 3.svg is now correctly resolved to point to "3.svg" rather than "dir/3.svg".

 

Now, the weird thing: I also tried the same stylesheet with Saxon:

 

[saxon HE 9.4.0.6J]
Warning: at xsl:stylesheet on line 2 column 80 of listrefs.xsl:
Running an XSLT 1 stylesheet with an XSLT 2 processor
1.svg
dir/2.svg
dir/3.svg
dir/dir/4.svg
dir/dir/dir2/5.svg

 

It looks like Saxon outputs xml:base on each element relative to the top-level document, not relative to the document that contains the inclusion. I wonder whether Saxon's handling of xml:base fixup with XIncludes is completely broken, or whether the libxml2 and DocBook XSL developers both misunderstood the xml:base specification.
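As a rough check of that hypothesis, here is a minimal Python sketch. It is not taken from libxml2, Saxon, or the DocBook stylesheets; the names join, PARENT, xml_base, effective_base and resolved_refs are made up for illustration, and URIs are simplified to plain relative POSIX paths. It computes the xml:base value each included file would carry under two conventions (relative to the including document, or relative to the top-level document), then resolves every N.svg reference by applying each xml:base against its parent's base, as the xml:base specification prescribes.

```python
import posixpath


def join(base, ref):
    """Resolve ref against the directory of base (plain relative paths only)."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(base), ref))


# Include tree of the test case: included file -> including file.
# All paths are relative to the directory that holds 1.xml.
PARENT = {
    "dir/2.xml": "1.xml",
    "3.xml": "dir/2.xml",
    "dir/4.xml": "dir/2.xml",
    "dir/dir2/5.xml": "dir/2.xml",
}


def xml_base(path, convention):
    """xml:base value written on the root element of an included file."""
    if convention == "top-level":
        # Hypothesis from the message: value relative to the top-level document.
        return path
    # Otherwise: value relative to the document containing the xi:include.
    parent_dir = posixpath.dirname(PARENT[path]) or "."
    return posixpath.relpath(path, parent_dir)


def effective_base(path, convention):
    """Base URI a consumer derives by chaining xml:base values from the root."""
    if path not in PARENT:
        return path  # the top-level document itself
    parent_base = effective_base(PARENT[path], convention)
    return join(parent_base, xml_base(path, convention))


def resolved_refs(convention):
    """Resolve the N.svg reference found in each N.xml."""
    refs = []
    for path in ["1.xml", *PARENT]:
        svg = posixpath.basename(path).replace(".xml", ".svg")
        refs.append(join(effective_base(path, convention), svg))
    return refs


for convention in ("including", "top-level"):
    print(convention, resolved_refs(convention))
```

Under these assumptions the "including" convention gives the same list as the patched libxml2 run, and the "top-level" convention gives the same list as the Saxon run. That is consistent with the hypothesis but does not establish what Saxon actually does internally.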

 

Regards,

Alexey.

 

On Monday, April 08, 2013 12:49:45 PM Alexey Neyman wrote:

I think I know what is causing the issue. The code in xmlXIncludeLoadDoc looks at the url argument to see if it is a relative path; to do so, it looks for slashes in the path. The problem is that xmlXIncludeLoadNode() passes down URIs that are relative to the top-level document, not to the most recent inclusion. Therefore, in the example below the url in xmlXIncludeLoadDoc() is just '3.xml', not '../3.xml', and thus the code wrongly considers it to be based in the same directory as the current included file.

 

The attached patch solves this problem. It removes a premature check on the 'url' argument: even if it does not contain slashes, it may be a relative URI. Instead, it proceeds to build a URI relative to the current node's base, and only aborts xml:base insertion if that relative URI does not contain slashes.
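The difference between the two checks can be sketched in Python as follows. This paraphrases the description above and is not the libxml2 code: the function names are hypothetical, posixpath.relpath stands in for libxml2's relative-URI construction, and what the unpatched code does after its early check is assumed to match the patched path.

```python
import posixpath


def relative_to_parent(url, parent_base):
    """Express url (relative to the top-level document) relative to parent_base."""
    parent_dir = posixpath.dirname(parent_base) or "."
    return posixpath.relpath(url, parent_dir)


def xml_base_late_check(url, parent_base):
    """Patched behaviour as described: test the computed relative URI for slashes."""
    rel = relative_to_parent(url, parent_base)
    return rel if "/" in rel else None


def xml_base_early_check(url, parent_base):
    """Unpatched behaviour as described: give up when url itself has no slash."""
    if "/" not in url:
        return None
    # Assumption: past the early check, the code behaves like the patched path.
    return xml_base_late_check(url, parent_base)


# 3.xml is included from dir/2.xml; both URLs are relative to the top level.
print(xml_base_early_check("3.xml", "dir/2.xml"))
print(xml_base_late_check("3.xml", "dir/2.xml"))
```

With the early check, '3.xml' is rejected before its position relative to dir/2.xml is ever considered. With the late check, the relative form '../3.xml' is computed first and kept because it contains a slash, while a sibling such as dir/4.xml reduces to '4.xml' and still gets no xml:base.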

 

With this patch, the output from the test is:

 

<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
<elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="dir/2.xml">
<elem2 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="../3.xml">
<a fileref="x.svg"/>
</elem2>
</elem1>
</top>

 

 

Regards,

Alexey.

 

On Monday, April 08, 2013 12:36:08 AM Alexey Neyman wrote:

Hi all,

 

I am encountering the following strange behavior with regard to xml:base. Here is an example:

 

----[ 1.xml ]----
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="">
</top>
------------------

 

----[ dir/2.xml ]----
<?xml version="1.0"?>
<elem1 xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="">
</elem1>
---------------------

 

----[ 3.xml ]----
<?xml version="1.0"?>
<elem2>
<a fileref="x.svg"/>
</elem2>
------------------

 

Now, if I process 1.xml with 'xmllint --xinclude', I get:

 

$ xmllint --xinclude 1.xml
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
<elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="dir/2.xml">
<elem2>
<a fileref="x.svg"/>
</elem2>
</elem1>
</top>

 

The question is, why is xml:base missing on elem2? It is included from a different location than its ancestor, elem1. Is it a bug in libxml2, or am I missing something in the XInclude specification? As far as I can see, XInclude says:

 

"Each element information item in the top-level included items which has a different base URI than its include parent has an attribute information item added to its attributes property."

 

In this case the base URI for elem2 differs from that of elem1, so I think xml:base should be present.

 

This affects the DocBook stylesheets: when they insert references to external graphics (e.g. fo:external-graphic for XSL-FO output), they analyze xml:base on all of the element's ancestors. With an inclusion like this, they incorrectly resolve a file reference such as a/@fileref above to point to dir/x.svg. Any workarounds?
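To make the effect on a consumer concrete, here is a small Python sketch that walks a post-XInclude tree, chains xml:base values from ancestor to descendant, and resolves each fileref. It is not the DocBook stylesheet logic; the helper names join and filerefs are hypothetical, the top-level document is assumed to be 1.xml, and URIs are treated as plain relative paths.

```python
import posixpath
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def join(base, ref):
    """Resolve ref against the directory of base (plain relative paths only)."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(base), ref))


def filerefs(xml_text, doc_uri="1.xml"):
    """Return every fileref resolved against the xml:base chain of its ancestors."""
    found = []

    def walk(elem, base):
        own = elem.get(XML_BASE)
        if own:
            base = join(base, own)  # xml:base is relative to the parent's base
        if "fileref" in elem.attrib:
            found.append(join(base, elem.get("fileref")))
        for child in elem:
            walk(child, base)

    walk(ET.fromstring(xml_text), doc_uri)
    return found


# xmllint output as shown above: elem2 carries no xml:base.
without_base = """<top><elem1 xml:base="dir/2.xml">
<elem2><a fileref="x.svg"/></elem2></elem1></top>"""

# Same tree with xml:base present on elem2.
with_base = """<top><elem1 xml:base="dir/2.xml">
<elem2 xml:base="../3.xml"><a fileref="x.svg"/></elem2></elem1></top>"""

print(filerefs(without_base))
print(filerefs(with_base))
```

Without xml:base on elem2, the reference inherits the base of elem1 and lands in dir/; with xml:base="../3.xml" on elem2 it resolves next to 3.xml.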

 

I am using libxml2 2.9.0 now; I haven't tried the "bleeding edge" yet.

 

Regards,

Alexey.





Attachment: testcase.tgz
Description: application/compressed-tar


