Re: [xml] xml:base missing on result from XInclude?



Hi Daniel, Alexey,

Alexey Neyman <stilor att net> writes:

> I think I know what is causing the issue. The code in
> xmlXIncludeLoadDoc looks at the url argument to see if it is a relative
> path - to do so, it looks for slashes in the path. The problem is that
> xmlXIncludeLoadNode() passes down URIs that are relative to the
> top-level document, not to the most recent inclusion. Therefore, in the
> example below the url in xmlXIncludeLoadDoc() is just '3.xml', not
> '../3.xml' - and thus, the code wrongly considers it to be based in
> the same directory as the current included file.

Thanks for fixing this.  Maybe this whole "check for a slash to tell if
xml:base fixup is needed" logic is flawed, though?
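
To make the concern concrete, here is a rough Python sketch of the two
rules.  It is not libxml2 code: relative_uri, slash_rule and always_rule
are made-up names, and posixpath.relpath only stands in for what
xmlBuildRelativeURI does with plain relative file paths.

```python
import posixpath


def relative_uri(url, base):
    """Path of `url` as seen from the directory of `base`.

    Both arguments are plain relative file paths (an assumption of this
    sketch; real URIs need proper URI handling).
    """
    return posixpath.relpath(url, posixpath.dirname(base) or ".")


def slash_rule(rel):
    """Heuristic under discussion: no slash means no xml:base is written."""
    return rel if "/" in rel else None


def always_rule(rel):
    """Alternative: always record the relative URI as xml:base."""
    return rel


# (included document, including document) pairs; the last two are
# hypothetical variants with a subdirectory.
for url, base in [("2.xml", "1.xml"), ("sub/2.xml", "1.xml"), ("3.xml", "sub/2.xml")]:
    rel = relative_uri(url, base)
    print(url, "from", base, "->", slash_rule(rel), "vs", always_rule(rel))
```

The first pair is the same-directory case: the slash rule drops the
information that the content came from 2.xml, while the other rule keeps
it.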


I'm using libxml2 2.9.1 and lxml 3.2.1.

Given these example files (similar to your examples, Alexey), I get no
xml:base fixup at all:

### sample files ##################################################
# generate three example files
mkdir test
cd test
cat >1.xml <<EOF
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="2.xml"/>
</top>
EOF
cat >2.xml <<EOF
<?xml version="1.0"?>
<elem1 xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="3.xml"/>
</elem1>
EOF
cat >3.xml <<EOF
<?xml version="1.0"?>
<elem2>
  <a fileref="x.svg"/>
</elem2>
EOF
### wrong output ##################################################
# expect xml:base fixup.  Get none :(
xmllint --xinclude 1.xml 
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem1 xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem2>
  <a fileref="x.svg"/>
</elem2>
</elem1>
</top>
###################################################################


The xml:base is not just the directory, it also contains the file name,
right?  The whole XInclude test suite behaves like that, see below.

So it _should_ look like this, shouldn't it?  This is what I get with
the attached patch to libxml:

### correct output ################################################
xmllint --xinclude 1.xml 
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="2.xml">
  <elem2 xml:base="3.xml">
  <a fileref="x.svg"/>
</elem2>
</elem1>
</top>
###################################################################
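
A file name in xml:base does no harm when references are resolved,
because relative resolution drops the last path segment of the base
anyway.  A small sketch with the Python standard library, assuming a
hypothetical variant of the example in which 2.xml lives in sub/ and
3.xml in img/ (resolve_filerefs is a made-up helper, not part of lxml or
libxml2):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical expanded document; the directories are invented for
# illustration only.
EXPANDED = """<top>
  <elem1 xml:base="sub/2.xml">
    <elem2 xml:base="../img/3.xml">
      <a fileref="x.svg"/>
    </elem2>
  </elem1>
</top>"""


def resolve_filerefs(elem, base):
    """Yield each fileref resolved against the xml:base in scope."""
    # An xml:base attribute is itself resolved against the inherited base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if "fileref" in elem.attrib:
        yield urljoin(base, elem.get("fileref"))
    for child in elem:
        yield from resolve_filerefs(child, base)


print(list(resolve_filerefs(ET.fromstring(EXPANDED), "1.xml")))
```

With the fixup in place, x.svg resolves next to 3.xml; without any
xml:base it would resolve next to 1.xml.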



The XInclude test suite agrees when run with the attached script, like
this:

###################################################################
cvs -d:pserver:anonymous@dev.w3.org:/sources/public \
   co  2001/XInclude-Test-Suite  XInclude-Test-Suite

cd XInclude-Test-Suite

python3 PATH-TO/run-tests-with-lxml.py
###################################################################

This gives about 15 fewer failures when run with the patch below, and
as far as I can tell from comparing the results with and without the
patch, there are no additional ones.

So it should be an improvement :)



S.

Do xml:base fixup for file name changes in the same directory, too.
The "if it contains no slash, it needs no fixup" logic breaks the
XInclude test suite.

Index: libxml2-2.9.1/xinclude.c
===================================================================
--- libxml2-2.9.1.orig/xinclude.c
+++ libxml2-2.9.1/xinclude.c
@@ -1685,7 +1685,7 @@ loaded:
 #endif
 
     /*
-     * Do the xml:base fixup if needed
+     * Do the xml:base fixup as needed
      */
     if ((doc != NULL) && (URL != NULL) && (xmlStrchr(URL, (xmlChar) '/')) &&
         (!(ctxt->parseFlags & XML_PARSE_NOBASEFIX)) &&
@@ -1695,28 +1695,26 @@ loaded:
        xmlChar *curBase;
 
        /*
-        * The base is only adjusted if "necessary", i.e. if the xinclude node
-        * has a base specified, or the URL is relative
+        * The xml:base is adjusted as necessary.  Possibly the
+        * xinclude node has a base specified?
         */
        base = xmlGetNsProp(ctxt->incTab[nr]->ref, BAD_CAST "base",
                        XML_XML_NAMESPACE);
        if (base == NULL) {
            /*
-            * No xml:base on the xinclude node, so we check whether the
-            * URI base is different than (relative to) the context base
+            * No xml:base on the xinclude node.  Compute the base
+            * from the URL of the included document, if possible
+            * relative to the context base.  See
+            * uri.c:xmlBuildRelativeURI for the relative/absolute
+            * magic.
             */
            curBase = xmlBuildRelativeURI(URL, ctxt->base);
            if (curBase == NULL) {      /* Error return */
                xmlXIncludeErr(ctxt, ctxt->incTab[nr]->ref,
                       XML_XINCLUDE_HREF_URI,
                       "trying to build relative URI from %s\n", URL);
-           } else {
-               /* If the URI doesn't contain a slash, it's not relative */
-               if (!xmlStrchr(curBase, (xmlChar) '/'))
-                   xmlFree(curBase);
-               else
-                   base = curBase;
            }
+           base = curBase;
        }
        if (base != NULL) {     /* Adjustment may be needed */
            node = ctxt->incTab[nr]->inc;
#!/usr/bin/env python3

# (C) 2014 Susanne Oberhauser-Hirschoff <froh suse com>
# The MIT license applies http://opensource.org/licenses/MIT

"""
# Run the XInclude test suite through lxml:

# get the test suite
cvs -d:pserver:anonymous@dev.w3.org:/sources/public \
   co 2001/XInclude-Test-Suite XInclude-Test-Suite

cd XInclude-Test-Suite

# run this script
python3 PATH-TO/run-tests-with-lxml.py

"""

from lxml import etree, objectify

tests = objectify.parse('testdescr.xml').getroot()

feature2xmllint_option = {
    'xpointer-scheme': '',
    'unexpanded-entities': None,
    'unparsed-entities': None,
    'lang-fixup': None,
}

class TC: pass

tcs = list()

for suite in tests.testcases:
    basedir = suite.get('basedir')
    creator = suite.get('creator')
    for case in suite.testcase:
        tc = TC()
        tc.basedir = basedir
        tc.creator = creator

        tc.id = case.get('id')

        tc.file = case.get('href')

        # success, error or optional
        tc.type = case.get('type')

        if tc.type == 'error':
            tc.result_file = None
        else:
            tc.result_file = case.output


        required_features = case.get('features')
        if required_features is None:
            tc.required_features = list()
        else:
            tc.required_features = required_features.split()

        tcs.append(tc)

for tc in tcs:
    # tc.required_features is always a list here.  Features missing from
    # the table are treated like unsupported ones (None) instead of
    # raising a KeyError.
    tc.xmllint_options = tuple(feature2xmllint_option.get(f)
                               for f in tc.required_features)

    if None in tc.xmllint_options:
        tc.unhandled_features = tuple(
            f for f in tc.required_features
            if feature2xmllint_option.get(f) is None)
    else:
        tc.unhandled_features = None


def xinclude_expand(tc):
    filename = "{tc.basedir}/{tc.file}".format(tc=tc)


    got = etree.parse(filename)
    got.xinclude()

    result = ['<?xml version="1.0"?>']
    result.extend( etree.tostring(got, encoding=str).splitlines())

    return filename, result


import difflib

for tc in tcs:
    if tc.unhandled_features is not None:
        print("untested: {tc.creator}-{tc.id}: can't handle options {tc.unhandled_features}\n".format(tc=tc))
        continue

    try:
        tofile, got = xinclude_expand(tc)

        if tc.type == 'error':
            # An 'error' case has no result file.  Reaching this point
            # means the expected failure did not happen, so report it
            # instead of tripping over the missing file below.
            print("fail: {tc.creator}-{tc.id}: expected an error, got none".format(tc=tc))
            continue

        fromfile   = "{tc.basedir}/{tc.result_file}".format(tc=tc)
        with open(fromfile) as f:
            expected = f.read().splitlines()

        diff = difflib.unified_diff(expected, got,
                                    fromfile=fromfile,
                                    tofile="lxml.etree.parse( {} ).xinclude().tostring()".format(tofile),
                                    lineterm='')

        diff = list(diff)

        if len(diff) == 0:
            print("pass: {tc.creator}-{tc.id}".format(tc=tc))
        else:
            print("###{:#<64}".format(" diff: {tc.creator}-{tc.id} ".format(tc=tc)))
            for line in diff: print(line)
            print('###################################################################')

    except Exception as e:
        if tc.type == 'error':
            print("pass: {tc.creator}-{tc.id}: expected error {e}".format(tc=tc,e=e))
        else:
            print("fail: {tc.creator}-{tc.id}: unexpected error {e}".format(tc=tc,e=e))


-- 
Susanne Oberhauser                     SUSE LINUX Products GmbH
+49-911-74053-574                      Maxfeldstraße 5
Processes and Infrastructure           90409 Nürnberg
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

