Re: [xml] xml:base missing on result from XInclude?



Susanne Oberhauser-Hirschoff <froh suse com> writes:

Salut Daniel,

What are the use cases that do thousands of xi:includes of tiny XML
fragments, making the current tuning necessary?

If that's a real use case, I could redo the patch with an option.

OK, below is a patch that adds an option --fixup-all-base-uris, aka
XML_PARSE_ALLBASEFIX, which lets the libxml2 / xmllint user choose
whether she considers xml:base fixup within the same path 'clutter' or
a 'bug fix'.


I still believe there is a reason the XInclude test cases use the clutter
version, but if libxml2 is usually used in contexts where the clutter is
useless, this variant would give the other, more exotic users a chance to
get what they need, too.
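
To make the three behaviours concrete, here is a minimal sketch of the decision the patch implements for an include whose xi:include element carries no xml:base of its own. The `SKETCH_*` constants and `sketch_adds_xml_base` are hypothetical stand-ins written for this illustration; only the bit values (1<<18 and 1<<23) are taken from the patch and the API docs it touches.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the parser options; values mirror the patch. */
enum {
    SKETCH_NOBASEFIX  = 1 << 18,  /* never add xml:base */
    SKETCH_ALLBASEFIX = 1 << 23   /* add xml:base for every include */
};

/*
 * Returns 1 if an xml:base attribute would be written on the included
 * root, 0 otherwise.  rel_uri is the included document's URL expressed
 * relative to the including document's base.
 */
static int sketch_adds_xml_base(const char *rel_uri, int flags)
{
    assert(rel_uri != NULL);
    if (flags & SKETCH_NOBASEFIX)
        return 0;                         /* NOBASEFIX overrides ALLBASEFIX */
    if (flags & SKETCH_ALLBASEFIX)
        return 1;                         /* same-directory includes, too */
    return strchr(rel_uri, '/') != NULL;  /* default: only across directories */
}
```

With no flags set, "child.xml" gets no xml:base while "sub/child.xml" does; with the new flag both do.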


Btw, in 2003 you wrote this...

https://www.sourceware.org/ml/docbook/2003-03/msg00101.html

On Sun, Mar 09, 2003 at 02:15:55PM -0500, Elliotte Rusty Harold wrote:
At 2:02 PM -0500 2/12/03, Daniel Veillard wrote:

It's rather that libxml2 now complies with the XInclude requirement of
adding such an xml:base at the inclusion point (when the included resource
is in a different path ...)


I'm looking at this for my XIncluder right now, and the requirement 
seems a little stronger to me. They don't even have to be in a 
different path. Suppose for example, 
http://www.example.com/docs/parent.xml includes 
http://www.example.com/docs/child.xml

These two documents have different base URIs even though they have 
the same "path". Thus an xml:base attribute must be added at the 
inclusion point whenever parse="xml". The only possible exception 
would be when both the includer and the included document have null or 
empty base URIs, or perhaps when XPointers are involved and one 
xinclude element is including a different part of the same document.

  I tried to minimize the addition of xml:base when it could be avoided
in practice (i.e. if the absence of the xml:base would not lead to
erroneous URI-reference-to-URI computations). This was a deployment
trade-off that I will fix when XInclude and xml:base get better
acceptance.

Daniel
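
The exchange above can be illustrated with a small sketch. It assumes a deliberately simplified resolver (`sketch_resolve`, a hypothetical helper, not a libxml2 function) that only handles plain relative names; the reference "figure.png" is likewise made up for the example. The full rules are in RFC 3986.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified resolution: replace everything after the last '/' of the
 * base URI with the relative reference.  Handles only plain relative
 * names; no "..", queries, or fragments.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int sketch_resolve(const char *base, const char *ref,
                          char *out, size_t outlen)
{
    const char *slash;
    size_t dirlen;
    int n;

    assert(base != NULL && ref != NULL && out != NULL);
    slash = strrchr(base, '/');
    dirlen = (slash != NULL) ? (size_t)(slash - base) + 1 : 0;
    n = snprintf(out, outlen, "%.*s%s", (int)dirlen, base, ref);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Resolving "figure.png" against .../docs/parent.xml and against .../docs/child.xml gives the same URI, which is the case the minimal fixup relies on. The two base URIs themselves still differ, so anything that depends on the full base URI (a bare fragment identifier, for instance, which this sketch does not model) is only correct if xml:base is written.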


Please let me know what you think,

thx,


S.

commit 633f764813e8b552bf77e8b34c77d5642b028063
Author: Susanne Oberhauser <froh suse com>
Date:   Wed Apr 23 13:53:14 2014 +0000

    do xml:base fixup on all xi:include roots, even if the new base is in the same directory

diff --git a/doc/APIfiles.html b/doc/APIfiles.html
index 65e004b..4dd5d04 100644
--- a/doc/APIfiles.html
+++ b/doc/APIfiles.html
@@ -612,6 +612,7 @@ A:link, A:visited, A:active { text-decoration: underline }
 <a href="html/libxml-parser.html#XML_PARSER_START">XML_PARSER_START</a><br />
 <a href="html/libxml-parser.html#XML_PARSER_START_TAG">XML_PARSER_START_TAG</a><br />
 <a href="html/libxml-parser.html#XML_PARSER_SYSTEM_LITERAL">XML_PARSER_SYSTEM_LITERAL</a><br />
+<a href="html/libxml-parser.html#XML_PARSE_ALLBASEFIX">XML_PARSE_ALLBASEFIX</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_BIG_LINES">XML_PARSE_BIG_LINES</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_COMPACT">XML_PARSE_COMPACT</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_DOM">XML_PARSE_DOM</a><br />
diff --git a/doc/APIsymbols.html b/doc/APIsymbols.html
index c2b82e7..f2e6d18 100644
--- a/doc/APIsymbols.html
+++ b/doc/APIsymbols.html
@@ -594,6 +594,7 @@ A:link, A:visited, A:active { text-decoration: underline }
 <a href="html/libxml-xmlreader.html#XML_PARSER_SUBST_ENTITIES">XML_PARSER_SUBST_ENTITIES</a><br />
 <a href="html/libxml-parser.html#XML_PARSER_SYSTEM_LITERAL">XML_PARSER_SYSTEM_LITERAL</a><br />
 <a href="html/libxml-xmlreader.html#XML_PARSER_VALIDATE">XML_PARSER_VALIDATE</a><br />
+<a href="html/libxml-parser.html#XML_PARSE_ALLBASEFIX">XML_PARSE_ALLBASEFIX</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_BIG_LINES">XML_PARSE_BIG_LINES</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_COMPACT">XML_PARSE_COMPACT</a><br />
 <a href="html/libxml-parser.html#XML_PARSE_DOM">XML_PARSE_DOM</a><br />
diff --git a/doc/devhelp/libxml2-parser.html b/doc/devhelp/libxml2-parser.html
index 357c14a..6f114ab 100644
--- a/doc/devhelp/libxml2-parser.html
+++ b/doc/devhelp/libxml2-parser.html
@@ -311,6 +311,8 @@ void        <a href="#xmlSetExternalEntityLoader">xmlSetExternalEntityLoader</a>    (<a hr
     <a name="XML_PARSE_OLDSAX">XML_PARSE_OLDSAX</a> = 1048576 /* parse using SAX2 interface before 2.7.0 */
     <a name="XML_PARSE_IGNORE_ENC">XML_PARSE_IGNORE_ENC</a> = 2097152 /* ignore internal document encoding hint */
     <a name="XML_PARSE_BIG_LINES">XML_PARSE_BIG_LINES</a> = 4194304 /*  Store big lines numbers in text PSVI field */
+    <a name="XML_PARSE_ALLBASEFIX">XML_PARSE_ALLBASEFIX</a> = 8388608 /* do xml:base fixup for _all_ XINCLUDEs */
+
 };
 </pre><p/>
 </div>
diff --git a/doc/devhelp/libxml2.devhelp b/doc/devhelp/libxml2.devhelp
index 282546a..cb85fb5 100644
--- a/doc/devhelp/libxml2.devhelp
+++ b/doc/devhelp/libxml2.devhelp
@@ -776,6 +776,7 @@
     <function name="XML_PARSER_SYSTEM_LITERAL" link="libxml2-parser.html#XML_PARSER_SYSTEM_LITERAL"/>
     <function name="XML_PARSER_VALIDATE" link="libxml2-xmlreader.html#XML_PARSER_VALIDATE"/>
     <function name="XML_PARSE_BIG_LINES" link="libxml2-parser.html#XML_PARSE_BIG_LINES"/>
+    <function name="XML_PARSE_ALLBASEFIX" link="libxml2-parser.html#XML_PARSE_ALLBASEFIX"/>
     <function name="XML_PARSE_COMPACT" link="libxml2-parser.html#XML_PARSE_COMPACT"/>
     <function name="XML_PARSE_DOM" link="libxml2-parser.html#XML_PARSE_DOM"/>
     <function name="XML_PARSE_DTDATTR" link="libxml2-parser.html#XML_PARSE_DTDATTR"/>
diff --git a/doc/html/libxml-parser.html b/doc/html/libxml-parser.html
index 98123f7..8f7ede9 100644
--- a/doc/html/libxml-parser.html
+++ b/doc/html/libxml-parser.html
@@ -290,6 +290,7 @@ void        <a href="#xmlParserInputDeallocate">xmlParserInputDeallocate</a>        (<a href="
     <a name="XML_PARSE_OLDSAX" id="XML_PARSE_OLDSAX">XML_PARSE_OLDSAX</a> = 1048576 : parse using SAX2 interface before 2.7.0
     <a name="XML_PARSE_IGNORE_ENC" id="XML_PARSE_IGNORE_ENC">XML_PARSE_IGNORE_ENC</a> = 2097152 : ignore internal document encoding hint
     <a name="XML_PARSE_BIG_LINES" id="XML_PARSE_BIG_LINES">XML_PARSE_BIG_LINES</a> = 4194304 : Store big lines numbers in text PSVI field
+    <a name="XML_PARSE_ALLBASEFIX" id="XML_PARSE_ALLBASEFIX">XML_PARSE_ALLBASEFIX</a> = 8388608 : fixup xml:base uris for same directory includes, too
 }
 </pre><h3><a name="xmlSAXHandlerV1" id="xmlSAXHandlerV1">Structure xmlSAXHandlerV1</a></h3><pre class="programlisting">Structure xmlSAXHandlerV1<br />struct _xmlSAXHandlerV1 {
     <a href="libxml-parser.html#internalSubsetSAXFunc">internalSubsetSAXFunc</a>       internalSubset
diff --git a/doc/libxml2-api.xml b/doc/libxml2-api.xml
index 45bceb5..c8ba483 100644
--- a/doc/libxml2-api.xml
+++ b/doc/libxml2-api.xml
@@ -720,6 +720,7 @@
      <exports symbol='XML_WITH_OUTPUT' type='enum'/>
      <exports symbol='XML_PARSE_XINCLUDE' type='enum'/>
      <exports symbol='XML_PARSE_NOCDATA' type='enum'/>
+     <exports symbol='XML_PARSE_ALLBASEFIX' type='enum'/>
      <exports symbol='XML_PARSE_NOBASEFIX' type='enum'/>
      <exports symbol='XML_PARSE_BIG_LINES' type='enum'/>
      <exports symbol='XML_WITH_XINCLUDE' type='enum'/>
@@ -5137,6 +5138,7 @@ crash if you try to modify the tree)'/>
     <enum name='XML_PARSE_DTDVALID' file='parser' value='16' type='xmlParserOption' info='validate with the DTD'/>
     <enum name='XML_PARSE_HUGE' file='parser' value='524288' type='xmlParserOption' info='relax any hardcoded limit from the parser'/>
     <enum name='XML_PARSE_IGNORE_ENC' file='parser' value='2097152' type='xmlParserOption' info='ignore internal document encoding hint'/>
+    <enum name='XML_PARSE_ALLBASEFIX' file='parser' value='8388608' type='xmlParserOption' info='do xml:base fixup for _all_ XINCLUDEs'/>
     <enum name='XML_PARSE_NOBASEFIX' file='parser' value='262144' type='xmlParserOption' info='do not fixup XINCLUDE xml:base uris'/>
     <enum name='XML_PARSE_NOBLANKS' file='parser' value='256' type='xmlParserOption' info='remove blank nodes'/>
     <enum name='XML_PARSE_NOCDATA' file='parser' value='16384' type='xmlParserOption' info='merge CDATA as text nodes'/>
diff --git a/doc/libxml2-refs.xml b/doc/libxml2-refs.xml
index b33d103..9351da9 100644
--- a/doc/libxml2-refs.xml
+++ b/doc/libxml2-refs.xml
@@ -588,6 +588,7 @@
     <reference name='XML_PARSER_SUBST_ENTITIES' href='html/libxml-xmlreader.html#XML_PARSER_SUBST_ENTITIES'/>
     <reference name='XML_PARSER_SYSTEM_LITERAL' href='html/libxml-parser.html#XML_PARSER_SYSTEM_LITERAL'/>
     <reference name='XML_PARSER_VALIDATE' href='html/libxml-xmlreader.html#XML_PARSER_VALIDATE'/>
+    <reference name='XML_PARSE_ALLBASEFIX' href='html/libxml-parser.html#XML_PARSE_ALLBASEFIX'/>
     <reference name='XML_PARSE_BIG_LINES' href='html/libxml-parser.html#XML_PARSE_BIG_LINES'/>
     <reference name='XML_PARSE_COMPACT' href='html/libxml-parser.html#XML_PARSE_COMPACT'/>
     <reference name='XML_PARSE_DOM' href='html/libxml-parser.html#XML_PARSE_DOM'/>
@@ -4189,6 +4190,7 @@
       <ref name='XML_PARSER_SUBST_ENTITIES'/>
       <ref name='XML_PARSER_SYSTEM_LITERAL'/>
       <ref name='XML_PARSER_VALIDATE'/>
+      <ref name='XML_PARSE_ALLBASEFIX'/>
       <ref name='XML_PARSE_BIG_LINES'/>
       <ref name='XML_PARSE_COMPACT'/>
       <ref name='XML_PARSE_DOM'/>
@@ -11383,6 +11385,7 @@
       <ref name='XML_PARSER_START'/>
       <ref name='XML_PARSER_START_TAG'/>
       <ref name='XML_PARSER_SYSTEM_LITERAL'/>
+      <ref name='XML_PARSE_ALLBASEFIX'/>
       <ref name='XML_PARSE_BIG_LINES'/>
       <ref name='XML_PARSE_COMPACT'/>
       <ref name='XML_PARSE_DOM'/>
diff --git a/include/libxml/parser.h b/include/libxml/parser.h
index 3f5730d..e87280e 100644
--- a/include/libxml/parser.h
+++ b/include/libxml/parser.h
@@ -1111,7 +1111,8 @@ typedef enum {
     XML_PARSE_HUGE      = 1<<19,/* relax any hardcoded limit from the parser */
     XML_PARSE_OLDSAX    = 1<<20,/* parse using SAX2 interface before 2.7.0 */
     XML_PARSE_IGNORE_ENC= 1<<21,/* ignore internal document encoding hint */
-    XML_PARSE_BIG_LINES = 1<<22 /* Store big lines numbers in text PSVI field */
+    XML_PARSE_BIG_LINES = 1<<22,/* Store big lines numbers in text PSVI field */
+    XML_PARSE_ALLBASEFIX= 1<<23 /* do xml:base fixup for _all_ XINCLUDEs */
 } xmlParserOption;
 
 XMLPUBFUN void XMLCALL
diff --git a/parser.c b/parser.c
index ee429f3..1acc5a2 100644
--- a/parser.c
+++ b/parser.c
@@ -15111,6 +15111,14 @@ xmlCtxtUseOptionsInternal(xmlParserCtxtPtr ctxt, int options, const char *encodi
        ctxt->options |= XML_PARSE_NOBASEFIX;
         options -= XML_PARSE_NOBASEFIX;
     }
+    if (options & XML_PARSE_ALLBASEFIX) {
+       /* 
+        * There is no check for NOBASEFIX vs ALLBASEFIX.
+        * NOBASEFIX will override ALLBASEFIX.
+        */
+       ctxt->options |= XML_PARSE_ALLBASEFIX;
+        options -= XML_PARSE_ALLBASEFIX;
+    }
     if (options & XML_PARSE_HUGE) {
        ctxt->options |= XML_PARSE_HUGE;
         options -= XML_PARSE_HUGE;
diff --git a/xinclude.c b/xinclude.c
index 107ac03..e90c4ab 100644
--- a/xinclude.c
+++ b/xinclude.c
@@ -1685,7 +1685,7 @@ loaded:
 #endif
 
     /*
-     * Do the xml:base fixup if needed
+     * Do the xml:base fixup as needed
      */
     if ((doc != NULL) && (URL != NULL) &&
         (!(ctxt->parseFlags & XML_PARSE_NOBASEFIX)) &&
@@ -1695,27 +1695,41 @@ loaded:
        xmlChar *curBase;
 
        /*
-        * The base is only adjusted if "necessary", i.e. if the xinclude node
-        * has a base specified, or the URL is relative
+        * The xml:base is adjusted as necessary.  Possibly the
+        * xinclude node has a base specified?
         */
        base = xmlGetNsProp(ctxt->incTab[nr]->ref, BAD_CAST "base",
                        XML_XML_NAMESPACE);
        if (base == NULL) {
            /*
-            * No xml:base on the xinclude node, so we check whether the
-            * URI base is different than (relative to) the context base
+            * No xml:base on the xinclude node.  Compute the base
+            * from the URL of the included document, if possible
+            * relative to the context base.  See
+            * uri.c:xmlBuildRelativeURI for the relative/absolute
+            * magic.
             */
            curBase = xmlBuildRelativeURI(URL, ctxt->base);
            if (curBase == NULL) {      /* Error return */
                xmlXIncludeErr(ctxt, ctxt->incTab[nr]->ref,
                       XML_XINCLUDE_HREF_URI,
                       "trying to build relative URI from %s\n", URL);
+           } else if (((ctxt->parseFlags & XML_PARSE_ALLBASEFIX)) ||
+                      ((doc->parseFlags & XML_PARSE_ALLBASEFIX)) ||
+                      xmlStrchr(curBase, (xmlChar) '/')) {
+               base = curBase;
            } else {
-               /* If the URI doesn't contain a slash, it's not relative */
-               if (!xmlStrchr(curBase, (xmlChar) '/'))
-                   xmlFree(curBase);
-               else
-                   base = curBase;
+               /*
+                * The XML_PARSE_ALLBASEFIX flag is unset, so we do
+                * minimal fixup and don't add xml:base if the new
+                * base shares the path with the parent.  In that
+                * case, all relative URI references within the
+                * included document resolve to the same place whether
+                * we fix up xml:base or not.  However, the change of
+                * document within the same path is not recorded.  If
+                * you also need xml:base fixup for same-path document
+                * changes, use XML_PARSE_ALLBASEFIX.
+                */
+               xmlFree(curBase);
            }
        }
        if (base != NULL) {     /* Adjustment may be needed */
diff --git a/xmllint.c b/xmllint.c
index 26d8db1..ba7f5b1 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -3053,6 +3053,7 @@ static void usage(const char *name) {
     printf("\t--xinclude : do XInclude processing\n");
     printf("\t--noxincludenode : same but do not generate XInclude nodes\n");
     printf("\t--nofixup-base-uris : do not fixup xml:base uris\n");
+    printf("\t--fixup-all-base-uris : fixup xml:base for same path, new document XInclude, too\n");
 #endif
     printf("\t--loaddtd : fetch external DTD\n");
     printf("\t--dtdattr : loaddtd + populate the tree with inherited attributes \n");
@@ -3280,6 +3281,13 @@ main(int argc, char **argv) {
            options |= XML_PARSE_XINCLUDE;
            options |= XML_PARSE_NOBASEFIX;
        }
+       else if ((!strcmp(argv[i], "-fixup-all-base-uris")) ||
+                (!strcmp(argv[i], "--fixup-all-base-uris"))) {
+           xinclude++;
+           options |= XML_PARSE_XINCLUDE;
+           options |= XML_PARSE_ALLBASEFIX;
+           options ^= XML_PARSE_NOBASEFIX & options;
+       }
 #endif
 #ifdef LIBXML_OUTPUT_ENABLED
 #ifdef HAVE_ZLIB_H

commit 44bd5c1a52b632502d2d9cd42255c19563d2a459
Author: Susanne Oberhauser <froh suse com>
Date:   Wed Apr 23 13:17:12 2014 +0000

        Remove premature check on URI being relative (gives false negatives).
        This is Alexey Neumann's first fix to xml:base handling

diff --git a/xinclude.c b/xinclude.c
index ace005b..107ac03 100644
--- a/xinclude.c
+++ b/xinclude.c
@@ -1687,7 +1687,7 @@ loaded:
     /*
      * Do the xml:base fixup if needed
      */
-    if ((doc != NULL) && (URL != NULL) && (xmlStrchr(URL, (xmlChar) '/')) &&
+    if ((doc != NULL) && (URL != NULL) &&
         (!(ctxt->parseFlags & XML_PARSE_NOBASEFIX)) &&
        (!(doc->parseFlags & XML_PARSE_NOBASEFIX))) {
        xmlNodePtr node;
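
A note on the xmllint.c hunk of the first patch: the `options ^= XML_PARSE_NOBASEFIX & options;` line clears the NOBASEFIX bit if it is set and leaves everything else alone. The sketch below, with hypothetical helper names chosen for this illustration, shows that it behaves like the more common `&= ~flag` spelling.

```c
#include <assert.h>

/* Clears flag from options the way the xmllint.c hunk does. */
static int sketch_clear_xor(int options, int flag)
{
    assert(flag != 0);
    options ^= flag & options;  /* toggles only those bits of flag that are set */
    return options;
}

/* The more common spelling of the same operation. */
static int sketch_clear_mask(int options, int flag)
{
    assert(flag != 0);
    return options & ~flag;
}
```

Either way, the result depends on argument order: a `--nofixup-base-uris` given later on the command line sets NOBASEFIX again, and per the comment in parser.c NOBASEFIX then overrides ALLBASEFIX.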

-- 
Susanne Oberhauser                     SUSE LINUX Products GmbH
+49-911-74053-574                      Maxfeldstraße 5
Processes and Infrastructure           90409 Nürnberg
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

