Re: [xml] [PATCH 0/2] uri: Don't spoil addresses when formatting them

On Mon, Sep 29, 2014 at 10:09:57AM +0200, Martin Kletzander wrote:
[resending because it appears that my previous posting haven't reached
 the list]

Since commit 8eb55d782a2b9afacc7938694891cc6fad7b42a5, when you parse
and save an URI that has no server (or similar) part, two slashes
after the 'schema:' get lost.  It means 'uri:///noserver' is turned
into 'uri:/noserver'.

This can break some applicatication that rely on those slashes to be
in (e.g. libvirt).  This micro-series proposes 2 different approaches
to fixing this that have one slight difference.  Approach 1 adds a
field "slashes_used" into the xmlURI structure that is set to 1 if and
only if double slashes were skipped when parsing.  The other approach
simply checks whether path is absolute (starting with '/' and adds
those two slashes when that condition is true.

The difference is that the second approach changes 'uri:/only/path' to
'uri:///only/path', but doesn't fiddle with the structure insides.

The probelm is caused by adaptations in RFC 3986 being ambiguously
implemented in uri.c; for example after skipping "//", the parsing can
follow almost the same rules or for example after skipping first '/'
in the path the code can be exactly the same but there are more
functions for that.  Nevertheless this series aims to fix the issue
caused by commit 8eb55d782a2b9afacc7938694891cc6fad7b42a5 and leaves
these cleanups to be done as a follow-up patch later on.

I'm Cc'ing the author of 8eb55d782a2b9afacc7938694891cc6fad7b42a5 in
order for him to be able to check this patch and let me (us) know
whether the approaches proposed here are also usable as a fix for
their use case.

  Okay I think I have a better option:


means a host of "" while


means no host at all

  So the best fix IMHO is to fix the URI parser to record the first
case and an empty host string and the second case as a NULL host string

 I would not revert the initial patch, we should not 'invent' those
slash, but we should instead when parsing keep the information that
it's a host based path and that foo:/// means the presence of a host
but an empty one.

Once applied the resulting patch below, all cases seems to be saved

thinkpad:~/XML -> ./testURI uri:/noserver
thinkpad:~/XML -> ./testURI uri:///noserver
thinkpad:~/XML -> ./testURI uri://server/foo
thinkpad:~/XML -> ./testURI uri:/noserver/foo
thinkpad:~/XML -> ./testURI uri:///
thinkpad:~/XML -> ./testURI uri://
thinkpad:~/XML -> ./testURI uri:/
thinkpad:~/XML ->

  If you revert the initial patch that last case fails

The problem is that I don't want to change the xmlURI structure to
minimize ABI breakage, so I could not extend the field. The natural
solution is to denote that uri:/// has an empty host by making
the uri server field an empty string which works very well but breaks
applications (like libvirt ;-) who blindly look at uri->server
not being NULL to try to reach it !
Simplest was to stick the port to -1 in that case, instead of 0
application don't bother looking at the port of there is no server
string, this makes the patch more complex than a 1 liner, but
is better for ABI.

With that patch libvirt test suite passes again except one case:

115) QEMU XML-2-ARGV disk-drive-network-gluster
Offset 295
Expect [//V]
Actual [V]

  I expect that to be a bug in the way the gluster/rdb driver builds
an URI since all other URI building work in a compatible fashion,
this need to be examined, but I don't consider this a libxml2 bug at
this point,



diff --git a/uri.c b/uri.c
index d4dcd2f..ff47abb 100644
--- a/uri.c
+++ b/uri.c
@@ -759,6 +759,8 @@ xmlParse3986HierPart(xmlURIPtr uri, const char **str)
         cur += 2;
        ret = xmlParse3986Authority(uri, &cur);
        if (ret != 0) return(ret);
+       if (uri->server == NULL)
+           uri->port = -1;
        ret = xmlParse3986PathAbEmpty(uri, &cur);
        if (ret != 0) return(ret);
        *str = cur;
@@ -1106,7 +1108,7 @@ xmlSaveUri(xmlURIPtr uri) {
     } else {
-       if (uri->server != NULL) {
+       if ((uri->server != NULL) || (uri->port == -1)) {
            if (len + 3 >= max) {
                 temp = xmlSaveUriRealloc(ret, &max);
                 if (temp == NULL) goto mem_error;
@@ -1143,22 +1145,24 @@ xmlSaveUri(xmlURIPtr uri) {
                ret[len++] = '@';
-           p = uri->server;
-           while (*p != 0) {
-               if (len >= max) {
-                    temp = xmlSaveUriRealloc(ret, &max);
-                    if (temp == NULL) goto mem_error;
-                    ret = temp;
+           if (uri->server != NULL) {
+               p = uri->server;
+               while (*p != 0) {
+                   if (len >= max) {
+                       temp = xmlSaveUriRealloc(ret, &max);
+                       if (temp == NULL) goto mem_error;
+                       ret = temp;
+                   }
+                   ret[len++] = *p++;
-               ret[len++] = *p++;
-           }
-           if (uri->port > 0) {
-               if (len + 10 >= max) {
-                    temp = xmlSaveUriRealloc(ret, &max);
-                    if (temp == NULL) goto mem_error;
-                    ret = temp;
+               if (uri->port > 0) {
+                   if (len + 10 >= max) {
+                       temp = xmlSaveUriRealloc(ret, &max);
+                       if (temp == NULL) goto mem_error;
+                       ret = temp;
+                   }
+                   len += snprintf((char *) &ret[len], max - len, ":%d", uri->port);
-               len += snprintf((char *) &ret[len], max - len, ":%d", uri->port);
        } else if (uri->authority != NULL) {
            if (len + 3 >= max) {

Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit | virtualization library

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]