Re: [xml] xmlReadFile with filename instead of URL?



The behaviour only seems to trigger when you configure --without-zlib. I don't know why yet, but there are zlib-specific #ifdefs in the loading and URL-mangling code, so there could be something going on in the zlib code path that doesn't run when zlib is disabled.

Okay, I found the bug; it's very simple.

Here is the comment for xmlFileOpen:

 * Wrapper around xmlFileOpen_real that try it with an unescaped
 * version of @filename, if this fails fallback to @filename

However, the code does *not* do this:

    unescaped = xmlURIUnescapeString(filename, 0, NULL);
    if (unescaped != NULL) {
        retval = xmlFileOpen_real(unescaped);
        xmlFree(unescaped);
    } else {
        retval = xmlFileOpen_real(filename);
    }
    return retval;

The code unescapes the filename first and tries to load that. If that fails, then it just fails: the else branch only runs when xmlURIUnescapeString itself returns NULL, not when the open fails. Shouldn't it try to load the filename as-is first, and only if *that* fails try unescaping it? Or better yet, not try unescaping it at all? Since when did filenames use % escapes anyway?

So I suggest this patch to xmlFileOpen in xmlIO.c:

    retval = xmlFileOpen_real(filename);
    if (retval == NULL) {
        unescaped = xmlURIUnescapeString(filename, 0, NULL);
        if (unescaped != NULL) {
            retval = xmlFileOpen_real(unescaped);
            xmlFree(unescaped);
        }
    }
    return retval;

With this code, a file literally named "hello%2Fworld.xml" will be loaded first, and only if it is not found will "hello/world.xml" be loaded. But really, I would rather delete that entire if test, as it seems to me that any URL unescaping should be handled much earlier, before xmlFileOpen ever sees the filename.

Michael

--
Print XML with Prince!
http://www.princexml.com
