Re: [xml] spaces in uri, again



On Sun, Aug 07, 2005 at 09:36:17AM -0400, Rob Richards wrote:
I see incoming URIs in both forms depending upon the scheme used. This
particular issue applies only indirectly in my case, though, as the xml I/O
callbacks are not used and the URI is handled within the app with
custom callbacks. In the callbacks themselves the escaping done within
libxml2 needs to be handled, and my take on it is that if
xmlParseURIReference on the passed URI comes back with a scheme then it's
already escaped correctly; otherwise it needs to be unescaped.

  I spent quite a bit of time thinking about it, debugging and building tests.
Basically the version in CVS fixes things up, and there are also regression
tests integrated in runtest now. In each pair below, the first string is the
value passed from the API, for example when using xmlReadFile; the second is
the value hitting the I/O layer, i.e. what nanohttp or the file handler will
receive:

    "urip://example.com/a b.html",
    /* it is an URI the strings must be escaped */
    "urip://example.com/a%20b.html",

    "urip://example.com/a%20b.html",
    /* check that % escaping is not broken */
    "urip://example.com/a%20b.html",

    "file:///path/to/a b.html",
    /* it's an URI path the strings must be escaped */
    "file:///path/to/a%20b.html",

    "file:///path/to/a%20b.html",
    /* check that % escaping is not broken */
    "file:///path/to/a%20b.html",

    "/path/to/a b.html",
    /* this is not an URI, this is a path, so this should not be escaped */
    "/path/to/a b.html",

    "/path/to/a%20b.html",
    /* check that paths with % are not broken */
    "/path/to/a%20b.html",

    "urip://example.com/résumé.html",
    /* out of context the encoding can't be guessed byte by byte conversion */
    "urip://example.com/r%E9sum%E9.html",

    "urip://example.com/test?a=1&b=2%263&c=4#foo",
    /* verify we don't destroy URIs especially the query part */
    "urip://example.com/test?a=1&b=2%263&c=4#foo",

  This is extracted from runtest.c urip_testURLs and urip_rcvsURLs.
I think this checks basic correct behaviour for a number of cases we were
usually breaking :-\ . In the process I also fixed #306861, but
I didn't change the Win32 specific part of xmlCanonicPath() at the
end of uri.c; someone with Windows needs to look at this and send back a
patch for cleaning up that part too.
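
The rule those test pairs express can be sketched in a few lines of
standalone C. This is only an illustration of the expected input/output
mapping, not the code in uri.c: has_scheme() and io_layer_name() are
hypothetical names, and the sketch assumes that "needs escaping" means
control characters, space and non-ASCII bytes, with existing % sequences
passed through untouched.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s starts with "scheme:", using the
 * scheme syntax ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
 * Note that a Win32 name such as "c:/x" would also match here. */
static int has_scheme(const char *s)
{
    if (!isalpha((unsigned char) *s))
        return 0;
    for (s++; *s != '\0'; s++) {
        if (*s == ':')
            return 1;
        if (!isalnum((unsigned char) *s) &&
            *s != '+' && *s != '-' && *s != '.')
            return 0;
    }
    return 0;
}

/* Hypothetical helper: computes the name the I/O layer is expected to
 * receive. Strings with a scheme are treated as URIs: space, control and
 * non-ASCII bytes are escaped byte by byte, while '%' is left alone so
 * existing escapes survive. Strings without a scheme are treated as
 * plain paths and copied unchanged.
 * Returns 0 on success, -1 if out is too small. */
static int io_layer_name(const char *in, char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    int escape = has_scheme(in);
    size_t o = 0;

    if (outlen == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;

        if (escape && (c <= 0x20 || c >= 0x7F)) {
            if (o + 3 >= outlen)
                return -1;
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = (char) c;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Deciding on the presence of a scheme alone is the simplest choice that
reproduces the pairs listed above; the real code has to cope with more
cases, the Win32 paths among them.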

Note: In my case filesystem URIs need to be unescaped within the
callback, so those are handled differently and a scheme of "file" gets
unescaped as well.

  Well, I think we can expect %20 in a file:// URL to be escaped. If it
is not flagged as such it's a grey area: the relative path could be interpreted
both ways. It is then context related: if out of context it should not be
escaped; if it was a URI-Reference, say in an XInclude, it should be unescaped
because we know it's a URI-Reference per the embedding spec and not a file
path. This is quite hairy to get right.
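
On the callback side, the context-dependent choice discussed here could be
sketched as below. This is a minimal illustration under stated assumptions,
not code from libxml2 or from Rob's application: percent_decode() and
maybe_unescape() are hypothetical names, and the known_uri_reference flag
stands for whatever the caller knows from the embedding spec (for example
that the string came from an XInclude href).

```c
#include <ctype.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes %XX sequences in place. A '%' that is not
 * followed by two hex digits is copied through unchanged. */
static void percent_decode(char *s)
{
    char *w = s;

    while (*s != '\0') {
        int hi = -1, lo = -1;

        if (s[0] == '%' &&
            (hi = hex_val((unsigned char) s[1])) >= 0 &&
            (lo = hex_val((unsigned char) s[2])) >= 0) {
            *w++ = (char) (hi * 16 + lo);
            s += 3;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}

/* Hypothetical policy: unescape in place when the name is a file: URI, or
 * when the caller knows from context that it is a URI-Reference. A bare
 * path seen out of context is left alone.
 * Returns 1 if the name was unescaped, 0 if it was left as is. */
static int maybe_unescape(char *name, int known_uri_reference)
{
    static const char prefix[] = "file:";
    int is_file = 1;
    int i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (tolower((unsigned char) name[i]) != prefix[i]) {
            is_file = 0;
            break;
        }
    }
    if (!is_file && !known_uri_reference)
        return 0;
    percent_decode(name);
    return 1;
}
```

With this policy "file:///path/to/a%20b.html" becomes
"file:///path/to/a b.html", "/path/to/a%20b.html" is untouched when passed
out of context, and the same string is decoded when the caller flags it as
a URI-Reference.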

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


