Re: [xml] spaces in uri, again
- From: Daniel Veillard <veillard redhat com>
- To: Rob Richards <rrichards ctindustries net>
- Cc: xml gnome org, Paweł Pałucha <pawel praterm com pl>
- Subject: Re: [xml] spaces in uri, again
- Date: Sun, 7 Aug 2005 10:22:40 -0400
On Sun, Aug 07, 2005 at 09:36:17AM -0400, Rob Richards wrote:
> I see incoming URIs in both forms depending upon the scheme used. Though
> this particular issue applies only indirectly in my case, as the xml I/O
> callbacks are not used and the URI is handled within the app with
> custom callbacks. In the callbacks themselves the escaping done within
> libxml2 needs to be handled, and my take on it is that if
> xmlParseURIReference on the passed URI comes back with a scheme then it's
> already escaped correctly; otherwise it needs to be unescaped.
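The rule quoted above can be sketched in a few lines of C. This is only an illustration under stated assumptions: has_uri_scheme() and callback_must_unescape() are hypothetical helpers written for this example, not libxml2 functions, and the scheme test is a hand-written check of the RFC 3986 scheme syntax rather than a call to xmlParseURIReference.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: returns 1 when `s` starts
 * with an RFC 3986 scheme (a letter followed by letters, digits,
 * '+', '-' or '.') terminated by ':'. */
static int has_uri_scheme(const char *s)
{
    size_t i;

    if (s == NULL || !isalpha((unsigned char) s[0]))
        return 0;
    for (i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char) s[i];

        if (c == ':')
            return 1;
        if (!isalnum(c) && c != '+' && c != '-' && c != '.')
            return 0;
    }
    return 0;
}

/* The rule from the quoted text: a string that carries a scheme is
 * taken to be correctly escaped already; anything else is treated as
 * a path that the custom callback has to unescape itself. */
static int callback_must_unescape(const char *uri)
{
    return !has_uri_scheme(uri);
}
```

Note that a purely syntactic test like this one would also accept a Windows drive prefix such as "C:" as a scheme, so a real callback would need extra care on that platform.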
I spent quite a bit of time thinking about it, debugging and building tests.
Basically the version in CVS fixes things up, and there are also regression
tests integrated in runtest now. In each group below, the first string is the
value passed through the API, for example when using xmlReadFile; the second is
the value hitting the I/O layer, i.e. what nanohttp or the file handler will receive:
"urip://example.com/a b.html",
/* it is an URI the strings must be escaped */
"urip://example.com/a%20b.html",

"urip://example.com/a%20b.html",
/* check that % escaping is not broken */
"urip://example.com/a%20b.html",

"file:///path/to/a b.html",
/* it's an URI path the strings must be escaped */
"file:///path/to/a%20b.html",

"file:///path/to/a%20b.html",
/* check that % escaping is not broken */
"file:///path/to/a%20b.html",

"/path/to/a b.html",
/* this is not an URI, this is a path, so this should not be escaped */
"/path/to/a b.html",

"/path/to/a%20b.html",
/* check that paths with % are not broken */
"/path/to/a%20b.html",

"urip://example.com/résumé.html",
/* out of context the encoding can't be guessed byte by byte conversion */
"urip://example.com/r%E9sum%E9.html",

"urip://example.com/test?a=1&b=2%263&c=4#foo",
/* verify we don't destroy URIs especially the query part */
"urip://example.com/test?a=1&b=2%263&c=4#foo",
This is extracted from runtest.c urip_testURLs and urip_rcvsURLs.
I think this checks the basic correct behaviour for a number of cases we were
usually breaking :-\ . In the process I also fixed #306861, but
I didn't change the Win32-specific part of xmlCanonicPath() at the
end of uri.c; someone with Windows needs to look at this and send back a
patch for cleaning up that part too.
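The behaviour that the input/received pairs listed above describe can be approximated by one simple rule: a string with a scheme is a URI, so spaces and non-ASCII bytes get percent-escaped byte by byte while existing % sequences are kept; a string without a scheme is a path and passes through untouched. The sketch below encodes just that rule. It is an assumption-laden illustration, not the code in uri.c or runtest.c, and has_uri_scheme() and expected_io_string() are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `s` starts with an RFC 3986 scheme + ':'. */
static int has_uri_scheme(const char *s)
{
    size_t i;

    if (s == NULL || !isalpha((unsigned char) s[0]))
        return 0;
    for (i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char) s[i];

        if (c == ':')
            return 1;
        if (!isalnum(c) && c != '+' && c != '-' && c != '.')
            return 0;
    }
    return 0;
}

/* Writes to `out` the string the I/O layer would be expected to
 * receive for `in` under the simplified rule described above.
 * Returns 0 on success, -1 if `out` is too small. */
static int expected_io_string(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    int is_uri = has_uri_scheme(in);
    size_t o = 0;

    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        /* Only URIs are escaped; '%' itself is never re-escaped, so
         * existing %XX sequences survive unchanged. */
        int escape = is_uri && (c <= 0x20 || c >= 0x7F);

        if (o + (escape ? 3 : 1) >= outsz)
            return -1;
        if (escape) {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char) c;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Feeding each first string from the list through expected_io_string() should give the second string of the same group, provided the é in the résumé case is the single byte 0xE9 as the "byte by byte" comment implies.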
> Note: In my case filesystem URIs need to be unescaped within the
> callback so those are handled differently and a scheme of "file" gets
> unescaped as well.
Well, I think we can expect a %20 in a file:// URL to be an escaped space. If the
string is not flagged as such, it's a grey area: the relative path could be
interpreted both ways. It is then context related: out of context it should not
be treated as escaped, but if it was a URI-Reference, say in an XInclude, it
should be unescaped because we know it's a URI-Reference per the embedding spec
and not a file path. This is quite hairy to get right.
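To make the context dependence concrete, here is a small sketch in C. Everything in it is hypothetical and written only for illustration: the ref_context enum, hex_value(), percent_decode() and relative_ref_to_path() are not libxml2 names, and the two contexts are a simplification of the cases described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: where the relative string came from. */
typedef enum {
    REF_UNKNOWN_CONTEXT,   /* plain string, e.g. handed in by the caller */
    REF_URI_REFERENCE      /* known URI-Reference, e.g. an XInclude href */
} ref_context;

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes %XX sequences from `in` into `out`; malformed sequences are
 * copied as-is. Returns 0 on success, -1 if `out` is too small. */
static int percent_decode(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        if (o + 1 >= outsz)
            return -1;
        if (in[0] == '%' &&
            hex_value((unsigned char) in[1]) >= 0 &&
            hex_value((unsigned char) in[2]) >= 0) {
            out[o++] = (char) (hex_value((unsigned char) in[1]) * 16 +
                               hex_value((unsigned char) in[2]));
            in += 3;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Turns a relative reference into the path to open: unescaped when the
 * embedding spec says it is a URI-Reference, verbatim otherwise. */
static int relative_ref_to_path(const char *ref, ref_context ctx,
                                char *out, size_t outsz)
{
    size_t n;

    if (ctx == REF_URI_REFERENCE)
        return percent_decode(ref, out, outsz);
    n = strlen(ref);
    if (n + 1 > outsz)
        return -1;
    memcpy(out, ref, n + 1);
    return 0;
}
```

With this sketch, "a%20b.html" becomes "a b.html" when the context is REF_URI_REFERENCE and stays "a%20b.html" when the context is unknown, which is exactly the ambiguity that makes the relative case hard.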
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/