Re: [xml] spaces in uri, again



On Thu, Aug 04, 2005 at 02:16:58PM +0200, Paweł Pałucha wrote:

And should an argument to xmlParseFile (for
example) already be escaped?

  I would expect the argument to be passed unescaped. Escaping should
normally be a no-op on an already escaped string. The escaping of the
':' after the protocol is completely bogus; I dunno how it happened, but it's
completely wrong. I think this needs to be reexamined for all cases,
and escaping should understand URI semantics.

The escaping is done in xmlCanonicPath - building a URI from
'http://alpha/a b' fails because xmlParseUri expects an _escaped_ string and
fails on the space after 'a'. Next, the whole string is treated as the 'path'
part of the URI:

(uri.c, line 2268):
uri->path = (char *) xmlStrdup((const xmlChar *) path);
...
ret = xmlSaveUri(uri);
...
return(ret);

   We can't change xmlParseUri; clearly it's defined as parsing per the
RFC 2396 syntax. There are 2 ways around this:
   - make a new entry point that allows parsing unescaped strings
   - find a way to correctly escape unescaped strings even if they
     use spaces or other chars.

xmlSaveUri escapes characters that shouldn't be in the 'path' segment, including
':', and returns 'http%3A//alpha/a%20b'.

  xmlSaveUri is right; the problem is that the http should never end up in
the path part.

If escaping should be done by the library (and not by the user calling
xmlParseFile), perhaps the easiest way to fix it is to modify
xmlCanonicPath/xmlParseUri.

  We can play with xmlCanonicPath, not with xmlParseUri.
We could detect {protocol}:// and then force an escaping of the 
remaining chars like ' ' '\n' '\r' '\t' and then pass to xmlParseUri.
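
A minimal sketch of that idea in plain C, for illustration only. The names
scheme_prefix_len and escape_after_scheme are hypothetical and are not libxml2
functions; the sketch assumes that only space, tab, CR and LF need escaping and
that a string without a {protocol}:// prefix is returned as an unchanged copy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libxml2): return the offset just past
 * "scheme://" if `s` starts with one, or 0 if it does not. */
size_t scheme_prefix_len(const char *s)
{
    size_t i = 0;

    if (!isalpha((unsigned char) s[0]))
        return 0;
    while (isalnum((unsigned char) s[i]) || s[i] == '+' ||
           s[i] == '-' || s[i] == '.')
        i++;
    if (strncmp(s + i, "://", 3) != 0)
        return 0;
    return i + 3;
}

/* Hypothetical helper (not libxml2): keep "scheme://" as is, then
 * percent-escape ' ', '\t', '\r' and '\n' in the remainder.
 * Without a scheme prefix the input is copied unchanged.
 * Returns a malloc'ed string (caller frees) or NULL on failure. */
char *escape_after_scheme(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t skip, len, i, o;
    char *out;

    if (in == NULL)
        return NULL;
    len = strlen(in);
    out = malloc(3 * len + 1);      /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    skip = scheme_prefix_len(in);
    if (skip == 0) {                /* no scheme: plain copy */
        memcpy(out, in, len + 1);
        return out;
    }

    memcpy(out, in, skip);          /* the ':' of the scheme is never touched */
    o = skip;
    for (i = skip; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char) c;
        }
    }
    assert(o <= 3 * len);           /* buffer invariant */
    out[o] = '\0';
    return out;
}
```

With this sketch, escape_after_scheme("http://alpha/a b") yields
"http://alpha/a%20b", which could then be handed to the URI parser.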

And escaping an already escaped string is not a no-op because of
possible recursive escaping of '%' characters.

  Yes it should be: %xy with x and y being hex digits should not be modified by
any URI escaping routine; that would be another bug.
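
A rough sketch of such an idempotent escaping routine in plain C. The name
escape_once is hypothetical and is not a libxml2 function. As an assumption of
this sketch, it escapes only bytes <= 0x20, bytes >= 0x7F, and any '%' that
does not start a valid %xy triplet; a real routine would need a fuller set of
reserved characters per URI component.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libxml2): percent-escape `in`, passing any
 * existing "%XY" triplet (X, Y hex digits) through untouched, so that
 * applying the function twice gives the same result as applying it once.
 * Returns a malloc'ed string (caller frees) or NULL on failure. */
char *escape_once(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, o = 0;
    char *out;

    if (in == NULL)
        return NULL;
    len = strlen(in);
    out = malloc(3 * len + 1);      /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];

        /* The terminating NUL is not a hex digit, so these lookaheads
         * never read past the end of the string. */
        if (c == '%' && isxdigit((unsigned char) in[i + 1]) &&
            isxdigit((unsigned char) in[i + 2])) {
            memcpy(out + o, in + i, 3);   /* already escaped: copy as is */
            o += 3;
            i += 2;
        } else if (c == '%' || c <= 0x20 || c >= 0x7F) {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char) c;
        }
    }
    assert(o <= 3 * len);           /* buffer invariant */
    out[o] = '\0';
    return out;
}
```

Here escape_once("a b") and escape_once("a%20b") both give "a%20b", so a second
pass does not turn '%' into '%25'.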

I also looked at the xslt code - in transform.c there's an attempt to build a URI
from a string and then, when that fails, it is repeated with the escaped string.
Perhaps the same thing could be done in xmlCanonicPath() ?

  The problem is that the escaping may turn http:// into http%3A// again;
that sounds wrong. We should fix that part once and for all, checking all
the different cases.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


