Re: [xml] spaces in uri, again



On Thu, Aug 04, 2005 at 12:42:42PM +0200, Paweł Pałucha wrote:

  Maybe the "I'm offended" attitude about hitting a bug is one of the
things that was wrong in your original mail and needs fixing.
  Giving the version number of libxml2 used, saying whether it worked in the
past but failed in a recent version, investigating the code, basically trying
to present things in a way that makes it look more fun than just an angry user,
would help.

I'm not offended or angry, sorry if my mail looked like that.

 okidoc, maybe I'm just a bit stressed or something :-)

I'm using the
latest version, 2.6.20, of course, on Linux. The same happens with 2.6.18 (I'm
using it on another machine).

  yeah, and I could confirm this with CVS head.
I checked with a very old version too:
  veillard:~ -> xmllint --version
 xmllint: using libxml version 20510
     compiled with: FTP HTTP HTML C14N Catalog DocBook XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas
  veillard:~ -> xmllint 'http://localhost/escape space.xml'
  warning: failed to load external entity "http%3A//localhost/escape%20space.xml"
  veillard:~ -> xmllint 'http://localhost/escape%20space.xml'
  <?xml version="1.0"?>
  <doc>it works</doc>
  veillard:~ ->

i.e. if the unescaped string is passed it breaks. If the string was escaped it
used to work, but it doesn't work now. I also see
  "GET /escape space.xml HTTP/1.0" 404
in my logs, which is plain wrong.

The case xmllint 'http://localhost/a%20b' is easier - it looks like when
parsing the URI string into an xmlURI struct all fields are unescaped (using
xmlURIUnescapeString) and then passed without modification to the nanohttp
context, and then used without modification to create the HTTP query. What is
the reason to unescape URI fields? And should an argument to xmlParseFile (for
example) already be escaped?
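
To make the sequence described above concrete, here is a minimal sketch in
plain C. It does not call libxml2 at all: sketch_percent_decode and
sketch_request_line are hypothetical names standing in for the unescaping step
and for the request-building step, on the assumption that the path is used
exactly as it comes out of the decoder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical stand-in for the unescaping step: decode every %XX triplet.
 * out must have room for strlen(in) + 1 bytes. */
void sketch_percent_decode(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '%' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            *out++ = (char) (hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Hypothetical stand-in for the request-building step: the path is copied
 * into the request line as-is, with no re-escaping. */
int sketch_request_line(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL);
    return snprintf(buf, size, "GET %s HTTP/1.0", path);
}
```

Decoding "/escape%20space.xml" and passing the result straight to
sketch_request_line yields a request line with a raw space in the path, which
is the shape of the 404 entry seen in the server log.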

  I would expect the argument to be passed unescaped. Escaping should
normally be a no-op on an already escaped string. The escaping of the
: after the protocol is completely bogus, I dunno how it happened but it's
completely wrong. I think this needs to be reexamined for all cases,
and escaping should understand URI semantics.
  I think the safest is to not change nanohttp, but at the xmlIO level
try to parse the URI if the file doesn't exist, then do a correct escaping
if this is an absolute URI and pass that to the protocol handlers.
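
As an illustration of the two properties asked for here (a no-op on an
already escaped string, and no escaping of the ':' after the protocol), the
following is a rough sketch in plain C. It is not the libxml2 implementation
and not a proposed patch: sketch_uri_escape is a hypothetical name, and the
choice to pass through every RFC 3986 reserved character is a simplifying
assumption, where a real fix would escape per URI component.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int is_ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

static int is_hex_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') ||
           (c >= 'a' && c <= 'f');
}

/* RFC 3986 unreserved and reserved characters are copied unchanged.
 * Keeping the reserved set is what prevents "http:" becoming "http%3A". */
static int uri_char_is_safe(unsigned char c)
{
    if (is_ascii_alnum(c)) return 1;
    return c != '\0' && strchr("-._~:/?#[]@!$&'()*+,;=", c) != NULL;
}

/* Hypothetical helper: returns a newly allocated escaped copy of in, or NULL
 * on allocation failure. The caller frees the result. */
char *sketch_uri_escape(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, o = 0;
    char *out;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(3 * len + 1);      /* worst case: every byte becomes %XX */
    if (out == NULL) return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c == '%' && is_hex_digit((unsigned char) in[i + 1]) &&
            is_hex_digit((unsigned char) in[i + 2])) {
            /* Existing %XX triplet: copy it untouched so that escaping
             * an already escaped string changes nothing. */
            out[o++] = in[i];
            out[o++] = in[i + 1];
            out[o++] = in[i + 2];
            i += 2;
        } else if (uri_char_is_safe(c)) {
            out[o++] = (char) c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    out[o] = '\0';
    return out;
}
```

With this sketch 'http://localhost/escape space.xml' becomes
'http://localhost/escape%20space.xml', and feeding that result back in returns
the same string. The price of the no-op property is that a file whose real
name contains a literal "%20" cannot be told apart from an escaped space.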

In the meantime I'll try to find out why 'http://alpha/a b' is escaped to
'http%3A//alpha/a%20b'.

  probably a fallback when trying to load the unescaped version.
Testing will also require checking URI references in XSLT, I think there
is a test in libxslt about it. If it breaks, that means the solution won't
work for most users.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


