Re: [xml] File names vs URIs on the command line

On Sun, Mar 01, 2009 at 01:24:18PM +0100, Mike Hommey wrote:

It appears that while libxml2 internals expect entity loading to be
using "proper" URIs, CLI tools don't do the necessary conversion between
the real path and URIs, which can lead to unexpected results.

See for an
example of a failure when the path of the file being parsed contains
a space.

The interesting thing about that example is that while:
xmllint --noent "/tmp/foo bar/book.xml" fails,
xmllint --noent "/tmp/foo%20bar/book.xml" succeeds.

Even more interesting: if the real directory is "/tmp/foo%20bar" instead
of "/tmp/foo bar", the latter works too.

And if both directories exist, xmllint --noent "/tmp/foo%20bar/book.xml"
will read data from "/tmp/foo%20bar", not "/tmp/foo bar".

This is quite inconsistent.

Should the CLI tools use xmlEscapeURI() on file paths before giving
them to the API ? Or should xmlCreateFileParserCtxt use xmlEscapeURI
before passing the file name to xmlCreateURLParserCtxt ?
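
A minimal sketch of the escaping step proposed above, assuming POSIX-style paths. The helper name escape_path_for_uri is hypothetical and the code does not call libxml2; it only shows that a space and a literal '%' map to different escaped forms, so "/tmp/foo bar" and "/tmp/foo%20bar" stay distinct once escaped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2.
 * Returns 1 for RFC 3986 "unreserved" characters. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-escape a file path for use as the path part of a URI.
 * '/' is kept as the segment separator; every other byte that is not
 * unreserved (including ' ' and '%') becomes %XX.
 * The caller frees the result. Returns NULL on allocation failure. */
char *escape_path_for_uri(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out, *p;

    assert(path != NULL);
    out = malloc(3 * strlen(path) + 1);   /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    p = out;
    for (; *path != '\0'; path++) {
        unsigned char c = (unsigned char)*path;
        if (is_unreserved(c) || c == '/') {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With this sketch, "/tmp/foo bar/book.xml" becomes "/tmp/foo%20bar/book.xml", while a real directory named "/tmp/foo%20bar" becomes "/tmp/foo%2520bar", so the two can no longer be confused.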

  Yeah, it's a bit nasty ... basically, internally libxml2 tries various
options in case of failure, because of legacy behaviour and the
difficulty of working out exactly when and where to escape :-(
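
A simplified model of that kind of fallback, consistent with the last two observations quoted above (the literal name is used when it exists, and the unescaped name is tried only if the first attempt fails). This is an illustrative sketch, not libxml2's actual code; percent_unescape and open_literal_then_unescaped are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX sequences; malformed sequences and %00 are copied as-is.
 * The caller frees the result. Returns NULL on allocation failure. */
char *percent_unescape(const char *s)
{
    char *out, *p;

    assert(s != NULL);
    out = malloc(strlen(s) + 1);          /* output is never longer than input */
    if (out == NULL)
        return NULL;

    p = out;
    while (*s != '\0') {
        int hi, lo;
        if (s[0] == '%' &&
            (hi = hex_value(s[1])) >= 0 &&
            (lo = hex_value(s[2])) >= 0 &&
            (hi * 16 + lo) != 0) {
            *p++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *p++ = *s++;
        }
    }
    *p = '\0';
    return out;
}

/* Try the name exactly as given, then its unescaped form. */
FILE *open_literal_then_unescaped(const char *name)
{
    FILE *f = fopen(name, "rb");
    if (f == NULL) {
        char *alt = percent_unescape(name);
        if (alt != NULL) {
            f = fopen(alt, "rb");
            free(alt);
        }
    }
    return f;
}
```

Under this model, "/tmp/foo%20bar/book.xml" opens the file in "/tmp/foo%20bar" when that directory exists and only falls back to "/tmp/foo bar" otherwise, which is exactly the ambiguity being discussed.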
  Actually most of the high level APIs originally worked with file paths
and were later extended to allow URIs, so you can use
    xmllint http://..../foo.xml
which is often very useful, but they also still take file paths.
  Another big, big nastiness is that we obviously need to be able to work
with relative file paths, but there is no proper way to define them in
terms of URIs, especially if you consider Windows paths that are absolute
because they start with \ but are still relative to the volume. The
file:// scheme is seriously underspecified, and this has ended up making
this area a total

  If you can turn the filename you got into a full URI, then yes it can
be a good idea to do it to avoid any hazard, but unfortunately it's not
always possible,
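
One way to picture the "full URI" conversion is the sketch below. It assumes POSIX-style paths and that the caller can supply an absolute base directory (for example the current working directory); path_to_file_uri is a hypothetical helper, not a libxml2 function, and it neither normalises "." or ".." segments nor handles Windows volumes. When no absolute base is available it gives up, which is one of the cases where the conversion is not possible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Append src to dst, percent-escaping every byte that is neither an
 * RFC 3986 unreserved character nor '/'. Returns the new end of dst. */
static char *append_escaped(char *dst, const char *src)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') ||
                   c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
        if (keep) {
            *dst++ = (char)c;
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    return dst;
}

/* Build "file://" + escaped absolute path. A relative path is resolved
 * against base_dir, which must itself be absolute; otherwise NULL is
 * returned. The caller frees the result. */
char *path_to_file_uri(const char *base_dir, const char *path)
{
    int relative;
    char *out, *p;

    assert(base_dir != NULL && path != NULL);
    relative = (path[0] != '/');
    if (relative && base_dir[0] != '/')
        return NULL;                      /* cannot form a full URI */

    /* "file://" + worst-case escaped base, separator and path + NUL */
    out = malloc(7 + 3 * (strlen(base_dir) + 1 + strlen(path)) + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, "file://", 7);
    p = out + 7;
    if (relative) {
        p = append_escaped(p, base_dir);
        if (p[-1] != '/')
            *p++ = '/';
    }
    p = append_escaped(p, path);
    *p = '\0';
    return out;
}
```

For instance, resolving "book.xml" against "/tmp/foo bar" gives "file:///tmp/foo%20bar/book.xml", the same URI as the absolute path "/tmp/foo bar/book.xml".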


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
