[xml] File names vs URIs on the command line



Hi,

It appears that while libxml2 internals expect entity loading to use
"proper" URIs, the CLI tools don't do the necessary conversion between
real paths and URIs, which can lead to unexpected results.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516916 for an
example of a failure when the path of the file being parsed contains
a space.

The interesting thing about that example is that while:
xmllint --noent "/tmp/foo bar/book.xml" fails,
xmllint --noent "/tmp/foo%20bar/book.xml" succeeds.

Even more interesting: if the real directory is "/tmp/foo%20bar" instead
of "/tmp/foo bar", the second command works too.

And if both directories exist, xmllint --noent "/tmp/foo%20bar/book.xml"
will read data from "/tmp/foo%20bar", not "/tmp/foo bar".

This is quite inconsistent.
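
To make the collision concrete, here is a minimal sketch in plain C. It is
not libxml2 code, and percent_decode() is a hypothetical helper written for
this mail. It only shows that once an argument is treated as a URI and
percent-decoded, "/tmp/foo%20bar/book.xml" and "/tmp/foo bar/book.xml"
name the same file, so the literal "%20" directory can no longer be told
apart from the one containing a space.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper (not a libxml2 function): replace every %XX
 * sequence with the byte it encodes and copy everything else as is.
 * A '%' not followed by two hex digits is kept literally.
 * The caller frees the result. */
static char *percent_decode(const char *s)
{
    assert(s != NULL);
    char *out = malloc(strlen(s) + 1);   /* decoding never grows the string */
    if (out == NULL)
        return NULL;

    char *p = out;
    while (*s != '\0') {
        int hi = -1, lo = -1;
        if (s[0] == '%'
            && (hi = hex_value((unsigned char)s[1])) >= 0
            && (lo = hex_value((unsigned char)s[2])) >= 0) {
            *p++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *p++ = *s++;
        }
    }
    *p = '\0';
    return out;
}
```

With this helper, percent_decode("/tmp/foo%20bar/book.xml") and
percent_decode("/tmp/foo bar/book.xml") both give "/tmp/foo bar/book.xml".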

Should the CLI tools use xmlEscapeURI() on file paths before giving
them to the API? Or should xmlCreateFileParserCtxt() use xmlEscapeURI()
before passing the file name to xmlCreateURLParserCtxt()?
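
Either way, the conversion I have in mind looks roughly like the sketch
below. It is plain C rather than a libxml2 call: escape_path_for_uri() is
a hypothetical name, and the choice of which characters to keep (the
RFC 3986 unreserved set plus '/') is my assumption, not what any existing
function does. The point is that '%' itself gets escaped, so a real
"/tmp/foo%20bar" directory and a real "/tmp/foo bar" directory end up
with different URIs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): percent-encode a file
 * system path so it can serve as the path part of a URI. ASCII letters,
 * digits, '-', '.', '_', '~' and '/' are kept (assuming the default "C"
 * locale for isalnum); every other byte, including ' ' and '%', becomes
 * %XX. The caller frees the result. */
static char *escape_path_for_uri(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";

    assert(path != NULL);
    size_t len = strlen(path);
    char *out = malloc(3 * len + 1);     /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)path[i];
        if (isalnum(c) || c == '-' || c == '.' || c == '_' ||
            c == '~' || c == '/') {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

Applied to the two directories above, "/tmp/foo bar/book.xml" becomes
"/tmp/foo%20bar/book.xml", while "/tmp/foo%20bar/book.xml" becomes
"/tmp/foo%2520bar/book.xml", so the two no longer collide.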

Mike


