Re: [xml] Problems with "%" characters in the path or filename version 2-2.6.2



On Mon, 24 Nov 2003, Johnson, Cameron wrote:

I have encountered a problem with the XML parser and I would like to
know if there is an easy solution. From a desktop application (not a
web-based application) I open an XML file to parse.  The name of this
file contains a '%'.  Before the library opens the file, it passes the
filename through the function xmlURIUnescapeString().
I understand that this character should be removed, as '%' represents an
escape character in a URI; however, if the input comes from the user (i.e.
file dialogs on Mac and Windows) the character is part of the filename.  At
present I have commented out the call to this function to get the
library to open the proper file.  I am uncomfortable with this solution
as I am sure that this function is needed in some circumstances.

We first found this problem when we were testing opening files in a
Japanese locale using Japanese kanji characters.  Some of these
characters contain '%' as the first byte of a two-byte character.

It doesn't matter where the input comes from - the names passed to the
functions to open files are URIs and therefore you have to give them URIs.
As I understand it every 'file' handling function actually goes through the
registered IO handlers (through xmlRegisterInputCallbacks) and falls back
to the built in methods for providing input.

If the user types in "file:///hello%there" then you can either assume
that that's what they meant (an invalid URI) or escape it yourself. If
they typed in "hello%there" then it's very likely that they meant
"file:///hello%25there".
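
To make the "escape it yourself" option concrete, here is a minimal
sketch in plain C of escaping a user-supplied filename before handing it
to the parser. The helper name escape_filename_sketch() and the choice of
which characters to leave alone are my own assumptions for illustration;
it is not part of libxml2, which has its own escaping routine
(xmlURIEscapeStr() in uri.h) that you would normally prefer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT a libxml2 function.
 * Assumption: ASCII letters, digits, '-', '.', '_', '~' and the path
 * separator '/' are left as-is; every other byte becomes %XX. */
static int keep_as_is(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
}

/* Returns a newly allocated escaped copy of 'name' (caller frees),
 * or NULL if memory could not be allocated. */
char *escape_filename_sketch(const char *name)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i;
    char *out, *p;

    assert(name != NULL);
    len = strlen(name);
    out = malloc(3 * len + 1);      /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (keep_as_is(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];     /* high nibble */
            *p++ = hex[c & 0x0F];   /* low nibble */
        }
    }
    *p = '\0';
    return out;
}
```

With this, a name from a file dialog such as "hello%there" would become
"hello%25there", which can then be passed on as the 'filename' argument.
Because it works byte by byte, it should also cover multi-byte names
where one of the bytes happens to be '%'.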

Removing the unescaping of URIs is breaking the operation of the functions
- they're meant to take URIs and if you're accepting % as a raw character
then they're no longer URIs.

In the quick look that I had to verify this, it may be that the use of
things like 'htmlParseFile' and 'xmlParseFile' (amongst others) has
given rise to this misunderstanding - the 'filename' parameter is either
a local filename with URI-style escaping (if you use no protocol
specifier) or is passed through to the relevant input handler as a full
URI, obviously with URI-style escaping (if you use a protocol specifier).
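
To illustrate why a raw '%' in that parameter is a problem, here is a
rough sketch of what URI-style unescaping does to a name. This is my own
simplified version in plain C, not the library's xmlURIUnescapeString()
code; the function name unescape_sketch() is hypothetical, and leaving a
'%' alone when it is not followed by two hex digits is an assumption made
for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if 'c' is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, NOT the libxml2 implementation.
 * Returns a newly allocated copy of 'uri' with each %XX sequence
 * replaced by the byte it encodes (caller frees), or NULL if memory
 * could not be allocated. */
char *unescape_sketch(const char *uri)
{
    size_t len, i;
    char *out, *p;

    assert(uri != NULL);
    len = strlen(uri);
    out = malloc(len + 1);          /* output is never longer than input */
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < len; i++) {
        if (uri[i] == '%') {
            /* uri[i+1] is at worst the terminating NUL, which is not
             * a hex digit, so uri[i+2] is only read when it exists. */
            int hi = hex_value(uri[i + 1]);
            int lo = (hi >= 0) ? hex_value(uri[i + 2]) : -1;
            if (lo >= 0) {
                *p++ = (char)((hi << 4) | lo);
                i += 2;             /* skip the two hex digits */
                continue;
            }
        }
        *p++ = uri[i];              /* ordinary byte, or a stray '%' */
    }
    *p = '\0';
    return out;
}
```

So a correctly escaped "hello%25there" comes back as the file name
"hello%there", whereas an unescaped name like "100%41.xml" would be read
as "100A.xml", which is presumably not the file the user picked.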

However that's only from a quick look to refresh my memory on it as it's
been a little while since I looked at the IO handlers, so I could very
well be wrong.

-- 
Gerph
<http://www.movspclr.co.uk/> <http://homepage.ntlworld.com/justin.fletcher/>
... Frightened and feared - answer my question - why me ?


