Re: [xml] libxml2 uri.c xmlURIEscape (with fix)




Daniel says:
That UNWISE set is defined by RFC 2396 and I dislike the idea of
changing something that fundamental. It would also change the
semantics of URI checking when parsing new ones, and I'm not fond of that.
I would rather prefer a patch checking uri->cleanup == 1 to allow them
at parse time only when trying to do escaping.

OK, but in uri.c, IS_UNWISE is used exclusively in conjunction with
uri->cleanup, e.g.:

    while (IS_PCHAR(cur) || ((uri->cleanup) && (IS_UNWISE(cur))))
        NEXT(cur);
    while (*cur == ';') {
        cur++;
        while (IS_PCHAR(cur) || ((uri->cleanup) && (IS_UNWISE(cur))))
            NEXT(cur);
    }


So, at least as the code currently stands, adding ' ' to IS_UNWISE does
exactly that: the blank is accepted only when uri->cleanup is set.  Is
IS_UNWISE being abused right now?  It isn't used anywhere else.

Perhaps there should be a new one:

#define IS_INNEEDOFESCAPING(p) ((IS_UNWISE(p)) || ((*(p) == ' ')))

and every current use of IS_UNWISE should be changed to 
IS_INNEEDOFESCAPING?
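
To make the idea concrete, here is a minimal, self-contained sketch of how
such a macro could drive escaping.  This is not libxml2 code: IS_UNWISE is
repeated here with only the RFC 2396 set (no ' '), and sketch_escape is a
hypothetical helper made up for illustration.

```c
#include <stddef.h>

/* RFC 2396 "unwise" set, left exactly as the RFC defines it. */
#define IS_UNWISE(p)                                                    \
      (((*(p) == '{')) || ((*(p) == '}')) || ((*(p) == '|')) ||         \
       ((*(p) == '\\')) || ((*(p) == '^')) || ((*(p) == '[')) ||        \
       ((*(p) == ']')) || ((*(p) == '`')))

/* Proposed superset: unwise characters plus the blank. */
#define IS_INNEEDOFESCAPING(p) ((IS_UNWISE(p)) || ((*(p) == ' ')))

/*
 * Hypothetical helper, not part of libxml2.  Copies src into dst,
 * replacing every character matched by IS_INNEEDOFESCAPING with %XX.
 * Returns 0 on success, -1 if dst is too small.
 */
static int sketch_escape(const char *src, char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    const char *p;
    unsigned char c;
    size_t o = 0;

    if (dstlen == 0)
        return -1;
    for (p = src; *p != '\0'; p++) {
        if (IS_INNEEDOFESCAPING(p)) {
            /* need room for "%XX" plus the final terminator */
            if (o + 3 >= dstlen)
                return -1;
            c = (unsigned char) *p;
            dst[o++] = '%';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 0x0F];
        } else {
            /* need room for one character plus the final terminator */
            if (o + 1 >= dstlen)
                return -1;
            dst[o++] = *p;
        }
    }
    dst[o] = '\0';
    return 0;
}
```

With this, the RFC set stays untouched and only the escaping path sees the
blank as something to rewrite.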

I don't know... maybe this needs to be thought about more carefully.
Where are escaped spaces allowed in a URL?  Perhaps the ' ' should just
be added to those checks alongside the uri->cleanup flag.  Are there
other characters besides ' ' that we are forgetting about now?
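
A rough sketch of that alternative, leaving IS_UNWISE alone and accepting
' ' only under the cleanup flag.  Everything here is assumed for
illustration: sketch_uri stands in for the real URI struct,
sketch_is_pchar is a simplified IS_PCHAR that ignores %XX escapes, and
sketch_scan_segment mimics the segment loop quoted above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser state; only the flag matters. */
typedef struct {
    int cleanup;    /* non-zero when parsing on behalf of escaping */
} sketch_uri;

/* RFC 2396 "unwise" set, unchanged. */
static int sketch_is_unwise(int c)
{
    return c == '{' || c == '}' || c == '|' || c == '\\' ||
           c == '^' || c == '[' || c == ']' || c == '`';
}

/* Simplified pchar: alphanum, mark, and ":@&=+$," (no %XX handling). */
static int sketch_is_pchar(int c)
{
    if (c == '\0')
        return 0;
    return isalnum((unsigned char) c) ||
           strchr("-_.!~*'():@&=+$,", c) != NULL;
}

/* Returns how many leading characters are accepted as one segment. */
static size_t sketch_scan_segment(const sketch_uri *uri, const char *cur)
{
    size_t n = 0;

    while (sketch_is_pchar(cur[n]) ||
           (uri->cleanup &&
            (sketch_is_unwise(cur[n]) || cur[n] == ' ')))
        n++;
    return n;
}
```

Under these assumptions a strict parse still stops at the first blank,
while a parse done with cleanup set walks over it so it can be escaped
later.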

Joel
--------
From: Daniel Veillard <veillard redhat com>
Date: Wed, 31 Jul 2002 03:18:06 -0400
  To: Joel Young <jdy cs brown edu>
  Cc: xml gnome org
Subj: Re: [xml] libxml2 uri.c xmlURIEscape (with fix)

On Tue, Jul 30, 2002 at 08:34:50PM -0400, Joel Young wrote:

Hi Daniel,

I found another issue with xmlURIEscape.  It doesn't handle blanks in
the input string.  I know blanks aren't valid, but that's what
xmlURIEscape is supposed to fix.

All that is needed to fix this is to add ' ' to IS_UNWISE in uri.c:

 #define IS_UNWISE(p)                                                    \
       (((*(p) == '{')) || ((*(p) == '}')) || ((*(p) == '|')) ||         \
       ((*(p) == '\\')) || ((*(p) == '^')) || ((*(p) == '[')) ||        \
        ((*(p) == ']')) || ((*(p) == ' ')) || ((*(p) == '`')))  
 
                             ------------

What do you think?

That UNWISE set is defined by RFC 2396 and I dislike the idea of
changing something that fundamental. It would also change the
semantics of URI checking when parsing new ones, and I'm not fond of that.
I would rather prefer a patch checking uri->cleanup == 1 to allow them
at parse time only when trying to do escaping.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


