Re: [xml] libxml2 uri.c xmlURIEscape (with fix)
- From: Joel Young <jdy cs brown edu>
- To: veillard redhat com
- Cc: jdy cs brown edu, xml gnome org
- Subject: Re: [xml] libxml2 uri.c xmlURIEscape (with fix)
- Date: Thu, 08 Aug 2002 15:41:40 -0400
Daniel,
I am a little confused by this URL business. What encoding or
character set should a URL be in before it is fed to xmlURIEscape? If the
unescaped URL has any characters that aren't specifically mentioned in
the RFC as delimiting that portion of the URL, then I think they should be
escaped. For example, "ñ" in a path should be escaped by xmlURIEscape.
Right now it doesn't do this. Adding lines like:
while (IS_PCHAR(cur) || ((uri->cleanup) && (IS_UNWISE(cur) || ((*cur) < 0))))
to uri.c helps, but I am sure I don't understand all the implications.
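
A caveat on the `(*cur) < 0` test, assuming `cur` is a `const char *`: it only catches bytes above hex 7F on platforms where plain `char` is signed; where `char` is unsigned the comparison is never true. A minimal sketch of a check that does not depend on signedness follows. `is_high_byte` is a hypothetical helper for illustration, not something that exists in uri.c.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2's uri.c.
 * Returns 1 if the byte at p lies outside 7-bit ASCII (> 0x7F),
 * whether plain char is signed or unsigned on this platform. */
static int is_high_byte(const char *p)
{
    return ((unsigned char) *p) > 0x7F;
}
```

With this, the last clause of the loop condition could read `is_high_byte(cur)` instead of `(*cur) < 0`.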
RFC 2396, section 2.4.3, seems to give some hints: control, space,
delims, and unwise characters should be allowed to be escaped, as
should anything greater than hex 7F (though the RFC doesn't seem to
mention that last case).
I guess what I am looking for is a set of heuristics for dealing with
arbitrary URLs gathered from the wild and converting them with high
probability to correct form.
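
One possible heuristic along these lines, as a rough sketch only: percent-escape every byte that falls in the control, space, or unwise sets, the delims `<`, `>` and `"`, and anything above hex 7F. The sketch makes two assumptions that are not from the RFC or from libxml2: it works byte by byte without knowing the input encoding, and it leaves `%` and `#` alone on the guess that in a URL from the wild they already introduce an escape or a fragment. The names `needs_escape` and `escape_wild_uri` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libxml2 API.
 * Returns 1 for bytes this sketch chooses to escape. */
static int needs_escape(unsigned char c)
{
    if (c <= 0x20 || c >= 0x7F)      /* controls, space, DEL, non-ASCII */
        return 1;
    /* delims (minus '%' and '#', by assumption) and the unwise set */
    return strchr("<>\"{}|\\^[]`", c) != NULL;
}

/* Hypothetical helper, not libxml2 API.
 * Returns a newly allocated escaped copy of in, or NULL if malloc fails.
 * The caller frees the result. */
static char *escape_wild_uri(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(3 * strlen(in) + 1);   /* worst case: all escaped */
    char *w = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        if (needs_escape(c)) {
            *w++ = '%';
            *w++ = hex[c >> 4];
            *w++ = hex[c & 0x0F];
        } else {
            *w++ = (char) c;
        }
    }
    *w = '\0';
    return out;
}
```

Because it never touches `%`, running it over an already escaped URL leaves that URL unchanged, which seems desirable for input of unknown origin; the cost is that a literal `%` in the input is not repaired.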
Joel
--------
From: Daniel Veillard <veillard redhat com>
Date: Wed, 31 Jul 2002 03:18:06 -0400
To: Joel Young <jdy cs brown edu>
Cc: xml gnome org
Subj: Re: [xml] libxml2 uri.c xmlURIEscape (with fix)
On Tue, Jul 30, 2002 at 08:34:50PM -0400, Joel Young wrote:
> Hi Daniel,
> I found another issue with xmlURIEscape. It doesn't handle blanks in
> the input string. I know blanks aren't valid but that's what
> xmlURIEscape is s'posed to fix.
> All that is needed to fix this is to add ' ' to IS_UNWISE in uri.c:
> #define IS_UNWISE(p) \
> (((*(p) == '{')) || ((*(p) == '}')) || ((*(p) == '|')) || \
> ((*(p) == '\\')) || ((*(p) == '^')) || ((*(p) == '[')) || \
> ((*(p) == ']')) || ((*(p) == ' ')) || ((*(p) == '`')))
> ------------
> What do you think?
That UNWISE set is defined by RFC 2396, and I dislike the idea of
changing something that fundamental. It would also change the
semantics of URI checking when parsing new URIs, and I'm not fond of that.
I would prefer a patch that checks uri->cleanup == 1 and allows blanks
at parse time only when trying to do escaping.
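
A rough sketch of that idea, leaving the RFC 2396 unwise set untouched and accepting a blank only when the cleanup flag is set. `struct uri_sketch`, `is_unwise` and `accept_when_cleaning` are hypothetical stand-ins for illustration; only the `cleanup` field name comes from this thread, and the real structure in libxml2 has more members.

```c
#include <stddef.h>

/* Minimal stand-in carrying only the flag this sketch needs. */
struct uri_sketch {
    int cleanup;   /* non-zero when parsing in order to escape */
};

/* The unwise set exactly as RFC 2396 lists it; no blank here. */
static int is_unwise(char c)
{
    switch (c) {
    case '{': case '}': case '|': case '\\':
    case '^': case '[': case ']': case '`':
        return 1;
    default:
        return 0;
    }
}

/* Extra characters tolerated at parse time, but only in cleanup mode. */
static int accept_when_cleaning(const struct uri_sketch *uri, char c)
{
    if (uri == NULL || !uri->cleanup)
        return 0;
    return is_unwise(c) || c == ' ';
}
```

Strict parsing (cleanup == 0) never sees the relaxed rule, so URI checking for new URIs keeps its current semantics.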
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/