Re: [xml] patch: Functions to parse and create URI query strings
- From: Daniel Veillard <veillard redhat com>
- To: "Richard W.M. Jones" <rjones redhat com>
- Cc: xml gnome org
- Subject: Re: [xml] patch: Functions to parse and create URI query strings
- Date: Wed, 25 Apr 2007 10:59:43 -0400
On Wed, Apr 25, 2007 at 03:40:04PM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote:
On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
OK, so I'll rework the patch to integrate this into the normal parsing and
saving of URIs, and put the results in the URI structure. (Is that right?)
yes.
uri->query really must be deprecated though!
There's a real problem with this ...
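When the URI's query string is parsed, xmlParseURIQuery unescapes the
query string. Unfortunately this means that application/x-www-form-urlencoded
data cannot be decoded as per RFC 2396: once %26 has been turned into "&",
an escaped ampersand inside a value can no longer be told apart from the
"&" that separates fields. Allow me to explain further ...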
Consider this test program:
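#include <stdio.h>
#include <stdlib.h>
#include <libxml/uri.h>

int
main (void)
{
  const char *str = "/?field1=%26&field2=%26";
  xmlURIPtr uri;

  uri = xmlParseURI (str);
  if (uri == NULL) { printf ("xmlParseURI returned NULL\n"); exit (1); }
  /* uri->query holds the query string after %XX unescaping. */
  printf ("query = %s\n", uri->query != NULL ? uri->query : "(null)");
  xmlFreeURI (uri);
  return 0;
}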
This prints:
$ ./test
query = field1=&&field2=&
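To make the loss concrete, here is a minimal C sketch that counts the '&'-separated fields in a query string. The helper name count_fields is hypothetical, not part of libxml2. Applied to the raw query and to the unescaped value printed above, it gives different answers, so the original field boundaries cannot be recovered from uri->query.

```c
#include <assert.h>
#include <stddef.h>

/* Count the '&'-separated fields in a query string.
   Hypothetical helper for illustration, not a libxml2 function.
   An empty string still counts as one (empty) field. */
static size_t count_fields(const char *query)
{
    size_t n = 1;

    assert(query != NULL);
    for (; *query != '\0'; query++)
        if (*query == '&')
            n++;
    return n;
}
```

count_fields ("field1=%26&field2=%26") returns 2, while count_fields ("field1=&&field2=&") returns 4: after unescaping, the two escaped ampersands look exactly like field separators.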
[...]
So we can certainly proceed with parsing into pairs _if_ we assume that
we'll always use application/x-www-form-urlencoded encoding, and that the
charset of the strings that come out is whatever charset the higher layers
are expecting (they should know).
Or can we add some extra flags/fields into xmlURIPtr so that the
encoding at least can be fed into xmlParseURIReference?
Or should we just add uri->query_raw and "deprecate" (i.e. tell people to
use it with caution) uri->query?
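Under the first option (application/x-www-form-urlencoded, charset left to the higher layers), decoding has to happen only after the raw query has been split at '&' and '='. A minimal C sketch of decoding one name or value follows; form_decode and hexval are hypothetical helpers, not libxml2 functions. It turns '+' into a space and %XX into the byte 0xXX and copies every other byte unchanged, so the result is a byte string whose charset is for the caller to interpret.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, not part of libxml2: decode the first len bytes of s
   as one application/x-www-form-urlencoded component (a name or a value,
   already split out of the raw query). '+' becomes ' ', %XX becomes the
   byte 0xXX, malformed escapes are copied through. No charset conversion
   is attempted. Returns a malloc'd NUL-terminated string, or NULL. */
static char *form_decode(const char *s, size_t len)
{
    char *out, *p;
    size_t i;

    assert(s != NULL);
    out = p = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] == '+') {
            *p++ = ' ';
        } else if (s[i] == '%' && i + 2 < len
                   && hexval((unsigned char) s[i + 1]) >= 0
                   && hexval((unsigned char) s[i + 2]) >= 0) {
            *p++ = (char) (hexval((unsigned char) s[i + 1]) * 16
                           + hexval((unsigned char) s[i + 2]));
            i += 2;
        } else {
            *p++ = s[i];
        }
    }
    *p = '\0';
    return out;
}
```

For example, form_decode ("a%26b+c", 7) yields "a&b c", and form_decode ("caf%C3%A9", 9) yields "caf" followed by the bytes 0xC3 0xA9, passed on without any check that they form valid UTF-8.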
Okay, so we need uri->query_raw to be added, fine.
W.r.t. encoding: if the URI comes from the application, there is no way we
can guess the encoding; if the URI comes from an XML chunk (e.g. an attribute
value), then it should be UTF-8. Anyway, we don't need to interpret characters
outside of ASCII at that level (and if the encoding of that string is not
compatible with the ASCII range, all bets are off anyway). So I don't think
we need to do anything here: encoding-wise we don't need to understand the
string, except that if the upper bit of a character is 0 we must assume it's
the ASCII value.
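The byte-level rule above can be illustrated with a small C sketch; is_ascii and count_delims are hypothetical helper names. Every byte of a multi-byte UTF-8 sequence has its top bit set, so a scan that only treats bytes with the top bit clear as ASCII can never mistake part of a non-ASCII character for '&' or '='. The same holds for other ASCII-compatible encodings such as ISO-8859-1, but not for encodings like UTF-16.

```c
#include <assert.h>
#include <stddef.h>

/* A byte whose top bit is 0 is taken to be its ASCII value. */
static int is_ascii(unsigned char c)
{
    return (c & 0x80) == 0;
}

/* Hypothetical helper: count the '&' and '=' delimiters in a raw query
   string, looking only at ASCII bytes; bytes with the top bit set are
   skipped without being interpreted. */
static size_t count_delims(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;
        if (is_ascii(c) && (c == '&' || c == '='))
            n++;
    }
    return n;
}
```

With UTF-8 input such as "caf\xc3\xa9=1&na\xc3\xafve=2", count_delims returns 3. The explicit is_ascii test only documents the rule: since '&' and '=' are themselves ASCII, comparing against them directly gives the same result.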
So fine by me to add a query_raw field, with an explanation in the structure
(since it's public), and since we have to take the risk of augmenting the
xmlURI size anyway, let's also add the interpreted array of query values,
if there are any.
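One possible shape for this, sketched in C: the raw query is kept verbatim, and the decoded name/value pairs are built from it by splitting at '&' and '=' before decoding, so escaped delimiters such as %26 stay part of the data. The parsed_query and query_pair types and the functions below are hypothetical, meant only to illustrate the idea; they are not libxml2's actual xmlURI layout or API. Empty fields are skipped, and a field without '=' gets an empty value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration, not libxml2's xmlURI. */
typedef struct { char *name; char *value; } query_pair;
typedef struct {
    char *query_raw;      /* query exactly as it appeared in the URI */
    query_pair *pairs;    /* decoded name/value pairs, in order */
    size_t n_pairs;
} parsed_query;

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* '+' -> ' ', %XX -> byte 0xXX, other bytes copied as-is (no charset work). */
static char *decode(const char *s, size_t len)
{
    char *out = malloc(len + 1), *p = out;
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] == '+') {
            *p++ = ' ';
        } else if (s[i] == '%' && i + 2 < len
                   && hexval((unsigned char) s[i + 1]) >= 0
                   && hexval((unsigned char) s[i + 2]) >= 0) {
            *p++ = (char) (hexval((unsigned char) s[i + 1]) * 16
                           + hexval((unsigned char) s[i + 2]));
            i += 2;
        } else {
            *p++ = s[i];
        }
    }
    *p = '\0';
    return out;
}

static void parsed_query_free(parsed_query *q)
{
    size_t i;

    for (i = 0; i < q->n_pairs; i++) {
        free(q->pairs[i].name);
        free(q->pairs[i].value);
    }
    free(q->pairs);
    free(q->query_raw);
    q->pairs = NULL;
    q->query_raw = NULL;
    q->n_pairs = 0;
}

/* Split the RAW query first, then decode each piece.
   Returns 0 on success, -1 on allocation failure (q is then released). */
static int parsed_query_init(parsed_query *q, const char *raw)
{
    size_t cap = 1, rlen, len;
    const char *p;

    assert(q != NULL && raw != NULL);
    for (p = raw; *p != '\0'; p++)      /* upper bound on the pair count */
        if (*p == '&')
            cap++;
    rlen = strlen(raw);
    q->n_pairs = 0;
    q->query_raw = malloc(rlen + 1);
    q->pairs = calloc(cap, sizeof *q->pairs);
    if (q->query_raw == NULL || q->pairs == NULL) {
        parsed_query_free(q);
        return -1;
    }
    memcpy(q->query_raw, raw, rlen + 1);
    for (p = raw; *p != '\0'; ) {
        len = strcspn(p, "&");
        if (len > 0) {
            const char *eq = memchr(p, '=', len);
            size_t nlen = eq != NULL ? (size_t) (eq - p) : len;
            query_pair *qp = &q->pairs[q->n_pairs++];

            qp->name = decode(p, nlen);
            qp->value = eq != NULL ? decode(eq + 1, len - nlen - 1) : decode("", 0);
            if (qp->name == NULL || qp->value == NULL) {
                parsed_query_free(q);
                return -1;
            }
        }
        p += len;
        if (*p == '&')
            p++;
    }
    return 0;
}
```

Calling parsed_query_init on "field1=%26&field2=%26" yields two pairs whose values are both "&", which is exactly the information that uri->query loses.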
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/