Re: [xml] patch: Functions to parse and create URI query strings



On Wed, Apr 25, 2007 at 03:40:04PM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote:
On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
OK, so I'll rework to integrate this into the normal parsing and saving 
of URIs and put the results in the URI structure.  (Is that right?)

 yes.

uri->query really must be deprecated though!

There's a real problem with this ...

When the URI's query string is parsed, xmlParseURIQuery unescapes the 
query string.  Unfortunately this means that 
application/x-www-form-urlencoded data cannot be decoded as per RFC 2396.  
Allow me to explain further ...

Consider this test program:

  #include <stdio.h>
  #include <stdlib.h>   /* for exit() */
  #include <libxml/uri.h>

  int
  main (void)
  {
    /* Both field values are an escaped '&' (%26). */
    const char *str = "/?field1=%26&field2=%26";
    xmlURIPtr uri;

    uri = xmlParseURI (str);
    if (uri == NULL) { printf ("xmlParseURI returned NULL\n"); exit (1); }

    printf ("query = %s\n", uri->query);

    xmlFreeURI (uri);
    return 0;
  }

This prints:

  $ ./test
  query = field1=&&field2=&
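
To make the loss of information concrete, here is a minimal sketch that does not use libxml at all. The helpers percent_decode and raw_query_value are hypothetical names invented for this illustration, not part of any library. It compares looking up a field in the raw query (split on '&' first, then decode) with looking it up after the whole query has already been unescaped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode %XX escapes in src[0..len) into a newly
   allocated string.  Malformed escapes are copied through unchanged.
   The caller frees the result. */
static char *percent_decode(const char *src, size_t len)
{
    char *out;
    size_t i, o = 0;

    assert(src != NULL);
    out = malloc(len + 1);
    if (out == NULL) return NULL;
    for (i = 0; i < len; i++) {
        if (src[i] == '%' && i + 2 < len &&
            hex_value(src[i + 1]) >= 0 && hex_value(src[i + 2]) >= 0) {
            out[o++] = (char)(hex_value(src[i + 1]) * 16 +
                              hex_value(src[i + 2]));
            i += 2;
        } else {
            out[o++] = src[i];
        }
    }
    out[o] = '\0';
    return out;
}

/* Hypothetical helper: find `name` in a query string by splitting on
   '&' FIRST and decoding the value afterwards.  For simplicity the
   name is compared without decoding.  Returns a malloc'd value, or
   NULL if the field is absent. */
static char *raw_query_value(const char *query, const char *name)
{
    size_t name_len;
    const char *p = query;

    assert(query != NULL && name != NULL);
    name_len = strlen(name);
    while (*p != '\0') {
        size_t field_len = strcspn(p, "&");
        if (field_len > name_len && strncmp(p, name, name_len) == 0 &&
            p[name_len] == '=')
            return percent_decode(p + name_len + 1,
                                  field_len - name_len - 1);
        p += field_len;
        if (*p == '&') p++;
    }
    return NULL;
}
```

Called on the raw string "field1=%26&field2=%26", raw_query_value yields "&" for either field. Called on the already unescaped "field1=&&field2=&", the same lookup yields an empty string, because the decoded '&' is indistinguishable from a field separator.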
[...]
So we can certainly proceed with parsing into pairs _if_ we assume that 
we'll always do application/x-www-form-urlencoded encoding, and that 
the charset of the strings that come out is whatever charset the higher 
layers are expecting (they should know).

Or can we add some extra flags/fields into xmlURIPtr so that the 
encoding at least can be fed into xmlParseURIReference?

Or should we just add uri->query_raw and "deprecate" (ie. tell people to 
use with caution) uri->query?

  Okay, so we need uri->query_raw to be added, fine.
W.r.t. encoding: if the URI comes from the application, there is no way
we can guess it; if the URI comes from an XML chunk (e.g. an attribute
value) then it should be UTF-8. Anyway, we don't need to interpret
characters outside of ASCII at that level (and if the encoding of that
string is not compatible with the ASCII range, all bets are off anyway).
So I don't think we need to do anything here: encoding-wise we don't need
to understand the string, except that if the upper bit of a character is 0
we must assume it's the ASCII value.

  So fine by me to add a query_raw field and an explanation in the structure
(since it's public), and since we have to take the risk of augmenting the
xmlURI size anyway, let's also add the interpreted array of query values,
if there are any.
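
One possible shape for this, purely as a sketch: every name below (exampleURI, exampleQueryParam, params, nparams, example_uri_param) is an assumption made up for illustration, and only query and query_raw are names that appear in this thread. It is not the actual xmlURI layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one decoded name=value pair from the query string. */
typedef struct {
    char *name;
    char *value;
} exampleQueryParam;

/* Hypothetical layout, NOT the real xmlURI structure. */
typedef struct {
    char *query;                /* unescaped query; use with caution */
    char *query_raw;            /* query exactly as it appeared in the URI */
    exampleQueryParam *params;  /* interpreted pairs, NULL if no query */
    size_t nparams;             /* number of entries in params */
} exampleURI;

/* Return the decoded value of the first pair called `name`, or NULL. */
static const char *example_uri_param(const exampleURI *uri,
                                     const char *name)
{
    size_t i;

    assert(uri != NULL && name != NULL);
    for (i = 0; i < uri->nparams; i++)
        if (strcmp(uri->params[i].name, name) == 0)
            return uri->params[i].value;
    return NULL;
}
```

Keeping query_raw next to the interpreted pairs would let callers that need a different decoding redo it themselves, while the common case reads the pairs directly.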

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


