Re: [xml] patch: Functions to parse and create URI query strings



On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote:
The current uri->query field is always unescaped during parsing.  I have 
changed it so that it is always stored in its raw form.  This is because 
otherwise it's impossible to parse query strings such as 
file:///tmp/test.html?test=%26&second=%26, which can be generated by web 
browsers: once %26 has been unescaped to &, it can no longer be told apart 
from the & that separates the fields.  If anyone was relying on the current 
semantics, then it seems to me that they cannot parse such query strings 
correctly.
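A minimal C sketch of the problem, using only the standard library; hexval, percent_decode, count_fields and raw_field_value are hypothetical helpers written for this illustration, not libxml2 functions. Decoding the whole query first turns test=%26&second=%26 into test=&&second=&, which no longer splits into the original two fields. Splitting the raw string on & first and decoding each value afterwards recovers & for both fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode %XX escapes in s[0..len) into a newly malloc'd string.
 * Malformed escapes are copied through unchanged. */
static char *percent_decode(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    size_t i, j = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] == '%' && i + 2 < len &&
            hexval(s[i + 1]) >= 0 && hexval(s[i + 2]) >= 0) {
            out[j++] = (char)(hexval(s[i + 1]) * 16 + hexval(s[i + 2]));
            i += 2;
        } else {
            out[j++] = s[i];
        }
    }
    out[j] = '\0';
    return out;
}

/* Count '&'-separated fields (an empty string has none). */
static size_t count_fields(const char *s)
{
    size_t n;

    if (*s == '\0')
        return 0;
    for (n = 1; *s != '\0'; s++)
        if (*s == '&')
            n++;
    return n;
}

/* Decode the value of the idx-th name=value field of a RAW query.
 * Splitting happens before decoding, so %26 stays inside its value.
 * Returns NULL if the field is missing or has no '=', or on
 * allocation failure. */
static char *raw_field_value(const char *query, size_t idx)
{
    const char *start = query, *end, *eq;

    assert(query != NULL);
    while (idx-- > 0) {
        start = strchr(start, '&');
        if (start == NULL)
            return NULL;
        start++;
    }
    end = strchr(start, '&');
    if (end == NULL)
        end = start + strlen(start);
    eq = memchr(start, '=', (size_t)(end - start));
    if (eq == NULL)
        return NULL;
    return percent_decode(eq + 1, (size_t)(end - eq - 1));
}
```

With the query above, count_fields reports 2 fields for the raw string but 4 for the fully decoded one, while raw_field_value returns "&" for both field 0 and field 1.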

 Aside from the number of new APIs added there, that's my main
issue with the patch: you are changing the default behaviour of
functionality that has been exposed practically forever.
 I guess I would really prefer an approach which hooks into the
URI parsing itself and fills in an extra list of values (or rather
an array of xmlChar *, alternatively of names and values) at the end
of the xmlURI structure. That would allow keeping the uri->query data
as it was while still providing the functionality you suggest, based
on a preparsed xmlURIPtr. It would also avoid adding an extra list
type.
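One possible shape for that extra data, as a minimal C sketch: the struct names query_param and parsed_query and the function parse_query are hypothetical, not part of libxml2's xmlURI. The raw query string is split on '&' and '=' into an array of name/value pairs while the original string is left untouched. To keep the sketch short, names and values are stored still escaped; a real implementation would presumably unescape each one after splitting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value entry; both strings are still escaped here. */
typedef struct {
    char *name;
    char *value;              /* "" when the field has no '=' */
} query_param;

/* Hypothetical extra data that would sit alongside uri->query. */
typedef struct {
    int nb;                   /* number of entries in params */
    query_param *params;
} parsed_query;

/* Copy s[0..len) into a new NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
    char *p = malloc(len + 1);

    if (p != NULL) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

static void free_query(parsed_query *q)
{
    int i;

    for (i = 0; i < q->nb; i++) {
        free(q->params[i].name);
        free(q->params[i].value);
    }
    free(q->params);
    q->nb = 0;
    q->params = NULL;
}

/* Split a raw query string into name/value pairs, leaving raw untouched.
 * Empty fields (as in "a=1&&b=2") are skipped.
 * Returns 0 on success, -1 on allocation failure (q is then left empty). */
static int parse_query(const char *raw, parsed_query *q)
{
    const char *p;
    int max = 1;

    assert(raw != NULL && q != NULL);
    q->nb = 0;
    for (p = raw; *p != '\0'; p++)
        if (*p == '&')
            max++;
    q->params = calloc((size_t)max, sizeof(*q->params));
    if (q->params == NULL)
        return -1;

    p = raw;
    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t len;

        if (end == NULL)
            end = p + strlen(p);
        len = (size_t)(end - p);
        if (len > 0) {
            query_param *qp = &q->params[q->nb++];
            const char *eq = memchr(p, '=', len);

            if (eq != NULL) {
                qp->name = dup_range(p, (size_t)(eq - p));
                qp->value = dup_range(eq + 1, (size_t)(end - eq - 1));
            } else {
                qp->name = dup_range(p, len);
                qp->value = dup_range("", 0);
            }
            if (qp->name == NULL || qp->value == NULL) {
                free_query(q);
                return -1;
            }
        }
        p = (*end == '&') ? end + 1 : end;
    }
    return 0;
}
```

Because the array is built once at parse time and owned by the structure, later lookups can simply index into it.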

OK, so I'll rework it to integrate this into the normal parsing and saving 
of URIs and put the results in the URI structure.  (Is that right?)

  yes.

uri->query really must be deprecated though!

  deprecation is basically not possible in libxml2, and I have no plans
so far for a libxml3, so ...

I'm not sure about the ignore flag in that list; what is it 
used for?

So this is really useful in the situation that I'm actually using this 
for: I want to parse the URI, remove some of the parameters (basically 
the ones which my code understands) and leave the rest of the parameters 
in place for another piece of code down the line to use.

Now, removing a parameter from a linked list is annoyingly complicated, 
but setting a flag to say "ignore this parameter - I've seen it" is a 
lot easier.  On the other hand, if the complexity is hidden inside a 
uri.c function then that doesn't matter (so long as it works :-)
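A minimal C sketch of the flag-based approach, assuming a hypothetical query_param array whose names and values are kept in their raw, still-escaped form (so re-serialising needs no escaping step); mark_handled and rebuild_query are illustrative helpers, not libxml2 APIs. A consumer marks the parameters it understands, then rebuilds a query string from whatever is left for the next piece of code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry; name and value are raw (still escaped). */
typedef struct {
    const char *name;
    const char *value;
    int ignore;          /* set to 1 once a consumer has handled it */
} query_param;

/* Mark every entry called name as handled. */
static void mark_handled(query_param *params, int nb, const char *name)
{
    int i;

    for (i = 0; i < nb; i++)
        if (strcmp(params[i].name, name) == 0)
            params[i].ignore = 1;
}

/* Rebuild "name=value&..." from the entries not marked as ignored.
 * Returns a malloc'd string (possibly ""), or NULL on allocation failure. */
static char *rebuild_query(const query_param *params, int nb)
{
    size_t len = 1;      /* terminating NUL */
    char *out, *p;
    int i, first = 1;

    assert(nb == 0 || params != NULL);
    for (i = 0; i < nb; i++)
        if (!params[i].ignore)   /* +2 covers '=' and a possible '&' */
            len += strlen(params[i].name) + strlen(params[i].value) + 2;
    out = malloc(len);
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < nb; i++) {
        size_t n;

        if (params[i].ignore)
            continue;
        if (!first)
            *p++ = '&';
        first = 0;
        n = strlen(params[i].name);
        memcpy(p, params[i].name, n);
        p += n;
        *p++ = '=';
        n = strlen(params[i].value);
        memcpy(p, params[i].value, n);
        p += n;
    }
    *p = '\0';
    return out;
}
```

For example, with entries session=abc, test=%26 and page=2, marking session and page as handled leaves test=%26 for the next consumer, and no list nodes ever have to be unlinked.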

  okay, 

 - those simplified APIs would work immediately with the Python generator,
   since they would not use char ** parameters, which it can't handle
   automatically.

So the use of char ** to return values is there because I couldn't see a 
good way to return an error indication.

As an example, if xmlURIQueryGetSingle were defined as:

  char *xmlURIQueryGetSingle (xmlURIPtr uri, const char *name);

then returning NULL might mean either (1) there is no field with that 
name, or (2) there was an error, e.g. in memory allocation.
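A minimal C sketch of the out-parameter style described here, under the assumption that the lookup returns a freshly allocated copy of the value; query_get_single_copy and query_param are hypothetical names, not the patch's actual API. The integer return code separates "found", "not found" and "allocation failure", which a bare NULL could not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical preparsed entry. */
typedef struct {
    const char *name;
    const char *value;
} query_param;

/* Look up name and hand back a malloc'd copy of its value.
 *    0  found: *value_out must be released with free()
 *    1  no field with that name: *value_out is NULL
 *   -1  allocation failure: *value_out is NULL */
static int query_get_single_copy(const query_param *params, int nb,
                                 const char *name, char **value_out)
{
    int i;

    assert(name != NULL && value_out != NULL);
    *value_out = NULL;
    for (i = 0; i < nb; i++) {
        if (strcmp(params[i].name, name) == 0) {
            size_t len = strlen(params[i].value);
            char *copy = malloc(len + 1);

            if (copy == NULL)
                return -1;
            memcpy(copy, params[i].value, len + 1);
            *value_out = copy;
            return 0;
        }
    }
    return 1;
}
```

The price of this style is the char ** parameter, which is exactly what an automatic bindings generator has trouble with.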

  No memory allocation error should occur at that stage: you would return
a const char * pointing into the array held by the xmlURIPtr. The lifetime
of that value will be the same as the xmlURI's, which is IMHO a fine way
to do things.
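A minimal C sketch of the alternative suggested here, with hypothetical names: parsed_uri stands in for the preparsed data inside the xmlURI, and query_get_single is not an existing libxml2 function. Because the lookup only returns a pointer into storage the structure already owns, nothing is allocated and NULL unambiguously means the field is absent. The returned pointer must not be freed and is only valid while the structure is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical preparsed entry. */
typedef struct {
    const char *name;
    const char *value;
} query_param;

/* Hypothetical stand-in for the array stored in the URI structure. */
typedef struct {
    int nb;
    const query_param *params;
} parsed_uri;

/* Return the value of the first field called name, or NULL if absent.
 * No allocation happens, so NULL can only mean "not present".  The
 * result points into uri's own storage: do not free it, and do not
 * use it after uri has been freed. */
static const char *query_get_single(const parsed_uri *uri, const char *name)
{
    int i;

    assert(uri != NULL && name != NULL);
    for (i = 0; i < uri->nb; i++)
        if (strcmp(uri->params[i].name, name) == 0)
            return uri->params[i].value;
    return NULL;
}
```

This signature uses only plain pointers, so it also fits the Python generator constraint mentioned above.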

About the use of char vs xmlChar: I couldn't really see which one was 
correct.  I understand that xmlChar is unsigned because of some bogosity 
in the XML spec, so xmlChar is used for characters in XML documents. 
URIs are different though, so which should I be using?

  xmlChar is unsigned because of the bogosity of C strings, which have
no associated encoding (and no, the current locale is not a decent answer
for XML processing). xmlChar * means a UTF-8 encoded string; char * means
"we don't know the encoding", basically. Given the URI vs. IRI disaster
(sorry, I don't have the IRI RFC number offhand), I assume we should stick
to char * (or rather const char *) for all of those APIs.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


