[xml] Change request for xmlBuildURI().
- From: "Zwaal,Martin" <Martin Zwaal oclc org>
- To: "xml gnome org" <xml gnome org>
- Subject: [xml] Change request for xmlBuildURI().
- Date: Thu, 23 Jul 2015 10:55:00 +0000
Hello libxml2 maintainers,
Short version: note the xmlURI struct from uri.h in libxml2 version 2.9.2:
/**
* xmlURI:
*
* A parsed URI reference. This is a struct containing the various fields
* as described in RFC 2396 but separated for further processing.
*
* Note: query is a deprecated field which is incorrectly unescaped.
* query_raw takes precedence over query if the former is set.
* See: http://mail.gnome.org/archives/xml/2007-April/thread.html#00127
*/
typedef struct _xmlURI xmlURI;
typedef xmlURI *xmlURIPtr;
struct _xmlURI {
char *scheme; /* the URI scheme */
char *opaque; /* opaque part */
char *authority; /* the authority part */
char *server; /* the server part */
char *user; /* the user part */
int port; /* the port number */
char *path; /* the path string */
char *query; /* the query string (deprecated - use with caution) */
char *fragment; /* the fragment identifier */
int cleanup; /* parsing potentially unclean URI */
char *query_raw; /* the query string (as it appears in the URI) */
};
In addition to 'query_raw', it would be useful to have 'server_raw', 'user_raw', 'path_raw' and
'fragment_raw' members that take precedence over the existing (decoded) struct members.
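To make the request concrete, here is a minimal sketch of what such members could look like. The struct name example_uri and the helper effective_path() are hypothetical and are not part of libxml2; only the member names proposed above are taken from this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration only; this is NOT libxml2's xmlURI.
 * Each *_raw member holds the text exactly as it appeared in the input. */
struct example_uri {
    const char *server;       /* decoded server part */
    const char *user;         /* decoded user part */
    const char *path;         /* decoded path */
    const char *fragment;     /* decoded fragment */
    const char *server_raw;   /* proposed: server part, untouched */
    const char *user_raw;     /* proposed: user part, untouched */
    const char *path_raw;     /* proposed: path, untouched */
    const char *fragment_raw; /* proposed: fragment, untouched */
};

/* Hypothetical helper: the raw member takes precedence when it is set,
 * mirroring the documented relationship between query_raw and query. */
static const char *effective_path(const struct example_uri *u)
{
    assert(u != NULL);
    return u->path_raw != NULL ? u->path_raw : u->path;
}
```

A serializer written along these lines would only fall back to the decoded member when no raw text was recorded.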
===
Long version:
We use libxml2/libxslt for server-side XSLT processing of browser pages. To allow XSLT stylesheets from
other domains, we use a proxy that is supplied with the original URL in percent-encoded form. An example
(demo) is this:
<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl"
href="/get/BASE=http%3A%2F%2Fservername%3A80%2F%7Eaccountname%2Fdirectory/%3Fid%3DSCREEN_ID%26name%3Dvalue"?>
This external stylesheet is loaded using xsltLoadStylesheetPI(). Beforehand we call xsltSetLoaderFunc()
to have control over the documents that are loaded during the transformation, which are the stylesheet itself
as well as sub-documents. The problem is that the function set by xsltSetLoaderFunc() receives mangled URLs.
E.g. the above URL is transformed to:
http://<ip-address>:<port>/get/BASE=http%3A//servername%3A80/~accountname/directory/%3Fid=SCREEN_ID&name=value
This cannot be repaired outside the library because we cannot know which parts to URL-encode to get back
the original URL. Note that in this example "%3A" and "%3F" are still intact. URL-encoding the whole string
would result in double encoding of these parts. It would also encode all forward slashes '/' instead of only
those that were decoded from "%2F".
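To illustrate why the decoding step cannot be undone from the outside, here is a minimal, self-contained sketch. It does not use libxml2; hex_value(), percent_decode() and decodes_identically() are hypothetical helpers written for this mail. The point is that an encoded "%2F" and a literal "/" decode to the same byte, so the decoded string no longer says which one the input contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libxml2): value of one hex digit, or -1. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode every well-formed %XX escape in 'in'.
 * Returns a malloc'ed string (caller frees) or NULL if malloc fails. */
static char *percent_decode(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);
    char *out = malloc(len + 1);   /* decoding never grows the string */
    if (out == NULL) return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* The terminating NUL is not a hex digit, so the checks below
         * never read past the end of the input. */
        if (in[i] == '%' && hex_value(in[i + 1]) >= 0
                         && hex_value(in[i + 2]) >= 0) {
            out[j++] = (char)(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            i += 2;
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return out;
}

/* Returns 1 if two different encoded inputs collapse to the same decoded
 * string, i.e. the original spelling is lost after decoding. */
static int decodes_identically(const char *a, const char *b)
{
    char *da = percent_decode(a);
    char *db = percent_decode(b);
    int same = (da != NULL && db != NULL && strcmp(da, db) == 0);
    free(da);
    free(db);
    return same;
}
```

With these helpers, decodes_identically("directory%2Fpage", "directory/page") reports a match, which is exactly the ambiguity described above.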
A closer look reveals what goes wrong. xmlBuildURI() indirectly calls xmlURIUnescapeString(), which
decodes all percent-encoded sequences. Finally, xmlSaveUri() constructs the above output string and
re-encodes special characters such as ':' and '?', but not characters like '/' and '&'. IMHO, a better
approach would be to skip decoding/encoding entirely and to glue the raw parts together before handing
them over to the caller. If you look at this function:
/**
* xmlParse3986URI:
* @uri: pointer to an URI structure
* @str: the string to analyze
*
* Parse an URI string and fills in the appropriate fields
* of the @uri structure
*
* scheme ":" hier-part [ "?" query ] [ "#" fragment ]
*
* Returns 0 or the error code
*/
then it would make sense to divide the input by ":", "?" and "#" and save all parts in raw format. When
constructing a url, xmlSaveUri() can simply glue all parts together with ":", "?" and "#" in between. But I
only see query_raw stored in the xmlURI struct. What about the other struct members that got their value
through xmlURIUnescapeString()?
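As a rough sketch of the idea (not a patch against libxml2), the following self-contained code splits a URI reference at ":", "?" and "#" without decoding anything, and glues the parts back together. The names raw_uri, dup_range(), raw_uri_split(), raw_uri_join() and raw_uri_free() are hypothetical, and the scheme detection is simplified: it only assumes that a ':' appearing before any '/', '?' or '#' ends the scheme.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container, not libxml2's xmlURI. NULL means "absent". */
struct raw_uri {
    char *scheme;
    char *hier;      /* everything between scheme ":" and "?" / "#" */
    char *query;
    char *fragment;
};

static char *dup_range(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p != NULL) { memcpy(p, s, n); p[n] = '\0'; }
    return p;
}

static void raw_uri_free(struct raw_uri *u)
{
    free(u->scheme); free(u->hier); free(u->query); free(u->fragment);
    u->scheme = u->hier = u->query = u->fragment = NULL;
}

/* Split 'str' into raw parts; nothing is decoded. Returns 0 or -1. */
static int raw_uri_split(const char *str, struct raw_uri *u)
{
    assert(str != NULL && u != NULL);
    u->scheme = u->hier = u->query = u->fragment = NULL;

    const char *end = str + strlen(str);
    const char *hash = strchr(str, '#');
    const char *frag_start = hash ? hash : end;
    if (hash) u->fragment = dup_range(hash + 1, (size_t)(end - (hash + 1)));

    /* The query starts at the first '?' located before the fragment. */
    const char *qm = memchr(str, '?', (size_t)(frag_start - str));
    const char *hier_end = qm ? qm : frag_start;
    if (qm) u->query = dup_range(qm + 1, (size_t)(frag_start - (qm + 1)));

    /* Simplified scheme detection, see the assumption stated above. */
    size_t pre = strcspn(str, ":/?#");
    int has_scheme = (pre > 0 && str[pre] == ':');
    const char *hier_start = has_scheme ? str + pre + 1 : str;
    if (has_scheme) u->scheme = dup_range(str, pre);
    u->hier = dup_range(hier_start, (size_t)(hier_end - hier_start));

    if ((hash && !u->fragment) || (qm && !u->query) ||
        (has_scheme && !u->scheme) || !u->hier) {
        raw_uri_free(u);
        return -1;
    }
    return 0;
}

/* Glue the raw parts together again. Caller frees the result. */
static char *raw_uri_join(const struct raw_uri *u)
{
    assert(u != NULL && u->hier != NULL);
    size_t n = strlen(u->hier) + 1;
    if (u->scheme)   n += strlen(u->scheme) + 1;
    if (u->query)    n += strlen(u->query) + 1;
    if (u->fragment) n += strlen(u->fragment) + 1;

    char *out = malloc(n);
    if (out == NULL) return NULL;
    out[0] = '\0';
    if (u->scheme)   { strcat(out, u->scheme); strcat(out, ":"); }
    strcat(out, u->hier);
    if (u->query)    { strcat(out, "?"); strcat(out, u->query); }
    if (u->fragment) { strcat(out, "#"); strcat(out, u->fragment); }
    return out;
}
```

Because no part is ever unescaped, splitting the href from the example above and joining it again yields the same string byte for byte, which is the behaviour we would need from the loader callback.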
Kind regards,
Martin Zwaal
OCLC B.V. · Software Engineer
Schipholweg 99, P.O. Box 876 2300 AW Leiden The Netherlands
T +31 (0)71 524 678