Re: [xml] XPointer behaviour in LibXML2
- From: Daniel Veillard <veillard redhat com>
- To: Johann Richard <Johann richard dspfactory ch>
- Cc: xml gnome org
- Subject: Re: [xml] XPointer behaviour in LibXML2
- Date: Thu, 28 Nov 2002 06:10:29 -0500
On Thu, Nov 28, 2002 at 11:17:30AM +0100, Johann Richard wrote:
> Daniel,
>
> > Hum, do you mean that using #xpointer(/section/section[3]) works, or that
> > it doesn't work?
>
> Sorry about that one:
> #xpointer(/section/section[3]) shows exactly the same behaviour as #xpointer(id('section3')):
> They both work for local files and URLs, but not for the URN.
> The DTD itself is found; it's located via its PublicID in both the including and included file, so that should
> not be a problem either.
> The bizarre thing for me is really that the URN gets correctly resolved to the URL but that somehow the
> fragment seems to get lost ...
I was able to reproduce the problem locally.
paphio:~/XML -> cat xinc.xml
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
<xinclude:include href="urn:toto#xpointer(//p[2])"/>
</x>
(gdb) r
Starting program: /u/veillard/XML/xmllint --xinclude xinc.xml
Breakpoint 1, xmlXIncludeIncludeNode (ctxt=0x80e7fd0, nr=0) at xinclude.c:1572
1572 if (ctxt == NULL)
(gdb) p *ctxt->incTab[nr]
$2 = {URI = 0x80e3f00 "urn:toto", fragment = 0x0, doc = 0x0, ref = 0x80f3330,
inc = 0x80f5f40, xml = 1, count = 1}
No fragment ID is found.
I note that the definition of a fragment ID in RFC 2396
http://www.ietf.org/rfc/rfc2396.txt
seems to be given only for URI references; see section "4.1. Fragment Identifier".
Section "3. URI Syntactic Components" says:
-------------------------------------------
The URI syntax does not require that the scheme-specific-part have
any general structure or set of semantics which is common among all
URI. However, a subset of URI do share a common syntax for
representing hierarchical relationships within the namespace. This
"generic URI" syntax consists of a sequence of four main components:
<scheme>://<authority><path>?<query>
each of which, except <scheme>, may be absent from a particular URI.
For example, some URI schemes do not allow an <authority> component,
and others do not use a <query> component.
absoluteURI = scheme ":" ( hier_part | opaque_part )
URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. For some file systems, a "/"
character (used to denote the hierarchical structure of a URI) is the
delimiter used to construct a file name hierarchy, and thus the URI
path will look similar to a file pathname. This does NOT imply that
the resource is a file or that the URI maps to an actual filesystem
pathname.
hier_part = ( net_path | abs_path ) [ "?" query ]
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
"&" | "=" | "+" | "$" | ","
We use the term <path> to refer to both the <abs_path> and
<opaque_part> constructs, since they are mutually exclusive for any
given URI and can be parsed as a single component.
-------------------------------------------
A URN has the reserved "urn" scheme and does not use "/" as the separator.
As a result, the libxml2 URI parsing module applies
absoluteURI = scheme ":" opaque_part
so it seems to me that the fragment identifier is not defined for URNs.
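To make the hier_part / opaque_part distinction concrete, here is a minimal sketch in Python (not libxml2's C code). The helper name classify_absolute_uri is hypothetical, and the check is deliberately simplified: it only looks at whether the scheme-specific part starts with "/", as the RFC 2396 grammar quoted above suggests.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )   (RFC 2396, section 3.1)
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)


def classify_absolute_uri(uri):
    """Return (scheme, kind, rest), where kind is 'hier' or 'opaque'.

    Hypothetical helper for illustration only; not a libxml2 function.
    """
    m = _SCHEME.match(uri)
    if m is None:
        raise ValueError("not an absoluteURI: %r" % uri)
    scheme, rest = m.group(1), m.group(2)
    if not rest:
        raise ValueError("empty scheme-specific part: %r" % uri)
    # hier_part begins with "/" (net_path "//..." or abs_path "/...");
    # anything else after the scheme is treated as opaque_part.
    kind = "hier" if rest.startswith("/") else "opaque"
    return scheme, kind, rest


for uri in ("urn:toto", "http://example.com/toto"):
    print(uri, "->", classify_absolute_uri(uri)[1])
```

With the two names used in this thread, "urn:toto" falls into the opaque branch and "http://example.com/toto" into the hierarchical one.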
If I replace "urn:toto" with "http://example.com/toto" both in the instance
and in the catalog, the inclusion works:
---------------------------------------------
paphio:~/XML -> cat xinc.xml
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
<xinclude:include href="http://example.com/toto#xpointer(//p[2])"/>
</x>
paphio:~/XML -> cat tst.catal
<!DOCTYPE catalog
PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
prefer="public">
<system systemId="http://example.com/toto" uri="toto.xml"/>
</catalog>
paphio:~/XML -> cat toto.xml
<doc><p>p1</p><p>p2</p></doc>
paphio:~/XML -> ./xmllint --xinclude xinc.xml
<?xml version="1.0"?>
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
<p xml:base="http://example.com/toto">p2</p>
</x>
paphio:~/XML ->
---------------------------------------------
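For reference, the //p[2] part of the pointer can be checked against toto.xml on its own. This is only a sketch: it uses Python's xml.etree.ElementTree, whose limited XPath subset stands in for the XPointer evaluation, and it is not what xmllint runs internally.

```python
import xml.etree.ElementTree as ET

# Same content as toto.xml in the session above.
doc = ET.fromstring("<doc><p>p1</p><p>p2</p></doc>")

# ".//p[2]" is ElementTree's spelling of //p[2]: every <p> that is the
# second <p> child of its parent.
selected = doc.findall(".//p[2]")

print([ET.tostring(p, encoding="unicode") for p in selected])
```

The selected element is the same <p>p2</p> that shows up in the xmllint output, minus the xml:base attribute that XInclude processing adds.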
So this seems to be a limitation of the URN mechanism. Whether a fragment
identifier syntax exists for URNs is unclear; based on RFC 2396, apparently
not.
Looking at RFC 2141
http://www.ietf.org/rfc/rfc2141.txt
------------------
2. Syntax
All URNs have the following syntax (phrases enclosed in quotes are
REQUIRED):
<URN> ::= "urn:" <NID> ":" <NSS>
------------------
and
-------------------
2.3.2 The other reserved characters
RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
purposes. The URN-WG has not yet debated the applicability and
precise semantics of those purposes as applied to URNs. Therefore,
these characters are RESERVED for future developments. Namespace
developers SHOULD NOT use these characters in unencoded form, but
rather use the appropriate %-encoding for each character.
-------------------
seems to confirm libxml2's behaviour, i.e. #foo on a URN should not
be used and has no predefined meaning. Maybe there is an update, but
currently it seems libxml2's behaviour is not buggy.
Again, my take is: don't use URNs in a web framework...
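The two RFC 2141 excerpts can be turned into a rough check, sketched here in Python under stated assumptions: split_urn, reserved_in and encode_reserved are hypothetical helpers, the "example" NID is made up for illustration, and no validation of the NID or NSS character sets is attempted.

```python
RESERVED = "/?#"  # characters RFC 2141 section 2.3.2 reserves for future use


def split_urn(urn):
    """Split "urn:<NID>:<NSS>" into (nid, nss); no character-set validation."""
    prefix, sep, rest = urn.partition(":")
    if prefix.lower() != "urn" or not sep:
        raise ValueError("missing 'urn:' prefix: %r" % urn)
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError("expected urn:<NID>:<NSS>: %r" % urn)
    return nid, nss


def reserved_in(nss):
    """List the reserved characters that appear unencoded in the NSS."""
    return [c for c in RESERVED if c in nss]


def encode_reserved(nss):
    """%-encode only the three reserved characters, leaving the rest as-is."""
    return "".join("%%%02X" % ord(c) if c in RESERVED else c for c in nss)


# Hypothetical URN carrying an unencoded XPointer fragment.
nid, nss = split_urn("urn:example:toto#xpointer(//p[2])")
print(nid, reserved_in(nss), encode_reserved(nss))
```

Encoding the "#" this way makes it part of the name rather than a fragment delimiter, which is consistent with the point above: the fragment has no predefined meaning on a URN.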
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/