Re: [xml] XPointer behaviour in LibXML2



On Thu, Nov 28, 2002 at 11:17:30AM +0100, Johann Richard wrote:
Daniel,

Hum, do you mean that using #xpointer(/section/section[3]) works, or that
it doesn't work?

Sorry about that one: 

#xpointer(/section/section[3]) shows exactly the same behaviour as #xpointer(id('section3')):

They both work for local files and URLs, but not for the URN.

The DTD itself is found; it's located via its PublicID in both the including and the included file, so that
should not be a problem either.

The bizarre thing for me is really that the URN gets correctly resolved to the URL but that somehow the
fragment seems to get lost ...

  I was able to reproduce the problem locally.

paphio:~/XML -> cat xinc.xml
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
   <xinclude:include href="urn:toto#xpointer(//p[2])"/>
</x>
(gdb) r
Starting program: /u/veillard/XML/xmllint --xinclude xinc.xml

Breakpoint 1, xmlXIncludeIncludeNode (ctxt=0x80e7fd0, nr=0) at xinclude.c:1572
1572        if (ctxt == NULL)
(gdb) p *ctxt->incTab[nr]
$2 = {URI = 0x80e3f00 "urn:toto", fragment = 0x0, doc = 0x0, ref = 0x80f3330,
  inc = 0x80f5f40, xml = 1, count = 1}

  No fragment ID is found.
I note that the fragment identifier in RFC 2396
    http://www.ietf.org/rfc/rfc2396.txt
seems to be defined only for URI references; see section "4.1. Fragment Identifier".

Section "3. URI Syntactic Components" says:

-------------------------------------------
   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URI.  However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace.  This
   "generic URI" syntax consists of a sequence of four main components:

      <scheme>://<authority><path>?<query>

   each of which, except <scheme>, may be absent from a particular URI.
   For example, some URI schemes do not allow an <authority> component,
   and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )

   URI that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.  For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname.  This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem
   pathname.

      hier_part     = ( net_path | abs_path ) [ "?" query ]

      net_path      = "//" authority [ abs_path ]

      abs_path      = "/"  path_segments

   URI that do not make use of the slash "/" character for separating
   hierarchical components are considered opaque by the generic URI
   parser.

      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

   We use the term <path> to refer to both the <abs_path> and
   <opaque_part> constructs, since they are mutually exclusive for any
   given URI and can be parsed as a single component.
-------------------------------------------

  A URN has the reserved "urn" scheme and does not use "/" as the separator.
As a result, the libxml2 URI parsing module applies
   absoluteURI   = scheme ":" opaque_part
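
  As a rough illustration of that split (a sketch only, not libxml2's actual
parser; the name classify_absolute_uri is made up for this example), the choice
between hier_part and opaque_part in the grammar quoted above comes down to
whether the scheme-specific part starts with "/":

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )  (RFC 2396, section 3.1)
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.S)


def classify_absolute_uri(uri):
    """Return (scheme, kind, rest) for a fragment-free absoluteURI.

    kind is "hier_part" when the scheme-specific part starts with "/"
    (net_path or abs_path), and "opaque_part" otherwise.
    Hypothetical helper for illustration; it does not validate characters.
    """
    match = _SCHEME.match(uri)
    if match is None or match.group(2) == "":
        raise ValueError("not an absoluteURI: %r" % (uri,))
    scheme, rest = match.group(1), match.group(2)
    kind = "hier_part" if rest.startswith("/") else "opaque_part"
    return scheme, kind, rest


if __name__ == "__main__":
    print(classify_absolute_uri("urn:toto"))
    print(classify_absolute_uri("http://example.com/toto"))
```

  With "urn:toto" this takes the opaque_part branch, while
"http://example.com/toto" takes the hier_part one.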

  It seems to me that the fragment identifier is not defined for URNs.
If I replace "urn:toto" with "http://example.com/toto" both in the instance
and in the catalog, the fragment is resolved:

---------------------------------------------
paphio:~/XML -> cat xinc.xml
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
   <xinclude:include href="http://example.com/toto#xpointer(//p[2])"/>
</x>
paphio:~/XML -> cat tst.catal
<!DOCTYPE catalog
  PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
<system systemId="http://example.com/toto" uri="toto.xml"/>
</catalog>

paphio:~/XML -> cat toto.xml
<doc><p>p1</p><p>p2</p></doc>
paphio:~/XML -> ./xmllint --xinclude xinc.xml
<?xml version="1.0"?>
<x xmlns:xinclude="http://www.w3.org/2001/XInclude">
   <p xml:base="http://example.com/toto">p2</p>
</x>
paphio:~/XML ->
---------------------------------------------
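
  To make the two steps in that run explicit, here is a small sketch of what
has to happen to the href: the fragment is split off first, and only the
remaining resource identifier goes through the catalog. This is an assumption
about the general shape of the processing, not libxml2 code; CATALOG and
resolve_href are hypothetical names, and the dictionary merely stands in for
the <system> entry of tst.catal.

```python
from urllib.parse import urldefrag

# Hypothetical in-memory stand-in for the <system> entry in tst.catal.
CATALOG = {"http://example.com/toto": "toto.xml"}


def resolve_href(href, catalog=CATALOG):
    """Split href at the fragment, then map the resource through the catalog.

    Returns (resolved_resource, fragment), with fragment None when absent.
    """
    resource, fragment = urldefrag(href)
    return catalog.get(resource, resource), (fragment or None)


if __name__ == "__main__":
    print(resolve_href("http://example.com/toto#xpointer(//p[2])"))
```

  For the http href this yields the local file plus the XPointer to evaluate
in it. In the urn:toto case the gdb output earlier shows fragment = 0x0, so
there is no XPointer left to apply after the catalog lookup.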

  So this seems to be a limitation of the URN mechanism. Whether a fragment
identifier syntax exists for URNs is unclear; apparently not, based
on RFC 2396.
  Looking at RFC 2141
   http://www.ietf.org/rfc/rfc2141.txt

------------------
2. Syntax

   All URNs have the following syntax (phrases enclosed in quotes are
   REQUIRED):

                     <URN> ::= "urn:" <NID> ":" <NSS>

------------------

  and

-------------------
2.3.2 The other reserved characters

   RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
   purposes. The URN-WG has not yet debated the applicability and
   precise semantics of those purposes as applied to URNs. Therefore,
   these characters are RESERVED for future developments.  Namespace
   developers SHOULD NOT use these characters in unencoded form, but
   rather use the appropriate %-encoding for each character.
-------------------

  This seems to confirm libxml2's behaviour, i.e. #foo on a URN should not
be used and has no predefined meaning. Maybe there is an update, but
currently it seems libxml2's behaviour is not buggy.
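
  For reference, the %-encoding that section 2.3.2 of RFC 2141 asks for can be
sketched as below. The helper name encode_reserved is made up for this example.
Note that this only makes a literal "/", "?" or "#" legal inside the
namespace-specific string; it does not turn the encoded "#" into a fragment
identifier.

```python
# Characters that RFC 2141 section 2.3.2 reserves for future developments.
RESERVED = "/?#"


def encode_reserved(nss):
    """Percent-encode "/", "?" and "#" in a URN namespace-specific string.

    Hypothetical helper; all other characters are passed through unchanged.
    """
    return "".join(
        "%%%02X" % ord(ch) if ch in RESERVED else ch
        for ch in nss
    )


if __name__ == "__main__":
    print("urn:example:" + encode_reserved("toto#p2"))
```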

  Again, my take is: don't use URNs in a web framework...

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


