Re: [xslt] XML Catalog search ordering



Hiya,

Apologies for this being a little long-winded, but I want to set out my
investigation and understanding of the problem up front, so that further
exchanges are only needed where clarification is required :-)

In message <e40982bb4a.Justin@gerph.movspclr.co.uk>
          Justin Fletcher <justin.fletcher@ntlworld.com> wrote:

> In message <20010917121225.G22064@redhat.com>
>    on 17 Sep 2001, you wrote:
>
> > On Mon, Sep 17, 2001 at 04:55:36PM +0100, Justin Fletcher wrote:

[snip - description of the XML catalog problem with rewriteSystem and public
declarations within xsltproc]

> >
> >   Sounds right:
> >
> > orchis:~/XML -> xmlcatalog --shell
> > > add rewriteSystem "http://www.movspclr.co.uk/dtd/" "file:///%3CXMLCatalog$Dir%3E/gerph/"
> > > add public "-//Gerph//DTD PRM documentation 1.00//EN" "http://www.movspclr.co.uk/dtd/prm.dtd"
> > > dump
> > <?xml version="1.0"?>
> > <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
> > <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
> > <rewriteSystem systemIdStartString="http://www.movspclr.co.uk/dtd/" rewritePrefix="file:///%3CXMLCatalog$Dir%3E/gerph/"/>
> > <public publicId="-//Gerph//DTD PRM documentation 1.00//EN" uri="http://www.movspclr.co.uk/dtd/prm.dtd"/>
> > </catalog>
> > > resolve "-//Gerph//DTD PRM documentation 1.00//EN" "http://www.movspclr.co.uk/dtd/prm.dtd"
> > file:///%3CXMLCatalog$Dir%3E/gerph/prm.dtd
> > > exit
> > orchis:~/XML ->
> >
> >   Seems to work here ... strange,
>
> Same behaviour here, too, so my port hasn't broken those bits.
>
> I know the /reason/; I can't say what to do about it.
>
[snip]
> I'm wondering why we try to do this resolution in xsltproc ourselves, and then
> call the defaultLoader (which should always be valid because we always get
> back an entity loader from libxml).

[[[ Terms used:
       NoNetLoader: shortened form of xsltNoNetExternalEntityLoader inside
                    libxslt/xsltproc/xsltproc.c.
       defaultLoader: colloquial name for xmlDefaultExternalEntityLoader
                    inside libxml2/xmlIO.c as returned by
                    xmlGetExternalEntityLoader.

    Brief summary of the following:
       The NoNetLoader is duplicating the work of defaultLoader and /then/
       calling defaultLoader on its results. This is not correct.
       An improved scheme would involve the creation of a new routine,
       dubbed the defaultResolver which would provide the catalog resolution
       capabilities in a central manner.
       A suggestion is made for the API of this scheme.
]]]


I've been looking at this further.

It seems to me that the purpose of the NoNetLoader is purely to prevent
network access. It does this by doing everything that the default loader
does and then denying the resulting URI if it would be network based.

However, if it then calls the default loader on the result of that
processing, it may still end up trying to read a network address.

The reason for this is nearly given in your example above; I just never
thought to follow through.

--8<--------
1 *xmlcatalog --shell
2 > add rewriteSystem "http://www.movspclr.co.uk/dtd/" "file:///%3CXMLCatalog$Dir%3E/gerph/"
3 > add public "-//Gerph//DTD PRM documentation 1.00//EN" "http://www.movspclr.co.uk/dtd/prm.dtd"
4 > resolve "-//Gerph//DTD PRM documentation 1.00//EN" "http://www.movspclr.co.uk/dtd/prm.dtd"
5 file:///%3CXMLCatalog$Dir%3E/gerph/prm.dtd
6 > resolve "-//Gerph//DTD PRM documentation 1.00//EN" "file:///%3CXMLCatalog$Dir%3E/gerph/prm.dtd"
7 http://www.movspclr.co.uk/dtd/prm.dtd
8 > quit
9 *
--8<--------

I have numbered the lines so that I can refer to them. Everything up to and
including line 3 is what happens when we read in the XML catalog I am using.
Line 4 is the step that NoNetLoader performs itself, resulting in line 5.
NoNetLoader then calls defaultLoader, which performs line 6, resulting in
line 7.
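
To make the two passes concrete, here is a minimal sketch in plain C. It is
not libxml2 code: toy_resolve is a hypothetical stand-in that hard-codes the
two catalog entries from the transcript and assumes the system identifier
rules are tried before the public ones. Resolving once gives the file: URI
(lines 4-5); feeding that result back in with the same public identifier
gives the network URL again (lines 6-7).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the catalog in the transcript: one rewriteSystem entry
 * and one public entry. Hypothetical names; this is not the libxml2 API. */
#define REWRITE_FROM "http://www.movspclr.co.uk/dtd/"
#define REWRITE_TO   "file:///%3CXMLCatalog$Dir%3E/gerph/"
#define PUBLIC_ID    "-//Gerph//DTD PRM documentation 1.00//EN"
#define PUBLIC_URI   "http://www.movspclr.co.uk/dtd/prm.dtd"

/* Resolve (pubId, sysId) into out and return out.
 * Assumption: system identifier rules are consulted before public ones. */
const char *toy_resolve(const char *pubId, const char *sysId,
                        char *out, size_t outlen)
{
    size_t n = strlen(REWRITE_FROM);

    assert(out != NULL && outlen > 0);

    /* rewriteSystem: swap the matching prefix of the system identifier */
    if (sysId != NULL && strncmp(sysId, REWRITE_FROM, n) == 0) {
        snprintf(out, outlen, "%s%s", REWRITE_TO, sysId + n);
        return out;
    }
    /* public: only reached when no system identifier rule matched */
    if (pubId != NULL && strcmp(pubId, PUBLIC_ID) == 0) {
        snprintf(out, outlen, "%s", PUBLIC_URI);
        return out;
    }
    /* no catalog match: hand back the system identifier unchanged */
    snprintf(out, outlen, "%s", sysId != NULL ? sysId : "");
    return out;
}
```

Calling toy_resolve(PUBLIC_ID, PUBLIC_URI, ...) and then calling it again
with the first result as the system identifier reproduces the sequence in
the transcript.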

So, it depends on your interpretation of the problem.

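The resolve routine *is* performing the line 6 -> 7 step correctly by my
reading of section 7.1.2.5 of the XML Catalogs specification (7.1.2.2-4, the
system identifier rules, being unmatched because the catalog has no entry for
that system identifier, so the public entry is used instead).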

I believe, that being the case, that we should not be invoking defaultLoader
inside NoNetLoader - we have already done its work and just need to read
the file ourselves.

The patch for this change looks something like this :

--8<--------
*** Original/xsltproc/c/xsltproc	Wed Sep 12 02:55:18 2001
--- RISCOS/xsltproc/c/xsltproc	Thu Sep 20 17:11:34 2001
***************
*** 197,205 ****
  	    }
  	}
      }
!     if (defaultLoader != NULL) {
! 	input = defaultLoader((const char *) resource, ID, ctxt);
      }
      if (resource != (xmlChar *) URL)
  	xmlFree(resource);
      return(input);
--- 206,238 ----
  	    }
  	}
      }
!
!     /* JRF: Replacement for calling defaultLoader - the code above /is/
!             (with the exception of the network check) the defaultLoader,
!             so calling the default loader here breaks things. The failure
!             case is :
!               - a lookup is done above for a public identifier and URL.
!               - the lookup succeeds based on the URL and from this we
!                 get a new URL on the local filesystem (using the
!                 rewriteSystem directive).
!               - this is passed to the defaultLoader, with the public
!                 identifier.
!               - the local URL isn't recognised, but the public identifier
!                 is, resulting in a transformation to a network URL
!               - an attempt is made to fetch the network URL which then
!                 fails.
!     */
!     input = xmlNewInputFromFile(ctxt, (const char *)resource);
!     if (input == NULL) {
! 	if ((ctxt->validate) && (ctxt->sax != NULL) &&
!             (ctxt->sax->error != NULL))
! 	    ctxt->sax->error(ctxt,
! 		    "failed to load external entity \"%s\"\n", resource);
! 	else if ((ctxt->sax != NULL) && (ctxt->sax->warning != NULL))
! 	    ctxt->sax->warning(ctxt,
! 		    "failed to load external entity \"%s\"\n", resource);
      }
+
      if (resource != (xmlChar *) URL)
  	xmlFree(resource);
      return(input);
***************
--8<--------

This is based on the assumption that we 'know' what defaultLoader does.
It solves my particular problem and probably others, but is still not
an ideal solution in my opinion.

I believe that this would be better accomplished by having a defaultResolver
routine, which does the whole of the resolution up to (but not including)
the read of the file. This applies primarily to libxml2, rather than to
libxslt.

This would result in the internal defaultLoader becoming a call to the
defaultResolver, followed by a xmlNewInputFromFile. The NoNetLoader would
become a call to defaultResolver, followed by validation of the URI
returned, and /then/ a call to xmlNewInputFromFile.

Therefore, I suggest a minor extension to the API :

typedef xmlChar *(*xmlExternalEntityResolver)(const char *URL,
					      const char *ID,
					      xmlParserCtxtPtr context);
  - function type which uses the catalog to resolve the URL and ID
    provided into a single URL.

void xmlSetExternalEntityResolver(xmlExternalEntityResolver f);
  - sets the current resolver for external entities
xmlExternalEntityResolver xmlGetExternalEntityResolver(void);
  - returns the current resolver for external entities

Distinction:
  The Resolver resolves a PublicID/SystemID pair into a URI which is
  determined by any catalog or local caching system.
  The Loader resolves /and/ loads the entity referred to by the
  PublicID/SystemID pair.
This distinction may need to be made in the comments surrounding that section.
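
As a rough illustration of the split, here is a self-contained sketch in
plain C. Everything in it is hypothetical: the toy* names stand in for the
proposed xml* functions, char * replaces xmlChar *, the parser context is
omitted, and the "read" step (xmlNewInputFromFile in the proposal) is
represented simply by returning the URI that would be opened. The network
test on "http://" and "ftp://" prefixes is likewise an assumption of the
sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver type: maps a URL/ID pair to a newly allocated URI. */
typedef char *(*toyEntityResolver)(const char *URL, const char *ID);

/* Default resolver for this sketch: no catalog, returns a copy of the URL. */
char *toyDefaultResolver(const char *URL, const char *ID)
{
    char *copy;

    (void) ID;
    assert(URL != NULL);
    copy = malloc(strlen(URL) + 1);
    if (copy != NULL)
        strcpy(copy, URL);
    return copy;
}

/* Example catalog-like resolver: applies the rewriteSystem rule from above. */
char *toyRewriteResolver(const char *URL, const char *ID)
{
    static const char from[] = "http://www.movspclr.co.uk/dtd/";
    static const char to[] = "file:///%3CXMLCatalog$Dir%3E/gerph/";
    size_t n = sizeof from - 1;
    size_t len;
    char *out;

    if (strncmp(URL, from, n) != 0)
        return toyDefaultResolver(URL, ID);
    len = (sizeof to - 1) + strlen(URL + n) + 1;
    out = malloc(len);
    if (out != NULL)
        snprintf(out, len, "%s%s", to, URL + n);
    return out;
}

static toyEntityResolver currentResolver = toyDefaultResolver;

/* Set and get the current resolver; NULL restores the default. */
void toySetEntityResolver(toyEntityResolver f)
{
    currentResolver = (f != NULL) ? f : toyDefaultResolver;
}

toyEntityResolver toyGetEntityResolver(void)
{
    return currentResolver;
}

/* Assumed network test: anything starting with http:// or ftp://. */
static int toyIsNetwork(const char *uri)
{
    return strncmp(uri, "http://", 7) == 0 || strncmp(uri, "ftp://", 6) == 0;
}

/* Loader = resolve once, then read (reading is elided in this sketch). */
char *toyDefaultLoader(const char *URL, const char *ID)
{
    return toyGetEntityResolver()(URL, ID);
}

/* NoNet loader = resolve once, validate the URI, and only then read. */
char *toyNoNetLoader(const char *URL, const char *ID)
{
    char *uri = toyGetEntityResolver()(URL, ID);

    if (uri != NULL && toyIsNetwork(uri)) {
        fprintf(stderr, "network access refused for %s\n", uri);
        free(uri);
        return NULL;
    }
    return uri;
}
```

The point of the arrangement is that each loader resolves exactly once, so
the NoNet variant never hands an already-resolved URI back to the catalog.
The caller frees the returned string.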

I can implement this code and supply patches for it, but I do not wish to
do so if there is an objection to this particular method of working.

[[[ Apologies for the length of that! ]]]

-- 
Gerph {djf0-.3w6e2w2.226,6q6w2q2,2.3,2m4}
URL: http://www.movspclr.co.uk/
... Eyes to the heavens, screaming at the sky;
    Trying to send you messages, but choking on goodbye.



