Re: [xml] caching of parsed DTDs ?



Daniel,

thanks for the info!

On Tue, 10 Dec 2002, Daniel Veillard wrote:

On Tue, Dec 10, 2002 at 11:08:39PM +0100, rost lo-res org wrote:
[...]
What I am wondering is whether this functionality is already implemented in
libxml2, or whether DTDs get parsed every time a document references one as
its corresponding DTD?

[...]
  The problem is that it "doesn't work" in general because you can have
an internal subset within the document overriding some definitions made
inside the "external subset" found in the external DTD file(s). If that's not
the case, validation using xmlValidateDtd is safe.

I see your point. However, given the kind of input I handle, I can safely
assume that no definitions from the external DTD are overridden.

What I came up with seems to be safe according to your explanation (note:
I rely on libxml2's default behaviour of not validating when calling
xmlParse). For information purposes, in case anybody is interested:

After getting a parsed xmlDoc I run a lookup on the cache using
the document's DTD ID. If I find it in my cache, I return a pointer to the
parsed DTD. If, however, the document's DTD has not been cached previously,
I create the DTD with an excerpt of libxml2's xmlValidateDocument routine...

  /* Parse the external subset only if the document declares one
     and it has not been loaded yet. */
  if ((doc->intSubset != NULL) &&
      ((doc->intSubset->SystemID != NULL) ||
       (doc->intSubset->ExternalID != NULL)) &&
      (doc->extSubset == NULL)) {
    doc->extSubset = xmlParseDTD(doc->intSubset->ExternalID,
                                 doc->intSubset->SystemID);
  }

  if (doc->extSubset == NULL) {
    /* handle error */
  }

  dtd = doc->extSubset;

Afterwards I store a copy of dtd (made with xmlCopyDtd(dtd)) in my cache,
where it can be looked up by its System ID.

This has proven necessary before calling xmlValidateDtd, since
otherwise (as you already made clear) I would end up with an unparsed DTD
struct in doc->intSubset (doc->extSubset being NULL previously), and
validation would fail.
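
For anybody who wants to try the same thing, here is a minimal sketch of the
kind of cache described above. It is not the code from my application: the
names dtd_cache, dtd_cache_lookup, dtd_cache_store and dtd_cache_clear are
made up for illustration, the cache is a plain linked list keyed by System ID,
and the parsed DTD is held as an opaque void pointer so the sketch compiles
without libxml2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One cache entry: a DTD System ID mapped to an opaque parsed-DTD handle.
 * Hypothetical types; in a libxml2 program the handle would be the
 * xmlDtdPtr returned by xmlCopyDtd(). */
typedef struct dtd_cache_entry {
    char *system_id;
    void *dtd;
    struct dtd_cache_entry *next;
} dtd_cache_entry;

typedef struct {
    dtd_cache_entry *head;
} dtd_cache;

/* Return the cached handle for system_id, or NULL on a cache miss. */
void *dtd_cache_lookup(const dtd_cache *cache, const char *system_id)
{
    assert(cache != NULL && system_id != NULL);
    for (const dtd_cache_entry *e = cache->head; e != NULL; e = e->next) {
        if (strcmp(e->system_id, system_id) == 0)
            return e->dtd;
    }
    return NULL;
}

/* Store dtd under system_id. The key is copied, the handle is not.
 * An existing entry for the same key is kept unchanged.
 * Returns 0 on success, -1 if memory allocation fails. */
int dtd_cache_store(dtd_cache *cache, const char *system_id, void *dtd)
{
    assert(cache != NULL && system_id != NULL);
    if (dtd_cache_lookup(cache, system_id) != NULL)
        return 0;

    dtd_cache_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;

    size_t len = strlen(system_id) + 1;
    e->system_id = malloc(len);
    if (e->system_id == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->system_id, system_id, len);

    e->dtd = dtd;
    e->next = cache->head;
    cache->head = e;
    return 0;
}

/* Free every entry. If free_dtd is not NULL it is called on each handle. */
void dtd_cache_clear(dtd_cache *cache, void (*free_dtd)(void *))
{
    assert(cache != NULL);
    dtd_cache_entry *e = cache->head;
    while (e != NULL) {
        dtd_cache_entry *next = e->next;
        if (free_dtd != NULL)
            free_dtd(e->dtd);
        free(e->system_id);
        free(e);
        e = next;
    }
    cache->head = NULL;
}
```

With libxml2, the idea would be to call dtd_cache_lookup with
doc->intSubset->SystemID first. On a miss, parse the external subset as in the
excerpt above and store the result of xmlCopyDtd(dtd). On a hit, pass the
cached pointer to xmlValidateDtd. A linked list is only reasonable for a
handful of DTDs; a hash table would be the obvious replacement for more.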

This approach now seems to work seamlessly in my code.
Thanks again for the help,
Robert


---
GPG: gpg --keyserver wwwkeys.pgp.net --recv-keys CCFDEB89



