Re: [xml] caching of parsed DTDs ?

On Tue, Dec 10, 2002 at 11:08:39PM +0100, rost lo-res org wrote:
My application uses libxml to parse and validate XML documents, most of
which use the same DTD. To speed up processing I am thinking
about implementing a caching mechanism for DTDs. So, once a document using
DTD "test.dtd" has been parsed, I want to retrieve the xmlDtdPtr for
further validation of documents using the same DTD without actually
parsing the DTD again (the DTD is guaranteed to remain valid for
the lifetime of the process).

What I am wondering is whether this functionality is already implemented in
libxml2, or whether DTDs get parsed every time a document references them as
the corresponding DTD?

  It's a bit complex.

By default libxml2 doesn't validate and doesn't load the DTDs (that's
configurable, it's just the default behaviour).

You can load a DTD and parse it to an xmlDtdPtr:

  paphio:~/XML -> grep ^xmlDtdPtr include/libxml/*.h
  include/libxml/parser.h:xmlDtdPtr       xmlParseDTD             (const xmlChar *ExternalID,
  include/libxml/parser.h:xmlDtdPtr       xmlSAXParseDTD          (xmlSAXHandlerPtr sax,
  include/libxml/parser.h:xmlDtdPtr       xmlIOParseDTD           (xmlSAXHandlerPtr sax,

  You can also validate an already-parsed document (one that was not
validated during parsing) against a given DTD:
  int             xmlValidateDtd          (xmlValidCtxtPtr ctxt,
                                           xmlDocPtr doc,
                                           xmlDtdPtr dtd);

  (create your own xmlValidCtxtPtr to provide the callbacks)

  The problem is that it "doesn't work" in general, because a document can
have an internal subset overriding some definitions made inside the
"external subset" found in the external DTD file(s). If that's not
the case, validation using xmlValidateDtd is safe.
  When a document is parsed but not validated, any DOCTYPE information is
stored in the document anyway as an internal xmlDtdPtr node. Find it and
you will get the information needed to look up the external subset.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
