Re: [xml] caching of parsed DTDs ?

From: Daniel Veillard <veillard redhat com>
To: Timothy Ritchey <tritchey mac com>
Cc: xml gnome org
Subject: Re: [xml] caching of parsed DTDs ?
Date: Sun, 15 Dec 2002 10:45:57 -0500

On Sun, Dec 15, 2002 at 12:51:43AM -0500, Timothy Ritchey wrote:

I caught this discussion in the archives, and have been struggling with 
a similar issue. I am working on a docbook editor, and have been loading 
my files using:

xmlDoValidityCheckingDefaultValue = 1;
xmlRecoverFile(filename);

and was looking to speed up the process. Most of the time appears to be 
spent in loading the docbook dtd (15 seconds on an OS X G4/500). I was

  That's not normal. It barely takes more than half a second on my
box and I'm running with a lot of debugging enabled. There is something
going on ...

thinking of caching the docbook dtd, and reusing it when opening files.
I am able to load a file, load the dtd separately, and then validate 
the document as follows:

  This causes trouble for document applications: nothing in the
internal subset is used in such a validation, and you're going to have
trouble with entities...

xmlDoValidityCheckingDefaultValue = 0;
xmlDocPtr doc = xmlParseFile(filename);
xmlDtdPtr dtd = xmlParseDTD(NULL, doc->intSubset->SystemID);
xmlValidateDtd(&cvp, doc, dtd);

The document then validates fine, but seems to have some elements 
missing. The first problem was that I was having problems using 
xmlValidGetValidElements(..) on any nodes from the resulting doc. I was 
able to get that working by pointing doc->extSubset = dtd. This of 
course was one of those "waving-a-chicken-leg" moments, in that I have 
no idea what I did, or why it worked.

  Since the document had no DTD, libxml had no way to find the elements
allowed in the document !

The second issue is with entities, such as &mdash;. My original method 
of loading the file inserts the entity references fine, but the second 
doesn't. I am assuming that in the second instance, the original 
document parsing, upon encountering an entity such as &mdash; throws an 
error, and goes on generating the tree without any reference to the 
offending entity. So, when I come back later on and do a post 
validation, there is nothing in the doc tree that even indicates the 
&mdash; ever existed.

  Hum, you should get entity references in the tree but without any
definition associated (since it wasn't present at parsing time).
If you run xmllint --debug on the document you will get the structure
that will be built in memory in such a case.

that I get a well-formed tree with entities.

  You have entities but they are not "resolved".
I bet if you save the document you will get the &mdash; in the result,
proof that it was stored internally.

Any pointers at all would be greatly appreciated.

  GDB is your friend, really. Not necessarily to find bugs but to see
what your data structures in memory really are.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
