Re: [xml] caching of parsed DTDs ?
- From: Daniel Veillard <veillard redhat com>
- To: Timothy Ritchey <tritchey mac com>
- Cc: xml gnome org
- Subject: Re: [xml] caching of parsed DTDs ?
- Date: Sun, 15 Dec 2002 10:45:57 -0500
On Sun, Dec 15, 2002 at 12:51:43AM -0500, Timothy Ritchey wrote:
I caught this discussion in the archives, and have been struggling with
a similar issue. I am working on a docbook editor, and have been loading
my files using:
xmlDoValidityCheckingDefaultValue = 1;
xmlRecoverFile(filename);
and was looking to speed up the process. Most of the time appears to be
spent in loading the docbook dtd (15 seconds on an OS X G4/500). I was
That's not normal. It barely takes more than half a second on my
box and I'm running with a lot of debugging enabled. There is something
going on ...
thinking of caching the docbook dtd, and reusing it when opening files.
I am able to load a file, load the dtd separately, and then validate
the document as follows:
This has troubles for document applications, i.e. everything in the
internal subset is not used in such validation, and you're gonna have
troubles with entities...
xmlDoValidityCheckingDefaultValue = 0;
xmlDocPtr doc = xmlParseFile(filename);
xmlDtdPtr dtd = xmlParseDTD(NULL, doc->intSubset->SystemID);
xmlValidateDtd(&cvp, doc, dtd);
The document then validates fine, but seems to have some elements
missing. The first problem was that I was having problems using
xmlValidGetValidElements(..) on any nodes from the resulting doc. I was
able to get that working by pointing doc->extSubset = dtd. This of
course was one of those "waving-a-chicken-leg" moments, in that I have
no idea what I did, or why it worked.
Since the document had no DTD, libxml had no way to find the elements
allowed in the document !
The second issue is with entities, such as &mdash;. My original method
of loading the file inserts the entity references fine, but the second
doesn't. I am assuming that in the second instance, the original
document parsing, upon encountering an entity such as &mdash;, throws an
error, and goes on generating the tree without any reference to the
offending entity. So, when I come back later on and do a post-validation,
there is nothing in the doc tree that even indicates the
&mdash; ever existed.
Hum, you should get entities references in the tree but without any
definition associated (since it wasn't present at parsing time).
If you do xmllint --debug on the document you will get the structure
that will be built in memory in such case.
that I get a well-formed tree with entities.
You have entities but they are not "resolved".
I bet if you save the document you will get the &mdash; in the result,
proof that it was stored internally.
Any pointers at all would be greatly appreciated.
GDB is your friend, really. Not necessarily to find bugs but to see
what your data structures in memory really are.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/