Re: [xml] caching of parsed DTDs ?



I caught this discussion in the archives and have been struggling with a similar issue. I am working on a DocBook editor, and have been loading my files using:

xmlDoValidityCheckingDefaultValue = 1;   /* validate while parsing */
xmlDocPtr doc = xmlRecoverFile(filename);

and was looking to speed up the process. Most of the time appears to be spent loading the DocBook DTD (15 seconds on an OS X G4/500). I was thinking of caching the DocBook DTD and reusing it when opening files. I am able to load a file, load the DTD separately, and then validate the document as follows:

xmlDoValidityCheckingDefaultValue = 0;   /* parse without validating */
xmlDocPtr doc = xmlParseFile(filename);
xmlDtdPtr dtd = xmlParseDTD(NULL, doc->intSubset->SystemID);
xmlValidateDtd(&cvp, doc, dtd);          /* cvp: an xmlValidCtxt set up earlier */

The document then validates fine, but seems to have some elements missing. The first problem was that xmlValidGetValidElements(..) failed on every node from the resulting doc. I got it working by setting doc->extSubset = dtd. This, of course, was one of those "waving-a-chicken-leg" moments: I have no idea what I did, or why it worked.

The second issue is with entities such as &mdash;. My original method of loading the file keeps the entity references fine, but the second one doesn't. I am assuming that in the second case, the initial parse throws an error when it encounters an entity such as &mdash;, and goes on building the tree without any reference to the offending entity. So when I come back later and do a post-validation, there is nothing in the doc tree that even indicates the &mdash; ever existed.

As you can tell, I am a novice at libxml, and am flailing around in the docs trying to figure out how to get things done in the right order so that I get a well-formed tree with entities.

Any pointers at all would be greatly appreciated.

Cheers,
tim





