Re: [xml] external DTD validation of large XML's



If the above are correct, what do you suggest to people who want to use libxml2 to validate large XMLs 
with external DTD files?  Re-write the input XML file?

Pretty much, yeah. It's not so bad: just a tiny DOCTYPE referring to the DTD.

In many cases you don't even need that. Write a shell XML file,

<!DOCTYPE wrapper SYSTEM "the-dtd-file.dtd" [
  <!ELEMENT wrapper (the-real-root-element)>
  <!ENTITY the-real-document SYSTEM "bigfile.xml">
]>
<wrapper>&the-real-document;</wrapper>

and then validate that wrapper file.

This assumes the large file has a root (outermost) element of
   <the-real-root-element>.....</the-real-root-element>
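
A wrapper like the one above can be generated rather than written by hand. The following is only a minimal sketch in Python: the function name build_wrapper and the template constant are hypothetical, and the file names are the placeholders used above. It turns both paths into absolute file: URIs, on the assumption that the wrapper may later be parsed from a memory buffer, where relative system identifiers have no obvious base to resolve against.

```python
from pathlib import Path

# Template mirroring the hand-written wrapper: an external DTD subset,
# an internal subset declaring the wrapper element, and an external
# parsed entity that pulls in the big document.
WRAPPER_TEMPLATE = """<!DOCTYPE wrapper SYSTEM "{dtd}" [
  <!ELEMENT wrapper ({root})>
  <!ENTITY the-real-document SYSTEM "{xml}">
]>
<wrapper>&the-real-document;</wrapper>
"""


def build_wrapper(dtd_path, xml_path, root_name):
    """Return the wrapper document as a string.

    root_name is assumed to be a valid XML name and is inserted as-is.
    Paths are converted to absolute file: URIs (the files need not exist).
    """
    dtd_uri = Path(dtd_path).resolve().as_uri()
    xml_uri = Path(xml_path).resolve().as_uri()
    return WRAPPER_TEMPLATE.format(dtd=dtd_uri, root=root_name, xml=xml_uri)
```

Calling build_wrapper("the-dtd-file.dtd", "bigfile.xml", "the-real-root-element") returns the wrapper text, which can be written to a file or handed to the parser as a buffer. If it is saved as wrapper.xml, something like `xmllint --valid --noout wrapper.xml` should validate it, though whether the external entity is loaded by default may depend on the libxml2 version.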

Some days you see things that make you gasp, "Why didn't I think of that!"  How wonderfully clever.

This may work very well for me since (a) my bigfile.xml files never (yeah, right) contain a DOCTYPE, and (b) I can 
do the wrapping in memory and feed the buffer to libxml2.
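
Point (a) matters because an external parsed entity cannot carry its own DOCTYPE declaration, so it may be worth checking before wrapping. Below is a rough sketch of such a pre-check in Python. The name has_doctype and the 4096-byte probe size are arbitrary choices, and it assumes an ASCII-compatible encoding such as UTF-8. It is a plain byte search, not a parse, so a DOCTYPE string inside an early comment would also match.

```python
def has_doctype(path, probe_bytes=4096):
    """Return True if b'<!DOCTYPE' occurs in the first probe_bytes of the file.

    Only the head of the file is read, so the cost does not grow with
    the size of the document.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    return b"<!DOCTYPE" in head
```

A file that fails this check would need its DOCTYPE stripped (or the tiny-DOCTYPE rewrite mentioned earlier) before the wrapper approach can be used.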

Will the libxml2 implementation try to bring the entire &the-real-document; entity into memory, or will it 
stream it if I use the SAX2 or Reader API?  My gut tells me both the DTD and bigfile.xml will be 
completely parsed into memory. This is fine for the DTD but not for bigfile.xml.

Jon

---
blog: http://jonforums.github.com/
twitter: @jonforums

"Anyone who can only think of one way to spell a word obviously lacks imagination." - Mark Twain


