Re: [xml] external DTD validation of large XML's



I am new to the libxml2 API and am looking to use it to create a simple tool that can validate large XML 
files via external DTDs, and eventually XSDs. I've successfully built libxml2 on Windows 7 using a MinGW 
toolchain and plan to build the tool as a statically linked exe for Windows.

I've found http://mail.gnome.org/archives/xml/2004-July/msg00055.html and 
http://mail.gnome.org/archives/xml/2009-November/msg00039.html and would appreciate pointers in the right 
direction, either sections in xmllint.c to review or ideas on how to use the Reader api to do this.

My main concerns are memory usage and speed; I have no preference between the SAX2 and Reader 
APIs.


After skimming xmllint.c I want to confirm that my understanding of the following is correct.

1) The only way to use xmllint to validate against an external DTD file is

   xmllint --dtdvalid luddite.dtd file1.xml file2.xml ...

and the following will not work as neither `testSAX()` nor `streamFile()` validate against an external DTD 
file:

   xmllint --sax --dtdvalid luddite.dtd file1.xml ...
   xmllint --stream --dtdvalid luddite.dtd file1.xml ...


2) Does the following mean that when using libxml2's SAX functionality a document representation of the 
entire input XML is created in memory?

   http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1711


3) As of v2.7.8 and using the Reader API, there is no way to validate using an external DTD similar to

   http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1881
   http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1896


4) As of v2.7.8 and using the Reader API, there is no way to a posteriori validate using an external DTD 
similar to the following. A posteriori DTD validation is only available after parsing a full DOM into memory.

   http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2759


If the above are correct, what do you suggest to people who want to use libxml2 to validate large XML files 
with external DTD files?  Rewrite the input XML file?

Jon

---
blog: http://jonforums.github.com/
twitter: @jonforums

"Anyone who can only think of one way to spell a word obviously lacks imagination." - Mark Twain


