Re: [xml] external DTD validation of large XML's
- From: Noam Postavsky <npostavs users sourceforge net>
- To: Jon <jon forums gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] external DTD validation of large XML's
- Date: Sat, 09 Jul 2011 21:06:45 -0400
Jon <jon forums gmail com> writes:
I am new to the libxml2 API and am looking to use it to create a simple tool that can validate large XML
files via external DTDs, and eventually XSDs. I've successfully built libxml2 on Win7 using a MinGW
toolchain and plan to build the tool as a statically linked exe for Windows.
I've found http://mail.gnome.org/archives/xml/2004-July/msg00055.html and
http://mail.gnome.org/archives/xml/2009-November/msg00039.html and would appreciate pointers in the right
direction, either sections in xmllint.c to review or ideas on how to use the Reader api to do this.
XMLStarlet does this too, maybe it will be useful for you:
http://xmlstar.git.sourceforge.net/git/gitweb.cgi?p=xmlstar/xmlstar;a=blob;f=src/xml_validate.c;hb=HEAD
I'm more concerned about memory usage and speed and have no preference between the SAX2 and Reader
APIs.
After skimming xmllint.c I want to confirm that my understanding of the following is correct.
1) The only way to use xmllint to validate against an external DTD file is
xmllint --dtdvalid luddite.dtd file1.xml file2.xml ...
and the following will not work, as neither `testSAX()` nor `streamFile()` validates against an external DTD
file:
xmllint --sax --dtdvalid luddite.dtd file1.xml ...
xmllint --stream --dtdvalid luddite.dtd file1.xml ...
Yes, as a consequence of 4).
2) Does the following mean that when using libxml2's SAX functionality a document representation of the
entire input XML is created in memory?
http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1711
No, it depends on the handler in use. The code you reference there is
checking for unexpected creation of a DOM tree: unexpected because neither
the emptySAXHandler nor the debugSAXHandler creates a DOM tree.
3) As of v2.7.8 and using the Reader API, there is no way to validate using an external DTD similar to
http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1881
http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1896
Yes, see https://bugzilla.gnome.org/show_bug.cgi?id=169375
4) As of v2.7.8 and using the Reader API, there is no way to a posteriori validate using an external DTD
similar to the following. A posteriori DTD validation is only available after parsing a full DOM into memory.
http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2759
Yes. Besides the memory usage, this approach has another problem: the
DOM structure uses only 2 bytes to hold line numbers, so error messages
report the wrong line number after line 65535.
https://bugzilla.gnome.org/show_bug.cgi?id=143739
If the above are correct, what do you suggest for people who want to use libxml2 to validate large XML files
against external DTD files? Rewrite the input XML file?
Pretty much, yeah. It's not so bad: just a tiny DOCTYPE referring to the DTD.
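The rewrite can be done as a plain byte copy that emits the DOCTYPE right after the XML declaration (or at the very start if there is none), so memory use does not grow with file size. A minimal sketch in standard C; `add_doctype` is a hypothetical helper, and it assumes the caller knows the root element name, that the input has no byte-order mark and no existing DOCTYPE, and that the DTD URI contains no double quote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, inserting
 *   <!DOCTYPE root SYSTEM "dtd_uri">
 * after a leading "<?xml ... ?>" declaration, or at the very start if
 * there is none. Assumes no byte-order mark, no existing DOCTYPE, and a
 * dtd_uri without '"'. Returns 0 on success, -1 on an I/O error. */
int add_doctype(FILE *in, FILE *out, const char *root, const char *dtd_uri)
{
    char head[5];
    size_t n;
    int c, prev = 0, has_decl;

    assert(in != NULL && out != NULL && root != NULL && dtd_uri != NULL);

    /* Peek at the first five bytes to detect an XML declaration. */
    n = fread(head, 1, sizeof head, in);
    has_decl = (n == sizeof head && memcmp(head, "<?xml", sizeof head) == 0);

    if (has_decl) {
        /* Copy the declaration through its closing "?>". */
        fwrite(head, 1, n, out);
        while ((c = getc(in)) != EOF) {
            putc(c, out);
            if (prev == '?' && c == '>')
                break;
            prev = c;
        }
        putc('\n', out);
    }

    fprintf(out, "<!DOCTYPE %s SYSTEM \"%s\">\n", root, dtd_uri);

    /* Without a declaration, the peeked bytes belong after the DOCTYPE. */
    if (!has_decl)
        fwrite(head, 1, n, out);

    /* Stream the rest unchanged, one byte at a time. */
    while ((c = getc(in)) != EOF)
        putc(c, out);

    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

A relative SYSTEM identifier is resolved against the rewritten file's location, so the DTD path should be relative to where that file ends up (or absolute). The rewritten file can then be validated with a Reader configured for DTD validation.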
Noam