[xml] Correctly sequence notations
- From: Craig Edwards <craig blackdogfoundry com>
- To: xml gnome org
- Subject: [xml] Correctly sequence notations
- Date: Thu, 20 Nov 2014 13:50:08 +1100
Hi, I'm trying to use libxml to extract NOTATION entries from an internal DTD subset, but am struggling. I can't seem
to find a way to get the DTD notations interleaved in amongst the other DTD elements.
Some background - my app (which contains some XML parsing/formatting functionality) is actually written in
Objective-C, so I was originally using NSXMLDocument (DOM-based) but for some reason the notations property
on NSXMLDTD is always nil (Apple suggests this is a libxml bug, but I am not yet convinced). Their suggestion
was to use NSXMLParser (SAX-based) - which does actually return the notations, but the problem is that it
doesn't fire an event indicating that parsing has entered the DOCTYPE, so if I have the following XML, I
don't know whether comment2 is inside the DOCTYPE or outside.
<?xml version="1.0" standalone="yes" ?>
<!-- comment1 -->
<!DOCTYPE xxx SYSTEM "XXX" [
<!-- comment2 -->
<!ENTITY blah SYSTEM "BLAH" NDATA note>
<!NOTATION note PUBLIC "my notation">
]>
<xxx>some text</xxx>
So, my next step is to fall back to libxml itself. Exploring xmllint, I can see that the --format option does
indeed find and print the notations (which is good), but I've noticed that it doesn't preserve the original
order of the various declarations. Its output for the above XML is (note that the NOTATION has been moved ahead
of the comment/entity):
<?xml version="1.0" standalone="yes"?>
<!-- comment1 -->
<!DOCTYPE xxx SYSTEM "XXX" [
<!NOTATION note PUBLIC "my notation" >
<!-- comment2 --><!ENTITY blah SYSTEM "BLAH" NDATA note>
]>
<xxx>some text</xxx>
More digging reveals the xmlDumpNotationTable() function - which looks like it ultimately calls an opaque
hash table scanner wherein I pass in a function pointer. OK, maybe that is what I need to do to iterate over
the notations? Some more wandering through the code leads me to xmlDtdDumpOutput() - which says "Dump the
notations first as they are not in the DTD children list".
That seems odd. Why aren't the notations treated as children?
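To make the reordering concrete, here is a toy model in Python. It is hypothetical and is not libxml2's real data structure: `ToyDtd` and `dump_dtd` are invented names. It assumes only what the quoted source comment says, namely that comments and entity declarations sit in an ordered children list while notations are kept in a separate table that records no position. Any dumper that writes that table first will then print the NOTATION ahead of the comment and entity, which matches the order (whitespace aside) of the xmllint --format output above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDtd:
    """Hypothetical DTD model: NOT libxml2's xmlDtd, just the same split."""
    name: str
    system_id: str
    # Ordered list: comments and entity declarations, in document order.
    children: list = field(default_factory=list)
    # Separate table keyed by notation name; nothing records where each
    # notation appeared relative to the children.
    notations: dict = field(default_factory=dict)


def dump_dtd(dtd):
    """Serialize the toy DTD, emitting the notation table before children."""
    lines = [f'<!DOCTYPE {dtd.name} SYSTEM "{dtd.system_id}" [']
    # Notations go first because there is no position to slot them into.
    for name, public_id in dtd.notations.items():
        lines.append(f'<!NOTATION {name} PUBLIC "{public_id}" >')
    lines.extend(dtd.children)
    lines.append("]>")
    return "\n".join(lines)


dtd = ToyDtd("xxx", "XXX")
dtd.children.append("<!-- comment2 -->")
dtd.children.append('<!ENTITY blah SYSTEM "BLAH" NDATA note>')
dtd.notations["note"] = "my notation"
print(dump_dtd(dtd))
```

In libxml2 the notation table is a hash table, so even its own iteration order need not follow the document; the dict here only keeps the model short.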
Anyway, I've tried using xmlCtxtReadFile() and traversing the resulting xmlDocPtr/xmlDtdPtr objects, but
can't find a way to get to the notations. I've also tried xmlNewTextReaderFilename() but it only seems to
traverse the XML elements, not the internal DTD.
Is there something I've missed? If notations aren't added as children, I'm not sure how to get back a
correctly sequenced set of elements (including notations). Do I really need to drop all the way back to
implementing my own SAX event handler in order to preserve the list of notations? Or have I totally missed
something obvious?
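For comparison, here is a minimal sketch of the SAX-style approach. It deliberately swaps in Python's standard-library expat bindings (`xml.parsers.expat`), a different parser from libxml2, because expat exposes explicit start- and end-of-DOCTYPE callbacks. A single ordered list then records comments, entity declarations and notation declarations as they arrive, each tagged as inside (`"dtd"`) or outside (`"doc"`) the DOCTYPE. `ordered_prolog_events` and the event tuple layout are hypothetical names chosen for this illustration. libxml2's `xmlSAXHandler` has comparable callbacks (for example `internalSubset`, `entityDecl`, `notationDecl` and `comment`), but a libxml2 version would need its own checking against the libxml2 docs.

```python
import xml.parsers.expat


def ordered_prolog_events(xml_text):
    """Hypothetical helper: return (kind, location, name) tuples in document
    order, where location is "dtd" inside the DOCTYPE and "doc" otherwise."""
    events = []
    in_dtd = False
    parser = xml.parsers.expat.ParserCreate()

    def where():
        return "dtd" if in_dtd else "doc"

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal in_dtd
        events.append(("doctype-start", where(), name))
        in_dtd = True  # everything until EndDoctypeDecl is inside the DTD

    def end_doctype():
        nonlocal in_dtd
        in_dtd = False
        events.append(("doctype-end", where(), None))

    def comment(data):
        events.append(("comment", where(), data.strip()))

    def entity_decl(name, is_parameter_entity, value, base,
                    system_id, public_id, notation_name):
        # Unparsed (NDATA) entities arrive here with value=None.
        events.append(("entity", where(), name))

    def notation_decl(name, base, system_id, public_id):
        events.append(("notation", where(), name))

    def start_element(name, attrs):
        events.append(("element", where(), name))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = end_doctype
    parser.CommentHandler = comment
    parser.EntityDeclHandler = entity_decl
    parser.NotationDeclHandler = notation_decl
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return events


SAMPLE = """<?xml version="1.0" standalone="yes" ?>
<!-- comment1 -->
<!DOCTYPE xxx SYSTEM "XXX" [
<!-- comment2 -->
<!ENTITY blah SYSTEM "BLAH" NDATA note>
<!NOTATION note PUBLIC "my notation">
]>
<xxx>some text</xxx>"""

for event in ordered_prolog_events(SAMPLE):
    print(event)
```

Run on the sample document from this post, it reports comment1 outside the DTD, then comment2, the `blah` entity and the `note` notation inside it, in their original order.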
Any advice would be much appreciated (sorry for the long-winded post, but I wanted to cover off what I've
already tried).
Cheers,
Craig