[xml] Correctly sequence notations



Hi, I'm trying to use libxml to extract NOTATION entries from an internal DTD, but am struggling.  I can't seem 
to find a way to get the notations back interleaved, in document order, with the other DTD declarations.

Some background - my app (which contains some XML parsing/formatting functionality) is actually written in 
Objective-C, so I was originally using NSXMLDocument (DOM-based) but for some reason the notations property 
on NSXMLDTD is always nil (Apple suggests this is a libxml bug, but I am not yet convinced). Their suggestion 
was to use NSXMLParser (SAX-based) - which does actually return the notations, but the problem is that it 
doesn't fire an event indicating that parsing has entered the DOCTYPE, so if I have the following XML, I 
don't know whether comment2 is inside the DOCTYPE or outside.

        <?xml version="1.0" standalone="yes" ?>
        <!-- comment1 -->
        <!DOCTYPE xxx SYSTEM "XXX" [
                <!-- comment2 -->
                <!ENTITY blah SYSTEM "BLAH" NDATA note>
                <!NOTATION note PUBLIC "my notation">
        ]>
        <xxx>some text</xxx>
        
So, my next step is to fall back to libxml itself.  Exploring xmllint, I can see that the --format option does 
indeed find and print the notations (which is good), but I've noticed that it doesn't preserve the original 
order of the declarations. Its output for the above XML is (note that the NOTATION has been moved ahead 
of the comment/entity):

        <?xml version="1.0" standalone="yes"?>
        <!-- comment1 -->
        <!DOCTYPE xxx SYSTEM "XXX" [
        <!NOTATION note PUBLIC "my notation" >
        <!-- comment2 --><!ENTITY blah SYSTEM "BLAH" NDATA note>
        ]>
        <xxx>some text</xxx>

More digging reveals the xmlDumpNotationTable() function - which looks like it ultimately calls an opaque 
hash table scanner wherein I pass in a function pointer.  OK, maybe that is what I need to do to iterate over 
the notations?  Some more wandering through the code leads me to xmlDtdDumpOutput() - which says "Dump the 
notations first as they are not in the DTD children list".

That seems odd. Why aren't the notations treated as children?

Anyway, I've tried using xmlCtxtReadFile() and traversing the resulting xmlDocPtr/xmlDtdPtr objects, but 
can't find a way to get to the notations.  I've also tried xmlNewTextReaderFilename() but it only seems to 
traverse the XML elements, not the internal DTD.

Is there something I've missed?  If notations aren't added as children, I'm not sure how to get back a 
correctly sequenced set of elements (including notations).  Do I really need to drop all the way back to 
implementing my own SAX event handler in order to preserve the list of notations?  Or have I totally missed 
something obvious?

Any advice would be much appreciated (sorry for the long-winded post, but I wanted to cover off what I've 
already tried).

Cheers,
Craig

