Re: [xml] [Fwd: Resolving entity references from a DTD using SAX parser]



On Wed, Apr 29, 2009 at 04:05:43PM +0100, Rachael Churchill wrote:
Hi

I hope I'm not being rude, but I didn't get any replies to this email  
when I sent it to the mailing list before, so I'm just sending it again  
in case it got missed. If anyone can help answer my questions that'd be  
really good.

  Why SAX and why not the XML Reader? I saw your mail, and my 10s
attention span got stuck on why you would absolutely need SAX if you
are using entity declarations in the DTD.

Thanks very much,
Rachael

-------- Original Message --------
Subject:      [xml] Resolving entity references from a DTD using SAX parser
Date:         Fri, 17 Apr 2009 13:13:26 +0100
From:         Rachael Churchill <rachael churchill linguamatics com>
To:   xml gnome org



Hi

Please could you help me get libxml2 to resolve entity references such  
as &foo; which are defined in a DTD declared in the XML document?

I am using the SAX parser (I know about the warning on  
http://xmlsoft.org/entities.html, but we really have to use SAX for our  
purposes), declaring element handlers and character handlers and then  
calling xmlParseChunk. I originally assumed the parser would  
automatically read in the DTD when it found the declaration, and then  

  It does: it calls you back. You have to build the internal structures
associated with the entity in the corresponding handlers.

resolve the custom-defined entity references as it found them, like it  
does with the predefined ones such as &amp; but this seems not to be the  
case.

  Because you didn't build the entities in the internal tables when
the internal subset was parsed.

Proof:

paphio:~/XML -> cat test/ent1
<?xml version="1.0"?>
<!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
<!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>
    &xml;
</EXAMPLE>
paphio:~/XML -> ./xmllint --sax test/ent1
SAX.setDocumentLocator()
SAX.startDocument()
SAX.internalSubset(EXAMPLE, , example.dtd)
SAX.entityDecl(xml, 1, (null), (null), Extensible Markup Language)
SAX.getEntity(xml)
SAX.externalSubset(EXAMPLE, , example.dtd)
SAX.startElementNs(EXAMPLE, NULL, NULL, 0, 0, 0)
SAX.characters(
    , 5)
SAX.getEntity(xml)
SAX.error: Entity 'xml' not defined
SAX.reference(xml)
SAX.characters(
, 1)
SAX.endElementNs(EXAMPLE, NULL, NULL)
SAX.endDocument()
paphio:~/XML -> 


I think I have to declare a reference-handling function as  
SAXHandler.reference, and then in that function, which gets called when  
an entity reference is found, call xmlGetEntityFromTable to look up what  
that entity reference resolves to.  But how do I populate the table  
using the entity references defined in the DTD? The functions like  

  Using the entity declaration and other DTD-related declaration
callbacks.

xmlAddDtdEntity and xmlAddDocEntity seem to require an xmlDocPtr, which  
I thought only existed when doing DOM parsing. So do I need to use  

  Create one by hand.

xmlNewEntity instead, which does the same thing without needing an  
xmlDocPtr? Also, xmlGetEntityFromTable isn't exposed by the API. It's  
called from xmlGetDocEntity, xmlGetDtdEntity and xmlGetParameterEntity,  
but they all require an xmlDocPtr too. May I change the API so that  
xmlGetEntityFromTable is exposed, or is there another way of doing it?

Also, how do I get the filename of the DTD? Is there a function for  

  In the SAX.internalSubset callback. But really, if you expect to have
external subset handling with libxml2 SAX, you will have an awful lot of
work: either you do it all yourself, or you reuse the existing default
SAX callbacks from SAX2.c. In any case it is an awful lot of work, hence
the warning in red!

that, or do I need to manually parse the XML document looking for the  
declaration? Then when I have the DTD,  I need to parse it. Calling  
xmlParseDTD with the filename as the SystemID argument seems to work;  
then can I use the "elements" field of the xmlDtdPtr it returns as the  
table I need to refer to in xmlGetEntityFromTable (and so not need  
xmlAdd*Entity)?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


