Re: [xml] Parse XML file with generated entities



On Sun, Dec 28, 2014 at 07:25:13PM -0500, Carl Nygard wrote:
All,

I have a system that needs to provide custom ENTITY definitions (like
&USER_POS; and &LABEL;) to an XML file.  I'd like to use the
xmlAddDocEntity() function but I need to figure out how to insert these
calls into the normal parsing framework.  All the parsing functions seem to
create an xmlDoc as an output of the parse process and I need to be able to
create an empty xmlDoc, insert the entities, and then parse the XML file
(or memory, whatever).

Can someone point me to the API functions that would be best to use?  In
the past I've just hacked around but it was extremely non-upgradable and
I'd like to do it right this time around.  None of the examples seem to
show me the flow I'm looking for.

  Hi Carl,

Sorry for the delay, I unfortunately have little time for libxml2!
Hopefully, since that's a long-term issue, you're still listening :-\

The problem is that this is unconventional parsing: entities are
supposed to be defined in the DTD, and that DTD declared in the document
header (by referencing an external subset or using the internal subset).
Doing otherwise makes custom parsing hard and/or requires trusting the
input document for those (ahem ...).

The difficulty is to get control as early as possible when the parser
starts on the document root (I assume you are not using those
entities inside other entities, which would make it more complex). How
to do that depends on how you call the parser.

If you are using SAX, you could hook the routine making those
xmlAddDocEntity() calls into the first start-element callback,
or into the start-document callback if the entities might be used in
the root element's attribute values...

Similarly, if you are using the xmlReader APIs, hook on the first
return from Read(); that should work.

Now the hard part is if you are running one of the classic tree-building
routines: the parser takes control and there isn't a simple way to hook
the routines to define the entities. Any way to proceed there will be
hackish, as you would have to mimic something like the internals of
xmlReadDoc() and xmlDoRead(); only then does the parser call
xmlParseDocument(ctxt), which actually fills in the document.
The document is created in the ctxt->sax->startDocument()
callback, so one way would be to override that callback with a routine
of yours which calls the default one and then adds the entities.
You can do that relatively easily if you create the parser context
yourself and use one of the xmlCtxtRead...() calls. That sounds
to me like the simplest option in that third case (i.e. not using SAX
or the Reader).

  Hope this helps,

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

