[xml] Allocating entities dynamically, without a pre-existing DTD, while using SAX

Hello everybody,

  The problem mentioned in the subject came up while I was extending a
scripting language (Pike, http://pike.ida.liu.se) with an XML parser based
on xml2. The application the code is meant for (Caudium, http://caudium.net,
a webserver) puts special requirements on the XML parser that seem quite
hard to meet with xml2. The issue itself is simple: the document parser must
be able to define entities dynamically, based on the user's input (in the
form of RXML, Caudium's extensible markup language). I cannot pre-define a
DTD because most of the entities defined by the webserver get their values
on a per-request basis, and those values can change every time the document
is parsed or reparsed. I'm using the SAX interface because a) the server must
be able to parse an arbitrary number of documents of unknown size, and b) the
xmlreader doesn't seem to handle dynamic allocation/definition of entities at
all. So far I see three ways of doing what I want:

 1. use the getEntity() callback in the SAX handler struct and allocate the
    entities there by calling xmlAddDocEntity or xmlAddDtdEntity.

    pros: - seems to be the fastest and the most flexible way
          - allows me to call back to the script in Pike to get the values
          - ensures that the entity values stay up to date at all times

    cons: - both APIs require an xmlDocPtr, but the myDoc member of the
            xmlParserCtxt structure is still NULL when the callback is
            invoked, so no entity can be defined.

 2. inject the entity DTD definitions into the document's preamble just
    before parsing it.

    pros: - the entities are parsed and replaced without any further action.
    cons: - processing overhead: on each request for the document the
            webserver must regenerate the DTD (remember, the entities have
            highly dynamic values) and the parser must parse it.

          - requires modifying the user's document, which may break
            something in it.

 3. add an external reference to the DTD that points back to the site
    handled by our server and read the DTD from there. The DTD would be
    generated dynamically by the server.

    pros: - similar to #2 above

    cons: - requires server-side session handling even though it may not
            be needed for anything else (the server must keep the
            original request context until the parser requests the DTD)

          - similar to #2
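Scenario #2 boils down to generating an internal DTD subset for each request and splicing it in front of the document body. A minimal sketch in C, assuming the entity values are plain text (so `&`, `<`, `"` and `%` are escaped inside the literal) and that the user's document carries no DOCTYPE of its own; `entity_def`, `write_subset` and `inject_entities` are hypothetical names, not part of libxml2:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request entity: a name plus a plain-text value. */
typedef struct {
    const char *name;
    const char *value;
} entity_def;

/* Escape for one character of an entity value, or NULL to copy it as-is.
 * "&amp;" and "&lt;" are general entity references, which are left alone
 * inside an entity value and expand to '&' and '<' when the entity is used;
 * "&#34;" and "&#37;" are character references expanded at declaration time,
 * keeping '"' from ending the literal and '%' from starting a PE reference. */
static const char *escape_char(char c)
{
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '"': return "&#34;";
    case '%': return "&#37;";
    default:  return NULL;
    }
}

/* Writes "<!DOCTYPE root [ <!ENTITY name "value"> ... ]>" into out
 * (only if out is non-NULL) and returns its length in bytes. */
static size_t write_subset(char *out, const char *root,
                           const entity_def *defs, size_t n)
{
    size_t len = 0;
#define EMIT(s, k) do { if (out) memcpy(out + len, (s), (k)); len += (k); } while (0)
    EMIT("<!DOCTYPE ", 10);
    EMIT(root, strlen(root));
    EMIT(" [\n", 3);
    for (size_t i = 0; i < n; i++) {
        EMIT("<!ENTITY ", 9);
        EMIT(defs[i].name, strlen(defs[i].name));
        EMIT(" \"", 2);
        for (const char *p = defs[i].value; *p; p++) {
            const char *esc = escape_char(*p);
            if (esc) EMIT(esc, strlen(esc));
            else     EMIT(p, 1);
        }
        EMIT("\">\n", 3);
    }
    EMIT("]>\n", 3);
#undef EMIT
    return len;
}

/* Returns a malloc'd copy of doc with the subset inserted after any XML
 * declaration (which must stay first), or NULL if allocation fails. */
char *inject_entities(const char *doc, const char *root,
                      const entity_def *defs, size_t n)
{
    assert(doc != NULL && root != NULL);
    size_t prefix = 0;  /* bytes of doc kept in front of the subset */
    if (strncmp(doc, "<?xml", 5) == 0 && isspace((unsigned char)doc[5])) {
        const char *end = strstr(doc, "?>");
        if (end != NULL)
            prefix = (size_t)(end - doc) + 2;
    }
    size_t doc_len = strlen(doc);
    size_t sub_len = write_subset(NULL, root, defs, n);
    char *out = malloc(doc_len + sub_len + 2);  /* optional '\n' plus NUL */
    if (out == NULL)
        return NULL;
    size_t pos = prefix;
    memcpy(out, doc, prefix);
    if (prefix > 0)
        out[pos++] = '\n';
    write_subset(out + pos, root, defs, n);
    pos += sub_len;
    memcpy(out + pos, doc + prefix, doc_len - prefix + 1);  /* rest + NUL */
    return out;
}
```

The returned buffer would then go to the SAX parser in place of the original text. The sketch also makes both drawbacks of #2 concrete: the subset is rebuilt (and reparsed) on every request, and a document that already has a DOCTYPE would end up with two.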

  Those are the three scenarios I came up with, not knowing XML2 well enough
to find any other way out of the problem. My favorite is, of course, #1, but
is it possible to define the entities in the way described? Is it safe for me
to muck with xmlEntity directly, allocating and populating the structure by
hand without calling down to the XML2 API? I would love to replace the XML
parser we are currently using, since XML2 is so much faster and so much more
feature-rich that it is, IMO, the only viable alternative to what we have
now. Any help will be greatly appreciated.


