Re: [xml] Allocating entities dynamically, without pre-existing DTD while using SAX



On Wed, Feb 19, 2003 at 12:47:05PM -0500, Daniel Veillard scribbled:
On Wed, Feb 19, 2003 at 12:58:13AM +0100, Marek Habersack wrote:
I need to make it possible for the document parser to dynamically define
entities based on the user's input (in shape of an RXML - a Caudium's
extendable markup language). I cannot pre-define a DTD because most of the
entities defined by the webserver have values assigned on a per-request
basis and they can change at any time the document is parsed/reparsed. I'm

  Hum, this is a bad start. Basically an XML document without a DOCTYPE
and using unknown entities is not well-formed. So it seems the basic
choice of using entities for "variables" is not a really good one :-\
Yes, I realize, but this must be so for the backwards compatibility :( - if
I were to decide I wouldn't use entities for that purpose but, alas, I have
no choice.

using the SAX interface since a) the server must be able to parse an
arbitrary number of documents of unknown size, b) the xmlreader doesn't seem
to be able to handle dynamic allocation/definition of entities at all. So
far, I see three ways of doing what I want, they are:

 1. use the getEntity() handler in the handler struct and allocate the
    entities there by calling xmlAddDocEntity or xmlAddDtdEntity. 

  it looks like the solution which will "make it work" the most easily
BUT you must be aware that the markup templates won't be XML then !
Yes - that will be one of the "modes" only, though. The normal mode will
enforce full XML compliance. The problem I have with this way of providing
the entities is that both of the functions mentioned above will bail out if
their 'doc' argument is NULL. I save the parser context in the user_data
passed to the callbacks and tried to use its 'myDoc' member to pass to the
above APIs - but it seems that at the time the callback is called, myDoc in
xmlParserCtxt is NULL. That means the entity will not get added to the
document. Is there any (sane) way around it?


 2. inject the entity DTD definitions into the document's preamble just
    before parsing it.

   Ugly, really ... worst of both worlds.
Indeed :)

 3. add an external reference to the DTD that points back to the site
    handled by our server and read the DTD from there. The DTD would be
    generated dynamically by the server.

  this is actually the cleanest solution because your templates are
well formed XML now ! But this may cost a bit more at run-time I agree.
And the problem with it is that HTTP is not designed to preserve state - I
can imagine many race conditions with that approach in this application...
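
For reference, option 2 above (injecting entity declarations ahead of the document) can be sketched in plain C without touching libxml2 at all. This is only a rough sketch under several assumptions: the names EntityVar, Buf, buf_add, buf_add_value and inject_entities are hypothetical, the template is assumed to carry no XML declaration or DOCTYPE of its own, entity names are assumed to be valid XML names already, and values are treated as literal text rather than markup. The escaping follows the XML 1.0 rules for entity values, where a literal '&' or '<' has to be double-escaped so that it survives the re-parse of the replacement text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair for one per-request entity. */
typedef struct { const char *name; const char *value; } EntityVar;

/* Minimal growable string buffer. */
typedef struct { char *data; size_t len, cap; } Buf;

/* Append s to b; returns 0 on success, -1 on allocation failure. */
static int buf_add(Buf *b, const char *s)
{
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 128;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
    return 0;
}

/* Append v escaped for use inside a double-quoted entity value.
 * '&' and '<' are double-escaped so they stay literal text when the
 * entity is later referenced from document content. */
static int buf_add_value(Buf *b, const char *v)
{
    for (; *v; v++) {
        char one[2] = { *v, '\0' };
        const char *rep = one;
        switch (*v) {
        case '&': rep = "&#38;#38;"; break;
        case '<': rep = "&#38;#60;"; break;
        case '%': rep = "&#37;";     break;
        case '"': rep = "&#34;";     break;
        default:  break;
        }
        if (buf_add(b, rep) != 0)
            return -1;
    }
    return 0;
}

/* Build "<!DOCTYPE root [ ...entity declarations... ]>" followed by body.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *inject_entities(const char *root, const EntityVar *vars,
                      size_t nvars, const char *body)
{
    Buf b = { NULL, 0, 0 };
    size_t i;
    int ok = buf_add(&b, "<!DOCTYPE ") == 0
          && buf_add(&b, root) == 0
          && buf_add(&b, " [\n") == 0;

    for (i = 0; ok && i < nvars; i++) {
        ok = buf_add(&b, "<!ENTITY ") == 0
          && buf_add(&b, vars[i].name) == 0
          && buf_add(&b, " \"") == 0
          && buf_add_value(&b, vars[i].value) == 0
          && buf_add(&b, "\">\n") == 0;
    }
    ok = ok && buf_add(&b, "]>\n") == 0 && buf_add(&b, body) == 0;
    if (!ok) {
        free(b.data);
        return NULL;
    }
    return b.data;
}
```

The resulting string could then be handed to the parser in place of the raw template. Since the declarations are rebuilt on every request, the per-request values never need to exist as a DTD on disk or behind a URL.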

  Those are the three scenarios I came up with, not knowing XML2 well enough
to find any other way out of the problem. My favorite scenario is, of
course, #1 -- but the question is whether it's possible to define the
entities in the way described? Is it safe for me to muck with xmlEntity
directly and allocate and populate the structure by hand without calling
down to the XML2 API? I would love to get rid of the XML parser we are
currently using, since XML2 is so much faster and so much more featureful
that it is, IMO, the only viable alternative to what we have now. Any help
will be greatly appreciated.

  You can bypass the APIs and use the structures directly. Since I expose
them it's possible, but this will require some work to make sure it's
stable and doesn't leak. This sounds possible, but this will probably
require a bit of debugging.
Basically, as I see it, I need xmlAddEntity not to be static, I suppose :)
It seems, at first sight, I can just copy that function to my code and use
it internally to register entities. That breaks all the rules, but oh
well...

thanks for your reply,

marek



