Re: [xml] Allocating entities dynamically, without pre-existing DTD while using SAX



On Wed, Feb 19, 2003 at 04:41:46PM -0500, Daniel Veillard scribbled:
[snip]
passed to the callbacks and tried to use its 'myDoc' member to pass to the
above APIs - but it seems that at the time the callback is called, myDoc in
xmlParserCtxt is NULL. That means the entity will not get added to the
document. Is there any (sane) way around it?

  Hum, the document builder is done *on top* of SAX. I don't think you
can expect anything about myDoc when using SAX. You create and maintain
the document if you need one.

I have just "discovered" something I think might help in my situation. Given
this fragment of code:

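#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/entities.h>

typedef struct 
{
  xmlParserCtxtPtr   myParser;
  char              *filename;
} UserData;

/* getEntity SAX callback: tries to register the requested entity on the
   parser's document. */
static xmlEntityPtr my_getEntity(UserData *ctx, const xmlChar *name)
{
  xmlEntityPtr   ret;   /* xmlEntityPtr is already a pointer type */

  printf("Request for the '%s' entity\n", (const char *)name);
  /* Fails with "document is NULL" while ctx->myParser->myDoc is unset. */
  ret = xmlAddDocEntity(ctx->myParser->myDoc,
                        name,
                        XML_INTERNAL_GENERAL_ENTITY,
                        NULL, NULL, BAD_CAST "some entity");
  return ret;
}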

and this document fragment:

<?xml version="1.0"?>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>
    &entity;
    &tst;

I get the following output:

$ ./xmlsax 2gjobs.xml 
Request for the 'entity' entity
xmlAddDocEntity: document is NULL !
Request for the 'tst' entity
xmlAddDocEntity: document is NULL !

However, with the same code as above and this document fragment:

<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY dummy "<p>test</p>">
]>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>
    &entity;
    &tst;

I get the following output:

$ ./xmlsax gjobs.xml 
Request for the 'dummy' entity
Request for the 'entity' entity
Request for the 'tst' entity

So, it seems that myParser->myDoc is valid if at least one entity is parsed
in the document's DTD. That makes me think about using such a dummy DTD,
either external or internal, and then defining the rest of the entities
dynamically in the getEntity() callback (with caching of course). Ugly,
granted, but it seems to be the best way in this situation.
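
A minimal sketch of the caching part of that idea, kept independent of
libxml2 so it can be compiled on its own. Everything named here (EntityCache,
cache_get, generate_value, the fixed sizes) is hypothetical and only for
illustration; in a real getEntity() callback the cached value would
presumably be turned into an entity with xmlAddDocEntity rather than returned
as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTITIES 32
#define MAX_NAME     32
#define MAX_VALUE    64

typedef struct {
    char name[MAX_NAME];
    char value[MAX_VALUE];
} CachedEntity;

typedef struct {
    CachedEntity items[MAX_ENTITIES];
    size_t       count;    /* entries currently stored */
    size_t       misses;   /* how many values had to be generated */
} EntityCache;

/* Hypothetical generator: stands in for whatever produces the entity text. */
static void generate_value(const char *name, char *out, size_t outlen)
{
    snprintf(out, outlen, "value of %s", name);
}

/* Return the cached value for name, generating and storing it on first use.
   Returns NULL if the cache is full or the name does not fit. */
static const char *cache_get(EntityCache *cache, const char *name)
{
    size_t i;
    CachedEntity *e;

    assert(cache != NULL && name != NULL);

    for (i = 0; i < cache->count; i++)
        if (strcmp(cache->items[i].name, name) == 0)
            return cache->items[i].value;

    if (cache->count == MAX_ENTITIES || strlen(name) >= MAX_NAME)
        return NULL;

    e = &cache->items[cache->count++];
    strcpy(e->name, name);               /* length checked above */
    generate_value(name, e->value, sizeof e->value);
    cache->misses++;
    return e->value;
}
```

Keeping one such cache per request, reachable through the SAX user data
pointer, would avoid sharing state between requests.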

  This is actually the cleanest solution because your templates are
well-formed XML now! But this may cost a bit more at run-time, I agree.

And the problem with it is that HTTP is not designed to preserve state - I
can imagine many race conditions with that approach in this application...

  Well, don't use HTTP for the DOCTYPE. Any URL-based identifier will do;
provide your own implementation of those.

Yeah, but that creates two problems. First, efficiency: the DTD would have
to be stored on disk (each request for a document coming from the net would
cause a new, unique file to be created, which would have to exist until the
request is done -- as ugly as sin). Second, the request<->DTD association:
even though two subsequent requests would access the same document, the DTD
would have to be generated separately for each request -- there would be a
problem with how to efficiently, safely and without race conditions
associate a given request with a given instance of the DTD. It would require
dynamically rewriting the XML file's systemId for the DTD, yuck. :)

Basically, as I see it, I need xmlAddEntity to not be static, I suppose :)
It seems, at first sight, I can just copy that function to my code and use
it internally to register entities. That breaks all the rules, but oh
well...

  No, use xmlAddDocEntity on a document you maintain and associate with the
request being processed.

The problem is that I need the entities _while_ parsing the document - not
after parsing it. Some dumb user might wish to store the time of the day in
an entity, for example, and so the entity has to be evaluated every time it
is referenced. You see the problem :)
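
To make the "evaluated every time" requirement concrete, here is a small
sketch, again independent of libxml2, of a lookup table whose values are
produced by a function on each reference instead of being stored once. The
names DynamicEntity, resolve_entity, provide_time and provide_counter are made
up for this example, and how the resulting text would be handed back to the
parser from getEntity() is left open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* A provider writes the current value of an entity into out. */
typedef void (*EntityProvider)(char *out, size_t outlen, void *state);

typedef struct {
    const char     *name;
    EntityProvider  provide;
    void           *state;    /* passed through to the provider */
} DynamicEntity;

/* Time of day, recomputed on every call. */
static void provide_time(char *out, size_t outlen, void *state)
{
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);

    (void)state;
    if (tm == NULL || strftime(out, outlen, "%H:%M:%S", tm) == 0)
        snprintf(out, outlen, "unknown");
}

/* Counts how often the entity was referenced; state points to an int. */
static void provide_counter(char *out, size_t outlen, void *state)
{
    int *n = state;

    snprintf(out, outlen, "%d", ++*n);
}

/* Look up name and evaluate its provider. Returns 1 if found, 0 otherwise
   (out is then set to the empty string). */
static int resolve_entity(const DynamicEntity *table, size_t count,
                          const char *name, char *out, size_t outlen)
{
    size_t i;

    assert(table != NULL && name != NULL && out != NULL && outlen > 0);

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].provide(out, outlen, table[i].state);
            return 1;
        }
    }
    out[0] = '\0';
    return 0;
}
```

Because nothing is cached, two references to the same entity within one
document can yield different text, which is exactly the case a
parse-then-register approach cannot cover.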

Oh well, thanks for your patience - I think I have figured out what to do
now. Thanks for the help,

marek
