[xml] Substitution of nested entity references



Hello Daniel,

and sorry to broach this topic again, but I'd hoped
that the approach described below might be worth
a comment... 8)

Kidding aside, I really don't want to press for anything
or get on anyone's nerves. It's just that my
(libxml2-based) application (an API for COBOL) is
close to a first release, and I have to decide
whether to handle the substitution of nested
entity references at the application level or to rely
on libxml.
If libxml is not intended to do the substitution,
it may be a good idea to point this out in
the docs/reference for xmlNodeGetContent(),
xmlGetProp() and the like!?


TIA, Markus

-----Original Message-----
From: Henke, Markus 
Sent: Wednesday, March 20, 2002 4:08 PM
To: 'xml gnome org'
Subject: RE: [xml] xmlNodeGetContent() for XML_ENTITY_REF_NODE


Hello,

and please excuse the long response time, but there
were other things to take care of (among them some holidays... 8).

In the meantime I've written a routine that does the (recursive)
substitution of (nested) entity references, and it works fine for
me so far (code included below).
However, there are some points to clarify before it may be useful
for libxml2:

- Is it possible for someone to get an xmlDocPtr to a
  non-well-formed document (possibly due to cyclic entity
  references)? This could cause serious problems in this
  routine... 8)

- Is xmlGetDocEntity() the proper way to look up the entity
  declaration in that case? There is also xmlGetDtdEntity(),
  and I'm not sure about the difference. Both take an xmlDocPtr
  as a parameter (and I must confess that I haven't had the
  time yet to examine the source for that...)

- I've assumed that an xmlEntity has only child nodes of
  type XML_ENTITY_REF_NODE or XML_TEXT_NODE. Is this correct,
  and if not, which additional cases need to be taken
  into consideration?
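To make the first question concrete: cyclic references are illegal in a well-formed document, but a recursive expander only stays safe on a hand-built or damaged tree if it limits its own depth. The sketch below is plain C with no libxml2 calls; the demo_entity table, expand_entity() and MAX_ENTITY_DEPTH are hypothetical names used only for illustration, and the limit of 40 is an arbitrary choice. Replacement texts may contain &name; references, which are expanded recursively; an undeclared name, a full output buffer, or a chain deeper than the limit yields -1 instead of endless recursion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entity table entry: name -> replacement text (may contain &refs;). */
struct demo_entity {
    const char *name;
    const char *text;
};

#define MAX_ENTITY_DEPTH 40   /* arbitrary guard against cyclic references */

/* Finds the replacement text for the name p[0..len-1], or NULL. */
static const char *lookup_entity(const struct demo_entity *tab, size_t n,
                                 const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(tab[i].name) == len && strncmp(tab[i].name, name, len) == 0)
            return tab[i].text;
    return NULL;
}

/*
 * Appends the full expansion of `text` to `out` (capacity `cap`, kept
 * NUL-terminated); *used is the current length of `out`.
 * Returns 0 on success, -1 on error.
 */
static int expand_text(const struct demo_entity *tab, size_t n,
                       const char *text, char *out, size_t cap,
                       size_t *used, int depth)
{
    if (depth > MAX_ENTITY_DEPTH)
        return -1;                          /* most likely a cycle */
    for (const char *p = text; *p != '\0'; ) {
        if (*p == '&') {
            const char *semi = strchr(p + 1, ';');
            if (semi == NULL)
                return -1;                  /* malformed reference */
            const char *repl = lookup_entity(tab, n, p + 1,
                                             (size_t)(semi - p - 1));
            if (repl == NULL)
                return -1;                  /* undeclared entity */
            /* propagate errors instead of adding -1 to a length */
            if (expand_text(tab, n, repl, out, cap, used, depth + 1) != 0)
                return -1;
            p = semi + 1;
        } else {
            if (*used + 1 >= cap)
                return -1;                  /* no room for char + NUL */
            out[(*used)++] = *p++;
            out[*used] = '\0';
        }
    }
    return 0;
}

/* Expands the entity `name`; returns the expanded length or -1. */
int expand_entity(const struct demo_entity *tab, size_t n,
                  const char *name, char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    const char *repl = lookup_entity(tab, n, name, strlen(name));
    if (repl == NULL || expand_text(tab, n, repl, out, cap, &used, 1) != 0)
        return -1;
    return (int) used;
}
```

For example, with the table { {"inner", "world"}, {"outer", "hello &inner;"} }, expanding "outer" gives "hello world" (length 11), while a pair such as a -> "x&b;", b -> "&a;" runs into the depth limit and returns -1.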

I would be glad to improve this routine if there is any
chance that it could be used in libxml2 to resolve nested
entity references. I guess at least xmlNodeGetContent() and
xmlGetProp() [xmlGetNsProp()] should do so; the reference
says, e.g.,
... Entity references are substituted...
... This does the entity substitution...


Thanx & Ciao, Markus


===============================================================
#include <libxml/tree.h>
#include <libxml/entities.h>
#include <libxml/xmlerror.h>

/**
 * xmlResolveEntityRef:
 * @docPtr:  the document pointer
 * @entityRefPtr:  pointer to an XML_ENTITY_REF_NODE
 * @contentBufferPtr: pointer to an xmlBuffer for the resolved content
 *
 * Does the substitution of an entity reference.
 * Recursive behaviour for nested references. Note that there is no
 * guard against cyclic references (see the first question above).
 *
 * Returns the size of the resolved content or -1 in case of error
 * (contentBufferPtr may then hold partially resolved content)
 */

int xmlResolveEntityRef(xmlDocPtr docPtr,
             xmlNodePtr entityRefPtr,
             xmlBufferPtr contentBufferPtr)
{
  int contentLen = 0;
  xmlEntityPtr entityDeclPtr = (xmlEntityPtr)NULL;
  xmlNodePtr nodePtr = (xmlNodePtr)NULL;

  /*** check arguments and node type ***/
  if (!docPtr || !entityRefPtr || !contentBufferPtr)
  {
    xmlGenericError(xmlGenericErrorContext,
      "xmlResolveEntityRef: NULL argument\n");
    return -1;
  }
  if (entityRefPtr->type != XML_ENTITY_REF_NODE)
  {
    xmlGenericError(xmlGenericErrorContext,
      "xmlResolveEntityRef: Node type != XML_ENTITY_REF_NODE\n");
    return -1;
  }

  /*** lookup entity declaration ***/
  if (!(entityDeclPtr = xmlGetDocEntity(docPtr, entityRefPtr->name)))
  {
    xmlGenericError(xmlGenericErrorContext,
      "xmlResolveEntityRef: Lookup for entity declaration failed\n");
    return -1;
  }

  nodePtr = entityDeclPtr->children;
  while (nodePtr)
  {
    switch (nodePtr->type)
    {
      case XML_ENTITY_REF_NODE:
      {
        /*** recursive call to xmlResolveEntityRef; pass an error up
             instead of adding -1 to contentLen ***/
        int nestedLen = xmlResolveEntityRef(docPtr,
                          nodePtr, contentBufferPtr);
        if (nestedLen < 0)
          return -1;
        contentLen += nestedLen;
        break;
      }
      case XML_TEXT_NODE:
      {
        /*** read and append (text) node content, skipping
             text nodes without content ***/
        xmlChar *contStr = xmlNodeGetContent(nodePtr);
        if (contStr)
        {
          xmlBufferCat(contentBufferPtr, contStr);
          contentLen += xmlStrlen(contStr);
          xmlFree(contStr);
        }
        break;
      }
      default:
      {
        /*** what shall we do here, could this happen? ***/
        xmlGenericError(xmlGenericErrorContext,
          "xmlResolveEntityRef: Unexpected node type: %d\n",
          nodePtr->type);
        return -1;
      }
    }
    nodePtr = nodePtr->next;
  }

  return contentLen;
}
==================================================================



-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Friday, February 22, 2002 4:07 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] xmlNodeGetContent() for XML_ENTITY_REF_NODE


On Fri, Feb 22, 2002 at 03:35:02PM +0100, Henke, Markus wrote:
I'm not sure you can get this recursive.


Hum, you mean nested entity references? Weird, I've never

  yep

thought about this. My app instantly crashed when I tested
it. Smells like a new debugging session... 8]
I'd hoped that I could borrow from xmlGetProp(), but it doesn't
do a recursive entity substitution!?
However, I have to do this, and I should be able to manage it since
cyclic entity refs are illegal.
BTW, the libxml parser detects cyclic entity
references, but it seems to loop about 40 times?

  Call me lazy, I need to fix and report only the first one

Another way would be to change the xmlNodeGetContent()
behaviour on entity nodes.

I'll try to code a recursive entity substitution for
XML_ENTITY_REF_NODEs. If it works for me, I'll send
it to the list so you can decide whether it makes sense to
include it in xmlNodeGetContent() (and possibly
xmlGetProp()...).

  Hum, okay,

  Actually it's a pointer to the associated ENTITY_DEF
node's content; the subtrees are shared.

I see. The debugger doesn't show that (at least not obviously).

  yes, a pointer can't tell if it's shared (wouldn't that be cool!)

Then xmlSubstituteEntitiesDefault(0) is what I need... 8)

  probably. It's the --noent option for xmllint

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  
http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



