Re: [xml] Substitution of nested entity references



On Wed, Apr 17, 2002 at 09:26:21AM +0200, Henke, Markus wrote:
Hello Daniel,

[I guess you haven't received this mail due to the
mailer outage, so I'm sending it again...]

  Right, thanks for reposting it.

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Wednesday, April 10, 2002 8:30 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Substitution of nested entity references

<snip />

[...] Yes, simply fix xmlNodeGetContent() and
xmlNodeListGetString(,,1) to behave correctly in this case.
I actually think it is a libxml2 bug, and I would appreciate it if you
could take care of it, since you have already explored much of it.
Simply recurse on the content of entity references to complete the work
currently handled only at the first level of entity indirection. As
explained before, this cannot loop, since a loop in entity references
would be a well-formedness error.

  thanks !

Daniel

The patch is basically finished, but I have a serious problem
testing it:
To recurse completely on the content of entity references, I need the
child node list of the corresponding entity declaration.
But it seems that the "ent->childs" list of an entity declaration
remains NULL after parsing if the entity occurs only in an
attribute value.

  Okay, entity references in attribute values are a mess, and I blame
SAX: libxml(2) keeps the information about entity references in
attribute values, while the SAX API forces them to be pure string values,
i.e. substituted. The only way I could reach my goals of keeping that
information, building the DOM on top of SAX, and still being conformant
was to have the parser do all the entity analysis but discard it when
crossing the SAX interface, deliver at the SAX level a string containing
the entity references as the attribute value, and then reparse that
string when building the DOM tree. That such a process introduces some
strangeness is not a big surprise.

This sounds a bit strange; an example may help:

==================================================================
<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>

<!DOCTYPE aElement [
  <!ELEMENT aElement ANY>
  <!ATTLIST aElement aAttr CDATA #REQUIRED>
  <!ENTITY aNestedEnt "i_am_a_nested_entity">
  <!ENTITY aEnt "A_&aNestedEnt;_A">
  <!ENTITY bEnt "B_&aEnt;_B">
]>

<aElement aAttr="Text &aEnt; MoreText">Content &bEnt; MoreContent
</aElement>
====================================================================

=============================================
/* [...] */
xmlSubstituteEntitiesDefault(0); /* keep entity references as nodes */
docPtr = xmlParseFile("./nestedEntity.xml"); /* the document above */
ent = xmlGetDocEntity(docPtr, "aEnt");
/* we'll get ent->childs == NULL when 'aEnt'
   only occurs in an attribute value */
ent = xmlGetDocEntity(docPtr, "bEnt");
/* we'll get ent->childs != NULL when 'bEnt'
   occurs as element content */
=============================================

As soon as 'aEnt' also appears in element content,
the "childs" list of its entity declaration is non-NULL.

I don't know if this behaviour is intended, I guess it's

 No, it's a bug.

rather not. I've tried to debug my test program to find out what
happens, but I was quite lost in the depths of the libxml
DOM-building process.

  Yes, it's awfully complex, for the reasons explained above.

If this behaviour is a bug and you can point me in the
right direction, I'll try to fix it. If not... well,
we'll see  :)

  Start from the attribute() function in SAX.c: that is where the call
is made which takes the attribute value, with the entity references
still in it, and builds a node list from it. That's where you should
look, in xmlStringGetNodeList().
  Preserving entity references in attribute values introduces a
serious amount of complexity into that process :-\

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
