Re: [xml] entity handling concept



On Wed, Apr 18, 2001 at 09:19:12PM +0200, Christian Glahn wrote:
hi everybody,

During the weekend I tried to work through the concept for
entity references in the document tree. As is written
in the TODO file, the handling of % entity references has to
be rethought.

  Not necessarily, at least not until we want to be able to edit and
save DTDs, and that's definitely not urgent.

While I was implementing a small subsystem to handle XML files, this
problem became important for me.

What kind of application are you using that needs to write back PE
references? Are you sure you really understand them?

I would like to read, set up and write
document trees without worrying about any PERefs I happen to need.

Inside markup declarations, you have the right to use PE references
only in the external subset. I assume you understand the difference
between parameter entities and general entities.

As I understand the document tree handling, there is no way to do
such a job without rethinking the way entity references are handled
in the current version of libxml2 (which for me is 2.3.5, but I believe
this is also true for 2.3.6).

But PE references *cannot* be handled in a document tree; they
are *in essence* unstructured, macro-like facilities.

I went through the code of tree.c and realized that &entity; nodes
store the entity name and the value of the entity in the node
nested inside the document tree rather than using it as a reference.

&entities; are *not* PE references.

Also, the child pointers of the node (*children and *last) point to
the entity itself. I don't know why the latter is done, since the entity
node is not needed anymore to handle this specific node.

Okay, entity nodes in the tree are not for PE references; they are for
general entities (external parsed or internal). There are good reasons
to keep both the tree form and the raw content. If you don't understand
why, think about saving, validating, and copying nodes.
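
To make the saving case concrete, here is a minimal sketch in plain C.
It is a toy model, not libxml2 code: the toy_node struct and the
toy_save() and toy_text() helpers are hypothetical names invented for
illustration, and the entity content is simplified to a flat string
instead of a parsed subtree. The reference node keeps the entity name
and points, through children, at the shared declaration, as described
above for tree.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node kinds for this toy model (not libxml2's enum). */
enum toy_type { TOY_ELEMENT, TOY_TEXT, TOY_ENTITY_REF, TOY_ENTITY_DECL };

struct toy_node {
    enum toy_type type;
    const char *name;          /* element or entity name */
    const char *content;       /* text, or raw replacement text of a declaration */
    struct toy_node *children; /* first child; for a reference: the declaration */
    struct toy_node *next;     /* next sibling */
};

/* Append s to out, silently truncating at cap - 1 characters. */
static void put(char *out, size_t cap, const char *s) {
    strncat(out, s, cap - strlen(out) - 1);
}

/* Saving: a reference is written back as "&name;", not as its expansion. */
static void toy_save(const struct toy_node *n, char *out, size_t cap) {
    for (; n != NULL; n = n->next) {
        switch (n->type) {
        case TOY_TEXT:
            put(out, cap, n->content);
            break;
        case TOY_ENTITY_REF:
            put(out, cap, "&"); put(out, cap, n->name); put(out, cap, ";");
            break;
        case TOY_ELEMENT:
            put(out, cap, "<"); put(out, cap, n->name); put(out, cap, ">");
            toy_save(n->children, out, cap);
            put(out, cap, "</"); put(out, cap, n->name); put(out, cap, ">");
            break;
        case TOY_ENTITY_DECL:
            break; /* declarations belong to the DTD, not the body */
        }
    }
}

/* Reading: the text value follows the reference to the declaration. */
static void toy_text(const struct toy_node *n, char *out, size_t cap) {
    for (; n != NULL; n = n->next) {
        switch (n->type) {
        case TOY_TEXT:
            put(out, cap, n->content);
            break;
        case TOY_ENTITY_REF:
            assert(n->children != NULL &&
                   n->children->type == TOY_ENTITY_DECL);
            put(out, cap, n->children->content);
            break;
        case TOY_ELEMENT:
            toy_text(n->children, out, cap);
            break;
        case TOY_ENTITY_DECL:
            break;
        }
    }
}
```

With a <p> element holding the text "hello " followed by a reference
to an entity "who" declared as "world", toy_save() produces
"<p>hello &who;</p>" while toy_text() produces "hello world". The
name is what you need to save the document unchanged, and the pointer
to the declaration is what you need to get at the content.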

While thinking about representing PERefs in the DTD, I would prefer
a representation which follows the current implementation
of "normal" entities (&entities;). This means storing the data (tree)
underneath the %entity; node rather than replacing the entity reference
entirely with the imported data.

  The problem is that you cannot build a sane tree for PE references.
Just to take an example, look at:

<!ATTLIST html
  %i18n;
  xmlns       %URI;          #FIXED 'http://www.w3.org/1999/xhtml'
  >

 and

<!ENTITY % button.content
   "(#PCDATA | p | %heading; | div | %lists; | %blocktext; |
    table | %special; | %fontstyle; | %phrase; | %misc;)*">

Of course %special; etc. are themselves macros that reference further
macros. I don't see how to map this to a tree in any useful way for libxml.
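
To illustrate why this is text substitution rather than tree building,
here is a minimal sketch of a PE expander in plain C. It is not how
libxml2 does it: pe_table, pe_lookup() and pe_expand() are hypothetical
names, the replacement texts are simplified inventions (not the real
XHTML definitions), and the sketch skips details such as the spaces an
XML processor adds around the replacement text of a PE in the DTD.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table of parameter entities: name -> replacement text.
 * The values are invented for illustration only. */
struct pe_def { const char *name; const char *text; };

static const struct pe_def pe_table[] = {
    { "URI",  "CDATA" },
    { "lang", "xml:lang NMTOKEN #IMPLIED" },
    { "i18n", "%lang; dir (ltr|rtl) #IMPLIED" }, /* refers to another PE */
};

/* Find the replacement text for a name of the given length, or NULL. */
static const char *pe_lookup(const char *name, size_t len) {
    for (size_t i = 0; i < sizeof pe_table / sizeof pe_table[0]; i++) {
        if (strlen(pe_table[i].name) == len &&
            strncmp(pe_table[i].name, name, len) == 0)
            return pe_table[i].text;
    }
    return NULL;
}

/* Append the expansion of `in` to the NUL-terminated buffer `out`
 * (capacity `cap`). Known %name; references are replaced by their
 * text, recursively; anything else is copied as-is.
 * Returns 0 on success, -1 on overflow or nesting deeper than 16. */
static int pe_expand(const char *in, char *out, size_t cap, int depth) {
    assert(in != NULL && out != NULL);
    if (depth > 16)
        return -1;
    size_t used = strlen(out);
    while (*in) {
        if (*in == '%') {
            const char *semi = strchr(in + 1, ';');
            const char *text =
                semi ? pe_lookup(in + 1, (size_t)(semi - in - 1)) : NULL;
            if (text != NULL) {
                if (pe_expand(text, out, cap, depth + 1) != 0)
                    return -1;
                used = strlen(out);
                in = semi + 1;
                continue;
            }
        }
        if (used + 1 >= cap)
            return -1;
        out[used++] = *in++;
        out[used] = '\0';
    }
    return 0;
}
```

Calling pe_expand() on "<!ATTLIST html %i18n; xmlns %URI; #FIXED 'x'>"
with an empty buffer gives one flat declaration string. The replacement
text of %i18n; here is a run of attribute definitions and that of %URI;
is a bare attribute type: neither is a complete declaration, so there
is no natural node to hang either of them on.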

Such an implementation could be made transparent to other parts of
the library (and its extensions) by providing small wrapper functions
(e.g. getNodeNextSibling() or setNodePrevSibling() etc.) to access
the data.

  And what kind of nodes would that iterate over?

The bad thing about this concept is that it keeps the whole data tree
in memory for each reference node. This could be a problem if such
entities are used intensively to import large sub-trees.

  The bad thing is that parameter entities are not structured; they
cannot be mapped onto a libxml tree in any sane way. And unless you
need to save back modified DTDs, I don't see why you would need to
try to store the unstructured format that DTDs are. And honestly,
if you want to do DTD modification with PE support, libxml2 doesn't
sound like the right tool for the job.

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



