Re: [xml] Behaviour of xmlNodeAddContent() vs. xmlNodeSetContent()



Hi Daniel, all,

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Monday, 30 October 2006 09:21
To: Keim, Markus
Cc: xml gnome org
Subject: Re: [xml] Behaviour of xmlNodeAddContent() vs.
xmlNodeSetContent()



  Either your content is already escaped, which should be the case if
you have existing entity references in your strings, and it seems
obvious you should not call for a second escaping, or it is not, and in
that case, as the documentation explains, you should call it.
  If you put yourself in a situation where you have a string containing
both complete entity references and single &, then no libxml2 API will
both preserve the existing references and escape the & singleton. It's
a matter of layering: either your string is a markup fragment or it is
not. In the first case it should already be escaped, and in the second
you must escape.

  I see no problem here, really,


Somehow I expected as much...  8)
But my posting was weakly phrased, so that's what I deserve...

Of course I'm not talking about content that includes *both* unescaped
special markup characters *and* entity references.
And it's clear that xmlNodeSetContent() covers the second case, whereas
xmlNodeAddContent() expects content with all entities resolved (though
that's not documented; I'll send a patch).
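
To make the distinction concrete, here's a tiny, untested sketch of how
I understand the two calls (the serialized results in the comments are
what I'd expect, not something I've verified):

    #include <libxml/tree.h>

    xmlNodePtr n = xmlNewNode(NULL, BAD_CAST "n");

    /* escaped input, entity references recognized: */
    xmlNodeSetContent(n, BAD_CAST "a &amp; b");
    /* node content is now "a & b", serializes as "a &amp; b" */

    /* raw input, appended literally: */
    xmlNodeAddContent(n, BAD_CAST " & c");
    /* node content is now "a & b & c", serializes as
       "a &amp; b &amp; c" */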

You could, however, be in a situation where you don't know whether a
content buffer that's to be passed to libxml2 contains unescaped special
characters *or* entity references. I didn't put myself there; it's a
factual situation I have to deal with. I'm just looking for a clean way
to do so.

In my case, the application is a libxml2 wrapper for our COBOL developers.
In most cases, the content comes from the DBMS and contains no entity
references (but possibly unescaped special chars), so I used to call
xmlEncodeEntitiesReentrant() within my "xmlNodeSetContent" wrapper
(the original issue arose because I'd thought that ...AddContent behaves
similarly).
Now, it's possible that an input buffer is generated by an XML processor
(e.g. MS XML) and passed to the application, e.g. via the network,
resulting in content buffers that are already escaped and possibly
include entity references. Of course I've thought about dedicated calls
for encoded/raw input buffers, but it's likely that the developers don't
know/care what nature the content actually is...
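
For reference, this is roughly what my current wrapper does (the
function name is mine, not libxml2's; an untested, stripped-down
sketch):

    #include <libxml/tree.h>
    #include <libxml/entities.h>

    /* escape raw text from the COBOL side, then hand it to libxml2 */
    static void
    my_set_content(xmlNodePtr node, const xmlChar *raw)
    {
        xmlChar *escaped = xmlEncodeEntitiesReentrant(node->doc, raw);

        if (escaped != NULL) {
            /* xmlNodeSetContent() parses entity references, so it
               effectively undoes the escaping again internally */
            xmlNodeSetContent(node, escaped);
            xmlFree(escaped);
        }
    }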

Looking through the source of xmlNodeSetContent(), xmlStringGetNodeList()
et al., I'd thought it should be possible to examine an input buffer to
determine whether it includes any entity references at all, send it
through xmlEncodeEntitiesReentrant() if not, and then call the
appropriate, entity-aware function from the libxml2 tree API to set/add
the node content.
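
Something along these lines (has_entity_ref() is my own crude helper,
not a libxml2 call; an untested sketch of the idea):

    #include <libxml/tree.h>
    #include <libxml/entities.h>

    /* crude test: '&' followed by a non-empty name and ';' */
    static int
    has_entity_ref(const xmlChar *buf)
    {
        const xmlChar *p, *q;

        for (p = buf; *p != 0; p++) {
            if (*p != '&')
                continue;
            for (q = p + 1;
                 *q != 0 && *q != ';' && *q != '&' && *q != '<';
                 q++)
                ;
            if (*q == ';' && q > p + 1)
                return 1;
        }
        return 0;
    }

    static void
    my_generic_set_content(xmlNodePtr node, const xmlChar *buf)
    {
        if (has_entity_ref(buf)) {
            /* treat it as already escaped markup text */
            xmlNodeSetContent(node, buf);
        } else {
            xmlChar *escaped = xmlEncodeEntitiesReentrant(node->doc, buf);

            if (escaped != NULL) {
                xmlNodeSetContent(node, escaped);
                xmlFree(escaped);
            }
        }
    }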

Well, I still think it is possible, but it's probably just my personal
bad luck that I have to deal with this, and nothing we would/should care
about at the libxml2 level.

But even if I scrap the idea of a general solution and implement
dedicated calls for escaped/raw input, I'm still looking for a (clean)
way to

- *set* node content if I'm sure that I have raw input (going through
  xmlEncodeEntitiesReentrant() and xmlNodeSetContent() seems inefficient
  to me, since the latter will simply undo the work of the former)

- *add* node content if I actually need entity support

I hope I'm wrong, but AFAICS (from tree.c), I'd have to play with the
members of the libxml2 node structure and some (internal) functions to
do so!? The best I've managed with the public API alone is sketched
below.
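
Untested, and the names set_raw_content()/add_escaped_content() are
mine, but something like this, assuming xmlNewDocText(),
xmlStringGetNodeList() and xmlAddChildList() are fair game for it:

    #include <libxml/tree.h>

    /* (1) *set* raw content, skipping the escape/unescape round trip:
       clear the old children and append a plain text node; text nodes
       hold unescaped content, escaping happens at serialization time */
    static void
    set_raw_content(xmlNodePtr node, const xmlChar *raw)
    {
        xmlNodeSetContent(node, BAD_CAST "");  /* drops old children */
        xmlAddChild(node, xmlNewDocText(node->doc, raw));
    }

    /* (2) *add* already-escaped content with entity support:
       parse it into a node list and append that list */
    static void
    add_escaped_content(xmlNodePtr node, const xmlChar *escaped)
    {
        xmlNodePtr list = xmlStringGetNodeList(node->doc, escaped);

        if (list != NULL)
            xmlAddChildList(node, list);
    }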

OK, that was pretty long-winded, but hopefully it clarifies what I'm
looking for.


Ciao, Markus


