Re: [xml] Potential wrong usage of xmlIsID() in tree.c

From: Kasimier Buchcik <K Buchcik 4commerce de>
To: Rob Richards <rrichards ctindustries net>
Cc: ML-libxml2 <xml gnome org>
Subject: Re: [xml] Potential wrong usage of xmlIsID() in tree.c
Date: Tue, 21 Feb 2006 16:06:26 +0100

Hi,

On Tue, 2006-02-21 at 08:56 -0500, Rob Richards wrote:

Kasimier Buchcik wrote:

As far as copy/import/adopt goes. I have found a couple of conflicting 
opinions for this. One is that the importing doc should determine if an 
attribute is added as an ID. The other (which is also how it is 
implemented in Xerces) is that an attribute should be copied/imported 
with the same ID-ness as in the source document. The latter would be in


I agree with the latter mechanism. This looks like the only
sane/efficient mechanism to me currently (this is also the reason why
I'm not confident with automatic IDness detection).

I agree on the latter method as well. I do see some potential issues 
with the automatic detection after thinking about it more. From a purely 
DTD view, if the document is parsed without validation or attribute 
handling enabled, the current implementation will still create IDs 
modiying the tree. There is no indication in the document that
this was not desired. So the question comes to mind should IDs be 
created or not?


Ah, good point. I think it should not automatically create IDs in this
case, since we don't know if the attribute's element is valid, and thus
if the element/attribute is valid at that position of the tree. So we
could run into creating IDness for attributes which wouldn't become IDs
if processed by the validator.

<snip />

retrieve by ID in the importing doc.

I also found out that all attributes require an element to be considered 
an ID (even the manually set ones), so modified the code appropriately


For XML Schema it's not only the element, but the whole ancestor axis.
Currently we have no other way of querying if an attribute is of type ID
than validating the whole tree. So this mechanism would work for DTD
only.

Right. I haven't looked at the state of using other schemas to determine 
ID-ness at this point so was just concentrating on DTDs where I know it 
is implemented.


I tried to write down the changes and my observations of this patch:

1) xmlAddNextSibling():
  a) I don't know if denying to add an attribute-sibling if @cur
    is not attached to an element, will break something for someone.

It never even crossed my mind that someone might be building a list of 
attributes without a corresponding element. Have no qualms about 
removing that check.


I really dunno; the attribute handling is this function looks weird
anyway, so I could think of that one being used for weird use-cases.
But I have no preferences here, since I don't use it.

  
  b) You noticed the potential chance here of freeing
    the attribute (@cur) to which a sibling attribute (@elem) should be
    attached if both have the same ns and local-name. We need to fix
    that.

Yup, ran across this issue in all the sibling functions.


OK. What do you think about compiling this semantic and
also the addition/removal of IDs into one function to be called by the
various append-something-to-something functions? Or would it slow down
things too much?

  c) Automatic IDness detection.

Right now I was looking at consistency. It already exists creating and 
setting attributes, so however the discussion on automatic detection


True. Do you know when the already existing behaviour was added? If
it was done recently, then one could still bring forward the argument
that it was a bug to add that behaviour - hmm, or is it too late?

shakes out I think it should work in the same manner for related 
functionality.


Yes.

2) xmlAddPrevSibling():
  a) Similar to xmlAddNextSibling().
  b) Tries to automatically convert the doc of @elem, if different.
    I fear that trying to handle moving-to-different-docs related
    semantics at the level of every tree-modifying function will
    result in much scattered and potentially hard-to-maintain code.

I agree with this and personally handle this myself prior to calling the 
functions, but the functionality for this has always been here so could 
possibly break apps if removed. All I did here was move it around a bit.


Ah, OK. So it even needs to take care of moving the value to the
destination doc's dict if it was in the source doc's dict.

   

3) xmlAddSibling():
  a) Same as 1)

4) xmlAddChild():
  a) Same as 1c)

5) xmlUnlinkNode():
  a) IDness removal on the basis of atype == XML_ATTRIBUTE_ID

There is no need to check xmlIsID as the attr cannot be an ID without an 
element, so the extra check can be eliminated.

6) xmlReplaceNode():
  a) Same as 1c)
  b) Same as 5a)

7) xmlSetProp() and xmlSetNsProp():
  a) Same as 5a)


About 2b):
  I would be happy if we could agree on using the "adopt" semantics
  prior to handing nodes over to tree-modyfing functions. Although
  the xmlDOMWrapAdoptNode() function looks a bit scary at first
  sight, it could constitute a centralized way of get rid of
  doc-moving issues in every tiny function.

Agreed. This is one thing I have been meaning at looking at. The issue I 
hit was allowing a node without a doc to be passed to the setTreeDoc 
functionality with a doc using dictionaries, which resulted in the doc 
having mixed usage. Not sure what the result is if setTreeDoc is called 
where both docs use dicts. I do believe I saw some bigger problems with 
this when I first looked at it (has been a low priority for me since I 
don't allow doc adoption in these functions other than a node without a 
doc and doing so resulted in minimal impact).


If you could send a tiny example of what didn't work with it, I would
fix it :-)

About automatic ID detection:
  IIRC, for DOM, there's no such automatic behaviour defined.

This would probably fall under the category of:
"using mechanisms that are outside the scope of this specification, it 
is then an externally-determined ID attribute."


Hmm, I don't think so, since for DTD, XML Schema and setting IDs via the
API, this is defined (directly above that specific sentence in the
spec). I think this is intended to be a fall-back rule for other
mechanisms.

<snip />

  Additionally automatic IDness detection works for DTDs only, since
  using XML Schema, there's no way to say for a element/attribute
  combination if the attribute is of type xs:ID without validating the
  _whole_ document.

  If talking about efficiency wrt speed, then automatic IDness detection
  is clearly not nice; I hope we don't go this way until there are 
  convincing arguments (specs?) to do so.

  I hope we can still discuss the issue again, before implementing
  IDness detection.

Speed is definitely a concern as doing this will definitely slow things 
down a bit. I'm not looking to rush to get this implemented, but the cat 
is already out of the bag since the functionality already exists when 
creating, setting and copying props. Just glad that the proposed changes 
got things moving along because if the existing behavior is to change I 
would rather see it changed sooner than later.

Rob


OK. The first important thing for me to know is, if we can revert the
already existent automatic detection of IDness via xmlIsID() or not.

Thoughts?

Regards,

Kasimier

Follow-Ups:
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Rob Richards
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik

References:
- [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Daniel Veillard
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Daniel Veillard
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Rob Richards
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Rob Richards
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Kasimier Buchcik
- Re: [xml] Potential wrong usage of xmlIsID() in tree.c
  - From: Rob Richards

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]