Re: [xml] Potential wrong usage of xmlIsID() in tree.c
- From: Rob Richards <rrichards ctindustries net>
- To: Kasimier Buchcik <K Buchcik 4commerce de>
- Cc: ML-libxml2 <xml gnome org>
- Subject: Re: [xml] Potential wrong usage of xmlIsID() in tree.c
- Date: Thu, 23 Feb 2006 08:08:55 -0500
Kasimier Buchcik wrote:
Hi
On Wed, 2006-02-22 at 19:39 -0500, Rob Richards wrote:
How do you figure (or are you referring to the case of a document not
parsed in validating mode)? A DTD doesn't allow a redefinition of an
element and an element can have only a single ID defined, so the query
for an element/attr combo should be enough. XML Schemas would be a whole
different story though.
The problem occurs regardless of whether validation is performed.
Example:
<!DOCTYPE foo [
<!ELEMENT foo (bar)>
<!ELEMENT bar EMPTY>
<!ATTLIST bar myID ID #IMPLIED>
]>
Assume we parse and validate the following XML:
<foo>
<bar myID="1"/>
</foo>
The API wouldn't prevent us from adding a <bar> to the <bar>:
<foo>
<bar myID="1">
<bar/>
</bar>
</foo>
If we now add @myID to the newly created <bar>, then the current
ID-autodetection would mark it as of type ID:
<foo>
<bar myID="1">
<bar myID="2"/>
</bar>
</foo>
The problem I see here is that the second <bar> is not valid according
to the DTD, and thus its @myID shouldn't become an ID; that's why
querying the DTD for <bar>/@myID is not enough to evaluate if @myID is
of type ID.
With this in mind, I think the DOM people had a good reason not to address
any schema/DTD based automatic ID-detection.
Thanks for the clarification. After your follow-up message I thought you
meant that detection in general was broken.
I just saw your latest message and I was about to propose something
exactly along those lines. This way, if it defaulted to enabled, then
libxml2 could operate as intended. When using the lib to implement DOM,
it could be disabled and ID detection would work as we have more or less
laid out in these messages. Is it possible to extend the document node
to include flags? This way it might also serve any future need to
provide some instructions or indications on the state of the document.
To your question: Ah, good point :-) I think this should either be made
settable or be avoided. The latter being the solution I would prefer.
I haven't scoured the XInclude code, but when a copy is being performed
from there, has the attribute being copied already been determined to be
I tried to test this with the following scenario:
"xinclude-test.xml": the including doc
--------------------------------------
<?xml version="1.0"?>
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="xinc.xml" xpointer="id-01/1"/>
<xi:include href="xinc.xml" xpointer="id-02/1"/>
</foo>
"xinc.xml": the included doc
----------------------------
<?xml version="1.0"?>
<!DOCTYPE items SYSTEM "xinc.dtd">
<items xml:lang="en-us">
<item id="id-01">
<value>first</value>
</item>
<item id="id-02">
<value>second</value>
</item>
</items>
"xinc.dtd": the DTD
-------------------
<!ELEMENT item (value)>
<!ATTLIST item id ID #IMPLIED>
Note that this DTD reflects only the <item>/@id
relationship.
xmllint produces the following result:
xmllint --xinclude xinclude-test.xml
<?xml version="1.0"?>
<!DOCTYPE foo>
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<value>first</value>
<value>second</value>
</foo>
So the IDs have been detected, otherwise the XPointer expression
wouldn't work. But the XIncluded-doc does not seem to have been
validated, since "xinc.xml" is clearly invalid with respect to the
DTD "xinc.dtd".
I looked at Libxml2's XInclude code and found that
in xmlXIncludeParseFile(), XML_PARSE_DTDLOAD and XML_DETECT_IDS
are hard-coded to be set. So the reasons for the observed behaviour
are visible here. XML_PARSE_DTDVALID (which switches on validation) is expected
to be set by the user.
Hmm, is this the correct behaviour? Can we query for IDness without
knowing if the doc is valid?
Not sure. I need to do more in-depth reading on XInclude to even come
close to answering this.
an ID? If so, I would not be averse to having the copied attribute be an
ID as well in all cases. I assume that the ID behavior when copying an
attribute is not going to be removed due to the use from XInclude, but
if the atype could be tested rather than xmlIsID, then the negative
impact would at least be less than it currently is.
I 100% agree.
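To make the difference concrete, here is a minimal sketch in plain C. It
does not use libxml2: ToyAttr, toy_dtd_says_id(), toy_copy_by_lookup()
and toy_copy_by_atype() are hypothetical names invented for illustration,
and the hard-coded item/@id declaration is an assumption. It only
contrasts re-deriving IDness from names at copy time with carrying over
the source attribute's recorded type.

```c
#include <string.h>

/* Toy stand-ins, not libxml2 types. */
typedef enum { TOY_ATTR_CDATA, TOY_ATTR_ID } ToyAttrType;

typedef struct {
    char name[32];
    char value[32];
    ToyAttrType atype;   /* type recorded when the attribute was created */
} ToyAttr;

/* Hypothetical name-based DTD query: pretends the DTD declares
 * item/@id as ID. Stands in for a lookup in the style of xmlIsID(). */
int toy_dtd_says_id(const char *elem, const char *attr) {
    return strcmp(elem, "item") == 0 && strcmp(attr, "id") == 0;
}

/* Copy that re-derives IDness from the element/attribute names.
 * The copy can become an ID even if the source never was one. */
ToyAttr toy_copy_by_lookup(const ToyAttr *src, const char *target_elem) {
    ToyAttr dst = *src;
    dst.atype = toy_dtd_says_id(target_elem, src->name)
                    ? TOY_ATTR_ID : TOY_ATTR_CDATA;
    return dst;
}

/* Copy that trusts the source's recorded type: the copy is an ID
 * only if the source attribute already was one. */
ToyAttr toy_copy_by_atype(const ToyAttr *src) {
    ToyAttr dst = *src;  /* atype is carried over unchanged */
    return dst;
}
```

With a source attribute named "id" whose atype is TOY_ATTR_CDATA, the
lookup-based copy onto an "item" element yields an ID, while the
atype-based copy stays CDATA.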
When creating a new property is it possible that an ID check is only
performed if the attribute is being created in the scope of an element
and NULL is passed as the parent element to xmlIsID? This would at least
only create an ID if it is a proper xml:id.
I agree. The automatic IDness detection I'm trying to get rid of
is the one based on DTDs/schemata. xml:id should be detected.
+ 1) If creating a new attribute, I see the need to evaluate if the
element of the attribute is inside the doc's tree and only then to
create an ID.
+ 2) Hmm, nasty, this would mean: a branch created outside the doc's
tree needs to be looked up for xml:id attributes if such a branch
is attached to the doc's tree. Is this correct? Plus, the other
way round: remove all IDs if a branch is detached from the doc's
tree. But I guess you have that already in mind.
I would say a branch created outside of a doc's tree would not deal with
IDs. If there is no document, there is no ID. (In reality I don't think
branches should be created at all without a document, maybe a node.)
xmlSetTreeDoc or a node adoption function would need to handle setting
the IDness. This would also fix the ID table (remove the IDs) for a doc
whose branch is being adopted. Yes, nasty. Removing a branch would
not necessarily remove IDness. Unless the branch is being removed from
the tree and its document set to NULL, I don't think it is necessary to
remove IDness (unless the doc was normalized).
What would become of attributes that were previously IDs based on a
DTD, if we detach them from the doc's tree and attach them back again?
Would such attrs lose their IDness? I think yes; this sounds saner than
trying to preserve an atype == XML_ATTRIBUTE_ID on detached attributes,
and then add them blindly as IDs at places where they potentially are
no IDs according to a DTD.
Yes, they should lose the IDness. The only case where they probably
should not lose it is if the attribute is replacing another existing
matching attribute within the same function call (e.g. xmlSetProp).
Adding one back in for other cases should probably only create IDness in
the case of xml:id.
So maybe:
1) if an ID-attr is detached, the ID is removed from the doc's list of
IDs and the attr loses any trace of IDness.
2) if an attr is added to the doc's tree, then it can become an ID, if:
a) it is an xml:id
b) it is made an ID explicitly via the API (this means adjusting
attr->atype and calling xmlAddID())
3) attrs which were IDs based on a DTD, can become IDs again
if the doc is re-validated. By the way, this would reflect DOM's way.
This is exactly how I was thinking it should probably work assuming for
1 you mean an attribute is directly detached from its parent element and
not a branch containing the attribute.
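
A minimal sketch of rules 1) to 3) above, in plain C without libxml2.
Everything here is a toy model: Doc, Attr, attr_attach(), attr_detach(),
attr_make_id() and doc_revalidate() are hypothetical names, and the
dtd_declares_id flag is an assumption standing in for a real DTD
validation pass. It is only meant to show how the ID table would change
under the proposed policy.

```c
#include <string.h>

#define MAX_IDS 16

typedef enum { ATTR_CDATA, ATTR_ID } AttrType;

/* Toy attribute, not xmlAttr. */
typedef struct {
    const char *name;     /* qualified name, e.g. "xml:id" or "myID" */
    const char *value;
    AttrType atype;
    int dtd_declares_id;  /* assumption: 1 if a DTD declares it as ID */
    int attached;         /* 1 while the attribute is in the doc's tree */
} Attr;

/* Toy document holding only its list of ID attributes. */
typedef struct {
    const Attr *ids[MAX_IDS];
    int count;
} Doc;

/* Register an attribute as ID; returns -1 if the table is full. */
int doc_add_id(Doc *doc, Attr *a) {
    if (doc->count >= MAX_IDS) return -1;
    a->atype = ATTR_ID;
    doc->ids[doc->count++] = a;
    return 0;
}

/* Rule 1: detaching removes the ID and any trace of IDness. */
void attr_detach(Doc *doc, Attr *a) {
    for (int i = 0; i < doc->count; i++) {
        if (doc->ids[i] == a) {
            doc->ids[i] = doc->ids[--doc->count];
            break;
        }
    }
    a->atype = ATTR_CDATA;
    a->attached = 0;
}

/* Rule 2a: on attach, only xml:id becomes an ID automatically. */
void attr_attach(Doc *doc, Attr *a) {
    a->attached = 1;
    if (a->atype != ATTR_ID && strcmp(a->name, "xml:id") == 0)
        doc_add_id(doc, a);
}

/* Rule 2b: the caller explicitly marks an attached attribute as ID. */
int attr_make_id(Doc *doc, Attr *a) {
    if (!a->attached) return -1;
    if (a->atype == ATTR_ID) return 0;
    return doc_add_id(doc, a);
}

/* Rule 3: re-validation restores DTD-based IDs. A real validator
 * would check the whole document; this toy consults only the flag. */
void doc_revalidate(Doc *doc, Attr **attrs, int n) {
    for (int i = 0; i < n; i++) {
        Attr *a = attrs[i];
        if (a->attached && a->dtd_declares_id && a->atype != ATTR_ID)
            doc_add_id(doc, a);
    }
}
```

In this model, attaching a DTD-declared myID attribute leaves it as
CDATA until doc_revalidate() or attr_make_id() is called, while an
xml:id attribute becomes an ID as soon as it is attached.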
Rob