Re: [xml] libxml2 and default namespaces



On Tue, 2005-12-13 at 02:17 +0100, Paul Boddie wrote:
Hello!

Don't worry, I don't think this is trivia. I'm happy that this issue
has surfaced once again, since it still seems to me to be
underestimated by DOM users. I guess most people using DOM think
that the serialized representation of a DOM tree cannot possibly
break.

It shocks me to see how complicated the standards make this, yet I feel 
somewhat embarrassed that I didn't know that createElementNS didn't guarantee 
the presence of namespace declarations in a serialised document. Seeing the 
thread on comp.lang.python, I suppose I'm not alone in that respect, however.
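
For what it's worth, the same behaviour can be seen with Python's standard
xml.dom.minidom. This is only a minimal sketch, not the code from the
original thread: the element knows its namespace, but toxml() writes only
the attributes that are actually present in the tree, so no xmlns
declaration shows up.

```python
from xml.dom import minidom

doc = minidom.Document()
# Create the element in the "DAV:" namespace, with no prefix.
href = doc.createElementNS("DAV:", "href")
doc.appendChild(href)

# The DOM node remembers its namespace...
print(href.namespaceURI)   # DAV:

# ...but the serializer does not invent a declaration for it.
serialized = href.toxml()
print(serialized)          # <href/>
```

Re-parsing that output would put href in no namespace at all.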

Warning: the following is wrong; an LSSerializer (DOM Level 3 Load and
Save module) will normalize namespaces by _default_.

A plain DOM serializer will just close its eyes and won't try to
change anything that's in the DOM tree. That's fine and desirable
for e.g. editing applications.

If one wants a semantically safe serialization, then one needs a
namespace-normalization mechanism, although on the other hand you
risk breaking QNames in element/attribute content.

The options here would then be:
1) Close your eyes and serialize the tree
  a) if you are certain that you didn't create a mess in the tree, then
     this is OK
  b) be aware that your serialized tree might be broken
2) Normalize namespaces and then serialize
  a) the normalization will try to change prefixes,
     remove/add ns-declarations, in a way that a serialization is
     possible without altering the semantics of the DOM tree
  b) if the DOM is not serializable then the normalization should raise
     an error
  c) be aware that the normalization might break your QNames
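
The QName risk is easy to reproduce with any serializer that works out
declarations from element and attribute names only. The sketch below is an
illustration, not a DOM normalizer: it uses Python's
xml.etree.ElementTree, and the namespace urn:example:types and the names
root/kind are made up for the example. A prefix that is used only inside
an attribute value loses its declaration on the round trip.

```python
import xml.etree.ElementTree as ET

# The prefix "t" is used only inside the attribute value, as a QName.
source = '<root xmlns:t="urn:example:types" kind="t:widget"/>'

root = ET.fromstring(source)
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)

# The value still says "t:widget", but "t" is no longer declared.
declaration_kept = "urn:example:types" in round_tripped
qname_kept = "t:widget" in round_tripped
print(declaration_kept, qname_kept)   # False True
```

The serializer cannot know that the string "t:widget" is a QName, so it
sees no reason to keep (or consistently rename) the prefix.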

If we apply namespace-normalization to your example, then the outcome
would look like:
<href xmlns:ns1='DAV:'/>
i.e. the namespace declaration of 'DAV:' would get a different prefix,
in order to not interfere with the <href> element in no namespace.

But the href element was created with a namespace specified, but with no 
prefix in its qualified name. A subsequent discussion touched upon default 
namespace pollution where href is created as follows...

href = document.createElementNS("DAV:", "href")

...and where a child of href is created as follows...

no_ns = document.createElementNS(None, "no_ns")

...where None is Python's equivalent of JavaScript's null. For this I proposed 
the following serialisation:

<?xml version="1.0"?>
<href xmlns="DAV:">
  <no_ns xmlns=""/>
</href>

Looks OK; I would expect the same result.
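
To make that concrete, here is a rough sketch of how such declarations
could be added before serializing, using Python's xml.dom.minidom. It is
not the DOM Level 3 namespace normalization algorithm: the helper name
declare_element_namespaces is made up, it only looks at element names
(not attribute names or QNames in content) and it never renames a prefix.

```python
from xml.dom import minidom, Node


def declare_element_namespaces(element, in_scope=None):
    """Add the xmlns declarations each element needs (hypothetical helper)."""
    # Map of prefix -> namespace URI; "" stands for the default namespace.
    in_scope = dict(in_scope or {})

    # Record the declarations that are already present on this element.
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name == "xmlns":
            in_scope[""] = attr.value
        elif attr.name.startswith("xmlns:"):
            in_scope[attr.name[len("xmlns:"):]] = attr.value

    prefix = element.prefix or ""
    uri = element.namespaceURI or ""
    if in_scope.get(prefix, "") != uri:
        if prefix and not uri:
            # A prefixed name in no namespace cannot be serialized.
            raise ValueError("unbound prefix: %s" % prefix)
        # Declare (or override) the binding on this element.
        name = "xmlns:" + prefix if prefix else "xmlns"
        element.setAttribute(name, uri)
        in_scope[prefix] = uri

    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            declare_element_namespaces(child, in_scope)


doc = minidom.Document()
href = doc.createElementNS("DAV:", "href")
no_ns = doc.createElementNS(None, "no_ns")
doc.appendChild(href)
href.appendChild(no_ns)

declare_element_namespaces(href)
result = href.toxml()
print(result)   # <href xmlns="DAV:"><no_ns xmlns=""/></href>
```

Apart from whitespace and the XML declaration, this is the serialisation
proposed above, including the xmlns="" that undeclares the default
namespace for the child.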

Just to be sure we are talking about the same thing: your first example
didn't put the <href> element in the "DAV:" namespace. So you provided a
different scenario here, right?
I.e. ns = libxml2mod.xmlNewNs(element, "DAV:", None) only creates
a ns-declaration attribute on the element, but does not
assign any namespace to the element.

On the one hand I use namespace-normalization for small DOM trees,
where the overhead of a normalization doesn't matter; on the other hand,
I just try to be careful and keep the serialized form in the back of my
head when working on a huge DOM tree, where I want to avoid
ns-normalization.

My objectives include using libxml2's serialisation wherever possible - 
traversing the tree in Python is typically a very slow operation, and having 
to fix up the tree is also likely to incur substantial performance costs. 

Maybe we should implement a namespace-normalization function in Libxml2.
Have a look at xmlDOMWrapReconcileNamespaces (in tree.c); it does
something similar, but not exactly since namespaces are handled
differently in Libxml2 than in DOM; i.e. we cannot simply remove a
ns-declaration, since it could be referenced by node->ns fields. I don't
remember whether it does the xmlns="" thingy, so you might want to test
this.

Since you're not the first person to suggest namespace normalisation (and the 
related DOM standards), I had a look at the pxdom module for Python which is 
much more standards-compliant than virtually any other Python DOM 
implementation, and it would appear that pxdom does "automagically" (as 
someone said) emit xmlns declarations at least in its default configuration, 
which I would assume has something to do with the normalisation process or 
some related aspect of DOM Level 3.

I now even see that I was wrong in telling you that a plain DOM
serializer won't try to normalize namespaces. It will normalize by
default, according to:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-namespaces

"namespaces"
true
        [required] (default)
        Perform the namespace processing as defined in Namespace
        Normalization. 
false
        [optional]
        Do not perform the namespace processing.
        
So that's the reason why pxdom does it automagically.

Anyway, I'd like to thank you for the kind words and helpful advice. It's a 
longer journey of enlightenment than I thought. ;-)

Yeah, obviously the same for me.

Regards,

Kasimier


