[xml] Refactoring of the SAX interface for namespace support.
- From: Daniel Veillard <veillard redhat com>
- To: xml gnome org
- Subject: [xml] Refactoring of the SAX interface for namespace support.
- Date: Tue, 12 Aug 2003 11:05:50 -0400
The story of the libxml2 SAX interface is a bit complex. I initially
didn't offer a SAX API, just a tree builder; then, for compatibility
with expat, I separated the core of the parsing from tree building
using the interface that expat exported (more or less, there are a few
differences). As new support was added for namespaces and validation,
the SAX API didn't change; as a result most of the work has been done
in the tree building code (the SAX.c module, actually). I think it's
time to extend the SAX API provided by libxml2 to expose namespace
properties. This should also make it possible to minimize some of the
string copying, concatenation and then splitting used to go between the
various mappings used internally.
The Java SAX has been extended to handle namespaces. I did look at the
new API provided by expat:
    XML_Parser
    XML_ParserCreateNS(const XML_Char *encoding,
                       XML_Char sep);
"Constructs a new parser that has namespace processing in effect. Namespace
expanded element names and attribute names are returned as a concatenation
of the namespace URI, sep, and the local part of the name. This means that
you should pick a character for sep that can't be part of a legal URI."
I really don't think this is an appropriate API for libxml2: it loses
the prefix values, which are needed to serialize back, and it forces the
client code to handle very long names and split them up. I think the attempt
to keep the interfaces similar from a signature but not a semantic point of
view is not a good design.
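To make the splitting burden concrete, here is a minimal sketch of what client code has to do with a name of the form URI + sep + local part, as described in the quoted expat documentation. The helper split_expanded_name() and the split_name type are hypothetical names used for illustration only, they are not part of expat or libxml2. Note that the prefix cannot be recovered from such a name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical result type: the URI part is NOT zero terminated (it is a
 * slice of the input), the local part points into the input string.
 */
typedef struct {
    const char *uri;      /* NULL when the name carries no namespace */
    size_t      uri_len;  /* length of the URI part in bytes */
    const char *local;    /* zero terminated local name */
} split_name;

/*
 * Split "URI<sep>local" at the first occurrence of sep. A name without
 * the separator is treated as having no namespace.
 */
static split_name split_expanded_name(const char *name, char sep)
{
    split_name out;
    const char *p = strchr(name, sep);

    if (p == NULL) {
        out.uri = NULL;
        out.uri_len = 0;
        out.local = name;
    } else {
        out.uri = name;
        out.uri_len = (size_t)(p - name);
        out.local = p + 1;
    }
    return out;
}
```

Every callback receiving element or attribute names would need a call like this before the client can compare the local name or the namespace.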
Some of the requirements I would like to see for a new SAX API in libxml2
are as follows:
- keep API and ABI, i.e. existing code must continue to work
- try to minimize the number of new functions introduced
- the new API will provide directly:
+ the prefix
+ the namespace
+ the local name
for element start tag and end tag as well as for attributes.
Technically I think some of the following should work:
- extend the _xmlSAXHandler structure. However there is some risk
associated with this, the code will have to do a check against
the version of the library used at run-time
- provide new startElementNs() and endElementNs() callbacks
The signature would be:
    void startElementNs(
        void *ctx,
        const xmlChar *localname,  // local element name
        const xmlChar **atts,      // pairs of (local attribute name/value)
        const xmlChar *prefix,
        const xmlChar *URL,
        const xmlChar **attsNs     // pairs of attributes (prefix/URL)
    )
and
    void endElementNs(
        void *ctx,
        const xmlChar *localname,  // local element name
        const xmlChar *prefix,
        const xmlChar *URL
    )
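On the first point of the list above (extending the handler structure), the run-time check could look roughly like the following sketch. Everything here is an assumption made for illustration: the demoSAXHandler layout, the DEMO_SAX_NS_MAGIC value and the function names are hypothetical, and the sketch assumes that the old layout already contains the marker field, so that reading it is safe for handlers built against older headers.

```c
#include <stddef.h>

typedef unsigned char xmlChar;

/* Hypothetical marker meaning "this handler carries the new fields". */
#define DEMO_SAX_NS_MAGIC 0x5A5A0002u

typedef void (*demoStartElement)(void *ctx, const xmlChar *name,
                                 const xmlChar **atts);
typedef void (*demoStartElementNs)(void *ctx, const xmlChar *localname,
                                   const xmlChar **atts,
                                   const xmlChar *prefix, const xmlChar *URL,
                                   const xmlChar **attsNs);

/* Hypothetical handler: new callbacks are only appended at the end. */
typedef struct {
    demoStartElement   startElement;    /* existing callback */
    unsigned int       initialized;     /* assumed present in the old layout */
    demoStartElementNs startElementNs;  /* new, valid only with the marker */
} demoSAXHandler;

/* A do-nothing namespace-aware callback, handy for trying the check. */
static void demo_noop_start_ns(void *ctx, const xmlChar *localname,
                               const xmlChar **atts, const xmlChar *prefix,
                               const xmlChar *URL, const xmlChar **attsNs)
{
    (void)ctx; (void)localname; (void)atts;
    (void)prefix; (void)URL; (void)attsNs;
}

/* Returns 1 when the parser may safely use the namespace-aware callback. */
static int demo_use_ns_callback(const demoSAXHandler *sax)
{
    return sax != NULL &&
           sax->initialized == DEMO_SAX_NS_MAGIC &&
           sax->startElementNs != NULL;
}
```

The point of the marker is that the parser never reads the appended fields of a handler that does not announce them, which is one way to keep existing binaries working.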
Note that an API similar to Expat NsSAX seems very easy to build
on top of it...
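As a rough illustration of that remark, an Expat-style expanded name can be rebuilt from the URL and localname arguments that startElementNs() would receive. This is only a sketch: demo_expanded_name() is a hypothetical helper, and xmlChar is typedef'ed locally so the fragment stands alone.

```c
#include <stdio.h>

typedef unsigned char xmlChar;

/*
 * Build "URL<sep>localname" in buf, or just "localname" when the element
 * has no namespace. Returns 0 on success, -1 if buf is too small.
 */
static int demo_expanded_name(char *buf, size_t size,
                              const xmlChar *URL, const xmlChar *localname,
                              char sep)
{
    int n;

    if (URL == NULL)
        n = snprintf(buf, size, "%s", (const char *)localname);
    else
        n = snprintf(buf, size, "%s%c%s",
                     (const char *)URL, sep, (const char *)localname);

    /* snprintf reports the length it needed; reject truncated results. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A compatibility layer would call this from its own startElementNs() implementation and pass the result to the Expat-style client callback.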
Alternatively I'm thinking about splitting namespace and attribute
callbacks, so that more information known by the parser can be passed
up to the client code. In that case atts and attsNs in startElementNs
disappear and 2 new callback types are provided and called just after
startElementNs():
    void namespace(
        void *ctx,
        const xmlChar *prefix,
        const xmlChar *URL
    )
    void attributeNs(
        void *ctx,
        const xmlChar *localname,  // local attribute name
        const xmlChar *prefix,
        const xmlChar *URL,
        const xmlChar *value
    )
There is one thing to note: a namespace() callback may actually
provide the namespace binding for the element after startElementNs()
was called, as in <foo:bar xmlns:foo="bar"/>
There is another option, even more disturbing from an API viewpoint:
change names from a simple zero-terminated const xmlChar * to a
const xmlChar * with a length in bytes, like for the character
callbacks.
The goal would be to minimize the number of string copies needed; this could
be very effective for attribute values, which operate on an unbounded
vocabulary. Minimizing the number of strings allocated for tags can
be done very easily by the parser since the values pertain to a fixed
vocabulary; this is part of the enhancements I have long planned to do
in libxml2.
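To show what the pointer-plus-length option would mean for client code, here is a minimal sketch of comparing such a name against a known string without copying it. demo_name_equals() is a hypothetical helper, and the int length simply mirrors the style of the character callbacks.

```c
#include <string.h>

typedef unsigned char xmlChar;

/*
 * Compare a name given as (pointer, length in bytes) with a zero
 * terminated string. The name may point straight into the parser's input
 * buffer, so it must never be read past len bytes.
 */
static int demo_name_equals(const xmlChar *name, int len,
                            const char *expected)
{
    size_t elen = strlen(expected);

    return len >= 0 &&
           (size_t)len == elen &&
           memcmp(name, expected, elen) == 0;
}
```

With this style of API the usual strcmp()-like calls on names no longer work, which is the disturbing part for existing client code.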
At this point this is an open debate, my proposal is on the table for
discussion, so feedback is welcome: reshape it, flame it, I may be nuts but
the collective intelligence is supposed to fix this! I didn't do any
implementation yet, so there is no damage in taking one direction or another.
Express yourself or be ready to suffer in silence if a wrong API is
designed and implemented :-)
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/