Re: [xml] EXI



On Sun, Dec 23, 2007 at 06:08:44PM +0100, Bjorn Reese wrote:
Now that the first working draft of EXI [1] has been released, I was
wondering how people (especially Daniel) feel about adding it to libxml.
Read the primer for a quick overview of EXI.

  Hi Bjorn,

I knew the question would land on the list at some point, and that I
would have a hard time answering :-)

By and large I think that EXI is a good format, where they have managed
to address many different concerns with a simple design. The major

Well my main issue is to make clear that EXI IS NOT XML. As a member
of the W3C XML WG I have gone on record about the worries we have with EXI.
They should call it EBI, for something like efficient binary interface,
but it's not XML, it's not markup, it's not text.
I also have a huge issue with the 'pluggable codec' part:
  http://lists.w3.org/Archives/Public/public-xml-core-wg/2007Aug/0028.html
which sounds a lot like polluting a perfectly open-to-all standard with
the same kind of problems I'm seeing in my daily use of the Web (for example
none of the Flash-infected web sites work correctly in my browser, as the
vendor of the proprietary solution did not deign to provide support for
my 64-bit platform). To be perfectly frank, I really don't want the next
generation of web platform implementors to jump on the easy excuse of
potential restrictions in edge cases to control and put a toll on my use
of the web; this sounds like all too common thinking nowadays
(see also the crap about DRM).
I really hope the W3C membership, or ultimately Tim Berners-Lee, will block
something like pluggable codecs; this simply doesn't have its place in
something like a W3C specification (cf. the motto about the full potential
of the Web).

Now that I have expressed my concerns about the content of the spec we can
look separately at any libxml2 implementation. I have a few more concerns
there:
 - those are first working draft specifications. I know how long it takes
   to finish such a spec when there is no controversy about it; for something
   like EXI it may take a couple of years before you get a finished version
   (if any), and being an early implementor usually just brings you more pain,
   e.g. XPointer, where I implemented the full early spec and only a tiny,
   near useless fraction ended up as a REC.
 - who would use it? I mean EXI targets very specialized domains
   like embedded or specific processing. Would those people actually use
   a libxml2 version, when the point of libxml2 is more genericity of usage
   and the size and portability designs of the library probably don't match
   the specific requirements of those use cases?
 An implementation just for the sake of being able to claim existence of
a widely distributed early implementor doesn't sound to me like a good reason
to put EXI in libxml2.

So now that I'm done expressing my doubts about it, let's see the technical
points which an implementation in libxml2 would raise :-)

issues that I came across are:

First, the EXI implementation should be an independent parser (and
generator) front-end. It should emit SAX events, so that we can
seamlessly use SAX, DOM, and xmlReader for EXI documents. Hopefully
this will also allow us to use all the other XML technologies (XPath,
XML Schema, XSLT, etc.) that libxml supports. I do not know the details
of libxml well enough to evaluate if this is indeed the case.

  Yes, as for the HTML parser, the right thing to do is to plug in at the
SAX level to allow a flow of events, possibly connecting to the tree and
reader APIs. Note also that a read-only interface is really not sufficient:
you want to be able to save. If you can't round-trip, it's really of
limited use, or an indication of a serious problem.
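
To illustrate what plugging in at the SAX level could look like, here is a
minimal C sketch. It uses neither libxml2's real SAX interface nor any actual
EXI encoding: toy_sax_handler, toy_decode, toy_xml_writer and the opcode byte
stream are all hypothetical names and formats invented for the example. The
only point is the shape: a binary front end that produces events, and a
consumer that can turn those events back into text so a round trip is possible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical SAX-like callback table (not libxml2's xmlSAXHandler). */
typedef struct {
    void (*start_element)(void *ctx, const char *name);
    void (*end_element)(void *ctx, const char *name);
    void (*characters)(void *ctx, const char *text);
} toy_sax_handler;

/* Made-up binary event stream standing in for EXI:
 * one opcode byte followed by a NUL-terminated string; EV_EOF ends it. */
enum { EV_EOF = 0, EV_START = 1, EV_END = 2, EV_TEXT = 3 };

/* Front end: decode events and forward them to the handler.
 * Returns 0 on success, -1 on malformed or truncated input. */
int toy_decode(const unsigned char *buf, size_t len,
               const toy_sax_handler *h, void *ctx)
{
    size_t i = 0;

    assert(buf != NULL && h != NULL);
    while (i < len) {
        unsigned char op = buf[i++];
        const char *s;
        const char *nul;

        if (op == EV_EOF)
            return 0;
        s = (const char *)buf + i;
        nul = memchr(s, '\0', len - i);
        if (nul == NULL)
            return -1;                  /* string runs past the buffer */
        i += (size_t)(nul - s) + 1;
        switch (op) {
        case EV_START: if (h->start_element) h->start_element(ctx, s); break;
        case EV_END:   if (h->end_element)   h->end_element(ctx, s);   break;
        case EV_TEXT:  if (h->characters)    h->characters(ctx, s);    break;
        default:       return -1;       /* unknown opcode */
        }
    }
    return -1;                          /* no EV_EOF marker seen */
}

/* Back end: a consumer that serializes the events as XML text, which is
 * what makes a round trip possible. No escaping is done in this sketch. */
typedef struct {
    char out[256];
    size_t used;
} toy_text_sink;

static void sink_append(toy_text_sink *s, const char *a,
                        const char *b, const char *c)
{
    size_t room = sizeof s->out - s->used;
    int n = snprintf(s->out + s->used, room, "%s%s%s", a, b, c);

    if (n >= 0 && (size_t)n < room)
        s->used += (size_t)n;
    else
        s->out[s->used] = '\0';         /* does not fit: drop this piece */
}

static void on_start(void *ctx, const char *name) { sink_append(ctx, "<", name, ">"); }
static void on_end(void *ctx, const char *name)   { sink_append(ctx, "</", name, ">"); }
static void on_text(void *ctx, const char *text)  { sink_append(ctx, "", text, ""); }

const toy_sax_handler toy_xml_writer = { on_start, on_end, on_text };
```

A caller would zero-initialize a toy_text_sink, pass it as ctx together with
&toy_xml_writer to toy_decode, and read the serialized text from its out
field. A tree builder or reader could be attached the same way by supplying a
different callback table, which is the reason for plugging in at this level.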

Second, EXI has some built-in datatypes that are like the XML Schema
datatypes. Obviously, some code should be reused here.

  Assuming the types are really compatible, yes. I just find it crazy
to mix layers like this, but again it's a spec concern, less of an
implementation one ... except for the fact that if you compile Schemas
support in, the library size grows a lot.

Third, EXI supports a schema-informed grammar, which means that it can
use information found in XML Schemas, RELAX NG schemas, or DTDs to
create a more compact EXI document. Although the schema-informed grammar
is independent of the various schemas (XML Schema, RELAX NG, DTD), it
eventually has to be populated by those schemas, so it will create some
kind of dependency on these parts of libxml.

  Yes. Also note that the validation parts of libxml2, and especially
the regexp/automata support, are really built for validation and far less
for introspection; this may present a challenge (but I'm not sure).

Fourth, EXI allows (but does not mandate) the support of user-defined
CODECS for encoding and decoding contents. As this is optional, I have
not looked further into that, but obviously it should be considered if
and how this should be supported by libxml.

  That I have a big grievance against, as previously explained. It's probably
too early to look at this from a technical viewpoint, as I think it will
take time to settle down from a standards/political one ;-)

  I hope I don't sound too negative, but I have a hard time being convinced
by EXI myself. On the other hand libxml2 development should be driven by
user demand, and to some extent my participation in the XML Core group itself
is as a representative of the libxml2 community. So if others could weigh
in it would be a good idea. Also we have IMHO plenty of time, it's not like
EXI is about to become a REC, this is just a first draft with all the
associated uncertainties about its content or schedule.

  Thanks Bjorn for raising the issue, even if this may not be a very
simple one :-)

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


