Re: [xml] Re: XML libs (was Re: gconf backend)

On Sun, Sep 28, 2003 at 05:13:37PM -0400, Havoc Pennington wrote:
> Yes expat/libxml2 will handle more corner cases properly than gmarkup
> _implementation_, and thus the subset will be strict rather than a
> little fuzzy around the edges. That's good and I would prefer that.
> But there's no advantage at all to the larger _API_ in expat/libxml2
> *from the standpoint of an app using XML in this way*, vs. the gmarkup
> _API_.

  Yes there is a difference: the fact that libxml2 and expat actually
fully understand the XML spec means that they will act appropriately
when parsing DTDs or when processing the HTML and generating the
data to the user space. There is a bunch of things done under the hood
that those parsers do because they fully implement XML and that gmarkup
doesn't do, leading to divergence. Again, I'm pretty sure I can
build well-formed documents for which the information set provided
will contain only elements and attributes, but for which gmarkup will
provide *different* information from what a correct parser would provide.
You can try to diminish that as a "corner case", but it's non-compliant,
misleading behaviour.
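
  To make the kind of divergence described above concrete, here is a
minimal sketch, assuming Python's standard xml.parsers.expat binding
as the "full" parser. The document, the element names and the helper
collect_attributes() are all hypothetical. The document body holds only
elements and attributes, yet the internal DTD subset supplies a default
attribute and an entity, and the attribute value containing a line break
is subject to attribute-value normalization.

```python
import xml.parsers.expat

# Hypothetical document: the body holds only elements and attributes,
# but the internal DTD subset changes what a conforming parser reports.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE config [
  <!ATTLIST entry type CDATA "string">
  <!ENTITY home "/home/user">
]>
<config><entry name="a
b" path="&home;/x"/></config>"""


def collect_attributes(data):
    """Return {element name: attributes} as reported by expat."""
    seen = {}

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return seen


if __name__ == "__main__":
    print(collect_attributes(DOC)["entry"])
```

  With expat, the entry element should come back with a defaulted "type"
attribute, the entity expanded inside "path", and the line break in "name"
turned into a space. A parser that skips the DTD and does no normalization
would hand the application different data for the same bytes.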

> I think we probably agree on this point and already said it a few times,
> but I just want to sum up. 

  No we do not agree. Even on that subset the behaviour can be different.
We take a sequence of bytes as input and get events or an error as output.
I tell you I will find differences. Of course you could try to expand the
scope of gmarkup to have fewer broken cases, but ultimately the only
way to get a guarantee and test it is to have a full parser and
run it through the standard regression tests.
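
  The "bytes in, events or error out" view can be sketched as a tiny
harness. This is only an illustration, assuming Python's standard
xml.parsers.expat binding; events_or_error(), run_cases() and the four
cases are hypothetical stand-ins for a real regression suite, which would
be far larger.

```python
import xml.parsers.expat


def events_or_error(data):
    """Feed bytes to expat; return ("ok", events) or ("error", message)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, dict(attrs))))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return ("error", str(exc))
    return ("ok", events)


# Hypothetical mini test suite: each input is paired with whether a
# conforming parser must accept it.
CASES = [
    (b'<a b="1"/>', True),
    (b'<a b="1" b="2"/>', False),  # duplicate attribute
    (b'<a b="x<y"/>', False),      # raw '<' inside an attribute value
    (b'<a><b></a></b>', False),    # improperly nested tags
]


def run_cases(parse):
    """Return the inputs on which `parse` disagrees with the expectation."""
    return [data for data, must_accept in CASES
            if (parse(data)[0] == "ok") != must_accept]


if __name__ == "__main__":
    print(run_cases(events_or_error))
```

  Any other parser wrapped in a function with the same shape can be passed
to run_cases(), so the same inputs can be used to compare implementations.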

> Where we maybe don't agree is that I think this makes it perfectly
> reasonable to have a gmarkup-like API in a library with libxml2 or expat
> as the backend. In other words, an XML-subset API is a legitimate thing
> just as much as an XML-subset application is.

  If you have a compliant library you expose the full set of information
as much as possible. Now if the application wants to filter it, fine,
it's the application's business. There is no use for a library to become
non-compliant. At the end of the day, my fridge is an XML application too:
it takes any XML input, it just doesn't do anything with it. Where is
the problem?

> A library API can be designed to support a particular application of XML
> and conceal some of the complexity of the XML specs.
> I don't see why the conceal-XML-complexity/specialize-the-API step
> always has to be in the app. I agree it may not be right in libxml2, but
> I don't think it has to be in the app either.

  Make a glue layer on top of the parser, which is exactly what SAX
invites you to do with callbacks and could be trivially done on top of
the reader. But as I said before, your subset is not another person's subset;
usually nowadays people do put namespaces in their data to ease automatic
processing, vocabulary versioning, and merging of syntaxes. Heck, your subset
would not be able to handle RDF even in its simplest form.
  I see no point myself in putting such a limitation in a library whose
goal is to be reused. Sure, you know only a subset of XML so you
use that subset; maybe in 6 months you will have read a good book on XML
(or one of your colleagues will, or the poor guy maintaining your code in
5 years) and bingo, your API is too limited to process the data, you're
stuck and you have to version the API...
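
  A glue layer of the kind suggested above could look roughly like the
following sketch. It assumes Python's standard xml.sax module (backed by
expat) rather than libxml2's own SAX or reader interfaces; SubsetGlue and
parse_subset() are hypothetical names. The full parser does the compliant
work, the glue forwards only elements and attributes, and namespaces stay
visible because names are delivered as (namespace URI, local name) pairs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class SubsetGlue(ContentHandler):
    """Hypothetical glue layer: forwards only elements and attributes."""

    def __init__(self, on_start, on_end):
        super().__init__()
        self.on_start = on_start
        self.on_end = on_end

    def startElementNS(self, name, qname, attrs):
        # name and the attribute keys are (namespace URI or None, local name).
        self.on_start(name, dict(attrs.items()))

    def endElementNS(self, name, qname):
        self.on_end(name)


def parse_subset(data, on_start, on_end):
    """Parse bytes with a full parser, exposing only the simplified view."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(SubsetGlue(on_start, on_end))
    parser.parse(io.BytesIO(data))


# Hypothetical input using a namespace prefix and a comment.
DOC = (b'<c:config xmlns:c="urn:example:config">'
       b'<!-- note --><c:entry name="a"/></c:config>')

if __name__ == "__main__":
    parse_subset(DOC,
                 lambda name, attrs: print("start", name, attrs),
                 lambda name: print("end", name))
```

  The filtering lives in a few lines above the parser, so widening the
subset later means changing the glue and not replacing the parser.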

  No, really, this doesn't make much sense to me,


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
