Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees



Hi,

Daniel Veillard wrote:
On Mon, Sep 08, 2008 at 03:01:29PM +0200, Stefan Behnel wrote:
I now wonder why there are two serialisation methods (xmlNodeDump* and
htmlNodeDump*) that ultimately do the same thing, instead of serialising
to what they are named after.

  Well the goal is more to get people to use xmlSave* than the old
xmlNodeDump and htmlNodeDump ones.

lxml uses those two because they (used to) produce the same output across
libxml2 versions. We do most of the output around the actual tree
serialisation by hand (e.g. doctype and XML decl), as there isn't an API that
generates reproducible output across libxml2 versions (we currently support
libxml2 2.6.20 and later). xmlSave*() is particularly bad in that regard, as
the early versions lack a lot of important options, so getting predictable
output across versions is extremely cumbersome.
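
As a rough illustration of what doing the surrounding output "by hand" means, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in, which is an assumption made for the sake of a runnable example; it is not lxml's actual code, and the helper name serialise_document is hypothetical. The library only serialises the tree, while the XML declaration and doctype are written explicitly so they do not depend on the serialiser's version.

```python
import xml.etree.ElementTree as ET


def serialise_document(root, encoding="UTF-8", doctype=None):
    """Hypothetical helper: write the prologue by hand, delegate only the tree."""
    # The XML declaration is emitted explicitly rather than by the serialiser.
    parts = ['<?xml version="1.0" encoding="%s"?>' % encoding]
    if doctype:
        parts.append(doctype)
    # encoding="unicode" returns a str and suppresses the library's own
    # XML declaration, so only the element tree itself is serialised here.
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts)


root = ET.fromstring("<root><child>text</child></root>")
result = serialise_document(root, doctype="<!DOCTYPE root>")
print(result)
```

The point of the split is that only the last step depends on the library's behaviour, which keeps the version-sensitive surface small.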

One of the problems we face is that we try to be compatible with the
ElementTree library as far as possible, so if you do the same operations on
the same input, the output SHOULD look the same, too.


Options are set at context creation,
we can add more options, and trying to keep the old functions supporting the
same would require way too many entry points.

I agree. Making xmlSave* more usable is perfectly fine with me. However,
breaking functions that do very specific parts of the work is a pretty
negative side-effect.


If the current behaviour is wanted, what's the future way of achieving
this *without* temporarily modifying the document? (i.e. without breaking
thread concurrency)

  Hum, sorry, clearly an oversight, I wanted to make the xmlSave routines
HTML-aware, which in itself sounds like a good idea, no?

Absolutely.


I guess we can use an xmlSave option to force the output to use the
HTML serialiser or the XML one and then make sure xmlNodeDump* and
htmlNodeDump* use them appropriately.

That would fix it, yes. In any case, they should do what their name implies,
without being smarter than necessary.
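
To make the distinction concrete, here is a small sketch of the same tree serialised once as XML and once as HTML. It again assumes Python's standard-library xml.etree.ElementTree purely for illustration, not libxml2's xmlNodeDump*/htmlNodeDump* themselves. The caller picks the output method explicitly; the serialiser does not guess from the tree's content.

```python
import xml.etree.ElementTree as ET

# Build an HTML-looking tree in memory.
root = ET.Element("html")
body = ET.SubElement(root, "body")
ET.SubElement(body, "br")
p = ET.SubElement(body, "p")
p.text = "a < b"

# Same tree, two explicitly requested serialisations.
as_xml = ET.tostring(root, encoding="unicode", method="xml")
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)   # empty element written in XML form, i.e. self-closed
print(as_html)  # empty element written in HTML form, i.e. no closing slash
```

The output format follows from the call that was made, not from what kind of document the tree happens to look like.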


Sorry for the breakage, I forgot the old xmlSave* had been remapped to
the new ones.

That's ok, libxml2 2.7 is young, that happens. lxml's history isn't free of
mistakes either.

Stefan


