
Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees



Hi,

Daniel Veillard wrote:
> On Mon, Sep 08, 2008 at 03:01:29PM +0200, Stefan Behnel wrote:
>> I now wonder why there are two serialisation methods (xmlNodeDump* and
>> htmlNodeDump*) that ultimately do the same thing, instead of serialising
>> to what they are named after.
> 
>   Well the goal is more to get people to use xmlSave* than the old
> xmlNodeDump and htmlNodeDump ones.

lxml uses those two because they (used to) produce the same output across
libxml2 versions. We do most of the output around the actual tree
serialisation by hand (e.g. doctype and XML decl), as there isn't an API that
generates reproducible output across libxml2 versions (we currently support
libxml2 2.6.20 and later). xmlSave*() is particularly bad in that regard, as
the early versions lack a lot of important options, so getting predictable
output across versions is extremely cumbersome.
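
To give a rough idea of what "by hand" means here, a minimal sketch in Python
using the stdlib ElementTree rather than lxml's actual code. The helper name
serialise_document and the sample doctype are made up for illustration; only
the tree body comes from the library serialiser.

```python
import xml.etree.ElementTree as ET

def serialise_document(root, doctype=None, encoding="utf-8"):
    """Hypothetical helper: write XML decl and doctype by hand, then the tree."""
    parts = ['<?xml version="1.0" encoding="%s"?>' % encoding]
    if doctype:
        parts.append(doctype)
    # Only this part is delegated to the library's tree serialiser.
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts).encode(encoding)

# Invented sample tree.
root = ET.Element("root")
ET.SubElement(root, "child").text = "text"
output = serialise_document(root, doctype="<!DOCTYPE root>")
```

Everything around the tree is then under our control, so it cannot change
when the underlying library does.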

One of the problems we face is that we try to be compatible with the
ElementTree library as far as possible, so if you do the same operations on
the same input, the output SHOULD look the same, too.


> Options are set at context creation,
> we can add more options and trying to keep the old functions to support the
> same would require way too many entry points.

I agree. Making xmlSave* more usable is perfectly fine with me. However,
breaking functions that do very specific parts of the work is a pretty
negative side-effect.


>> If the current behaviour is wanted, what's the future way of achieving
>> this *without* temporarily modifying the document? (i.e. without breaking
>> thread concurrency)
> 
>   Hum, sorry, clearly an oversight, I wanted to make xmlsave routines
> HTML aware, which in itself sounds like a good idea, no?

Absolutely.


> I guess we can use an xmlSave option to force the output to use the
> HTML parser or the XML one and then make sure xmlNodeDump* and
> htmlNodeDump* use them appropriately.

That would fix it, yes. In any case, they should do what their name implies,
without being smarter than necessary.
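
As an illustration of "do what their name implies", here is the same tree
serialised both ways. This is only a sketch with the stdlib ElementTree and an
invented tree, not libxml2's C API, but the expectation for xmlNodeDump* and
htmlNodeDump* is analogous: the caller picks the output format, not the
document type.

```python
import xml.etree.ElementTree as ET

# Invented HTML-like tree, built by hand for illustration.
root = ET.Element("html")
body = ET.SubElement(root, "body")
p = ET.SubElement(body, "p")
p.text = "line one"
br = ET.SubElement(p, "br")
br.tail = "line two"

# XML serialisation: the empty element is self-closed.
as_xml = ET.tostring(root, method="xml", encoding="unicode")

# HTML serialisation: void elements such as <br> get no end tag.
as_html = ET.tostring(root, method="html", encoding="unicode")
```

Here as_xml contains a self-closed br element, while as_html contains a bare
<br> start tag, regardless of what kind of document the tree came from.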


> Sorry for the breakage, I forgot the old xmlSave* had been remapped to
> the new ones.

That's ok, libxml2 2.7 is young, that happens. lxml's history isn't free of
mistakes either.

Stefan

