Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees



On Sat, Sep 27, 2008 at 12:58:50AM +0100, Martin (gzlist) wrote:
On 25/09/2008, Daniel Veillard <veillard redhat com> wrote:

  Stephan, Martin,

 could you check the enclosed patch? I'm committing it to SVN head too
 but it's probably easier to review that way.

I built trunk and had a play around, this handles my use case, thanks!

A couple of concerns, first it's not *totally* clear which options
should be used together. Should at least be documented, and maybe
clearly defined in the code as well.

The patch has a bunch of additions like this:
         xmlSaveCtxtInit(&ctxt);
    +    ctxt.options |= XML_SAVE_AS_XML;
Could that common case move into xmlSaveCtxtInit, with xmlNewSaveCtxt
overriding it afterwards for the exception of wanting to save as HTML?

  No, the point is that by default you have no options, and that
XML_SAVE_AS_XML is forced only in the old xmlDump* functions to
restore their behaviour. xmlSaveCtxtInit() should be kept neutral.
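
A minimal sketch of the design described above, assuming a toy context type. Every toy_* name and TOY_* value is hypothetical and only stands in for the real libxml2 internals, which are not reproduced here. The initialiser sets nothing, the new-style constructor takes the caller's options unchanged, and only the old-style dump path forces XML output.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for illustration only; NOT libxml2's real definitions. */
enum toy_save_option {
    TOY_SAVE_FORMAT  = 1 << 0,
    TOY_SAVE_AS_XML  = 1 << 1,
    TOY_SAVE_AS_HTML = 1 << 2
};

struct toy_save_ctxt {
    int options;
};

/* Neutral initialiser: clears the context and sets no options. */
void toy_save_ctxt_init(struct toy_save_ctxt *ctxt)
{
    assert(ctxt != NULL);
    memset(ctxt, 0, sizeof(*ctxt));
}

/* New-style entry point: the caller's options are used unchanged. */
struct toy_save_ctxt toy_new_save_ctxt(int options)
{
    struct toy_save_ctxt ctxt;
    toy_save_ctxt_init(&ctxt);
    ctxt.options = options;
    return ctxt;
}

/* Old-style dump entry point: always forces XML output. */
struct toy_save_ctxt toy_old_dump_ctxt(int format)
{
    struct toy_save_ctxt ctxt;
    toy_save_ctxt_init(&ctxt);
    ctxt.options |= TOY_SAVE_AS_XML;
    if (format)
        ctxt.options |= TOY_SAVE_FORMAT;
    return ctxt;
}
```

With this split, a caller of the new-style constructor who passes no options gets none, while every caller of the old-style path gets XML output regardless of the document type.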

Or put logic in xmlNewSaveCtxt to make the set of options on the ctxt sane?
Or could the format parameter in those functions be safely upgraded to
full options - the signature is the same and XML_SAVE_FORMAT is 1
anyway, but I guess if people have been passing some other non-zero
value for formatting that'd break compatibility.

  I really think the current code is proper: it expresses that those old
entry points force output as XML.
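
The compatibility worry about reusing the format parameter as an options bitmask can be sketched as follows. This is an illustration under assumptions: the only fact taken from the thread is that the formatting flag has value 1; TOY_OPT_OTHER and the toy_* functions are hypothetical and do not correspond to libxml2 symbols.

```c
#include <assert.h>

/* Toy option bits; only "format is 1" comes from the thread. */
enum toy_opt {
    TOY_OPT_FORMAT = 1 << 0,
    TOY_OPT_OTHER  = 1 << 1   /* hypothetical unrelated option */
};

/* Boolean reading: any non-zero value requests formatting. */
int toy_opts_from_bool(int format)
{
    return format ? TOY_OPT_FORMAT : 0;
}

/* Hypothetical bitmask reading of the same int. */
int toy_opts_from_mask(int format)
{
    return format;
}

/* Self-check: the readings agree for 0 and 1, and diverge for 2. */
void toy_check_compat(void)
{
    assert(toy_opts_from_bool(0) == toy_opts_from_mask(0));
    assert(toy_opts_from_bool(1) == toy_opts_from_mask(1));
    assert(toy_opts_from_bool(2) == TOY_OPT_FORMAT);
    assert(toy_opts_from_mask(2) == TOY_OPT_OTHER);  /* formatting lost */
}
```

A caller who has always passed 2 to mean "format, please" would silently lose formatting and gain an unrelated option under the bitmask reading, which is the breakage described above.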

Finally, having XML_SAVE_AS_HTML makes it seem like you could save any
xml-flavoured-html document in non-xml-flavoured form, but that's not
quite the case. One thing I found was that the overloading of
XML_CDATA_SECTION_NODE to also be HTML_PRESERVE_NODE means the
contents get output raw, see HTMLtree.c lines 838-843 in
htmlNodeDumpFormatOutput.

  Okay, maybe that needs fixing, yes.

 Basically it adds 3 save options, and for the old entry points
 (xmlDump*, not xmlSave based) it forces XML_SAVE_AS_XML, bypassing
 the doc type in case of HTML documents. That should fix Stephan's problem
 and also provide ways to do things with xmlSave when available.
 For the 'problem' of the added meta, an XML_SAVE_IMMUTABLE option could
 be added, which sounds more generic, but I'm not adding this in the patch
 to avoid complicating things.

A don't-fiddle-with-the-tree option sounds like a possibility, though
I do find the duplication of xml:lang to lang useful.

  I see, but the threading aspect of XML_SAVE_IMMUTABLE is an important
point IMHO.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


