Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees



On 25/09/2008, Daniel Veillard <veillard redhat com> wrote:

  Stephan, Martin,

 could you check the enclosed patch? I'm committing it to SVN head too
 but it's probably easier to review that way.

I built trunk and had a play around, this handles my use case, thanks!

A couple of concerns. First, it's not *totally* clear which options
should be used together. That should at least be documented, and maybe
clearly defined in the code as well.

The patch has a bunch of additions like this:
         xmlSaveCtxtInit(&ctxt);
    +    ctxt.options |= XML_SAVE_AS_XML;
Could that common case move into xmlSaveCtxtInit, with xmlNewSaveCtxt
overwriting it afterwards for the exception of wanting to save as HTML?
Or could xmlNewSaveCtxt get some logic to make the set of options on the
ctxt sane?
Or could the format parameter in those functions be safely upgraded to
full options? The signature is the same and XML_SAVE_FORMAT is 1
anyway, but I guess if people have been passing some other non-zero
value for formatting, that would break compatibility.
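
To make the last two ideas concrete, here is a rough standalone sketch,
not a patch against libxml2. Every sketch_/SKETCH_ name is hypothetical,
and apart from the format flag being 1, the flag values are assumptions
made for illustration. It treats XML as the default output method, lets
an explicit HTML request override it, and collapses a legacy format
argument to a single bit.

```c
/* Hypothetical stand-ins for the save flags discussed above.
 * Only SKETCH_SAVE_FORMAT == 1 comes from this message; the other
 * two values are assumptions chosen for illustration. */
enum {
    SKETCH_SAVE_FORMAT  = 1 << 0,
    SKETCH_SAVE_AS_XML  = 1 << 5,
    SKETCH_SAVE_AS_HTML = 1 << 6
};

/* Make the two output-method flags mutually exclusive and never both
 * absent: XML is the default, an explicit HTML request wins. */
static int sketch_normalize_options(int options)
{
    if (options & SKETCH_SAVE_AS_HTML)
        options &= ~SKETCH_SAVE_AS_XML;
    else
        options |= SKETCH_SAVE_AS_XML;
    return options;
}

/* Legacy entry points take an int "format" that callers may have used
 * as a loose boolean. Collapsing it to one bit means a value such as 2
 * or -1 still only requests formatting. */
static int sketch_options_from_format(int format)
{
    return sketch_normalize_options(format ? SKETCH_SAVE_FORMAT : 0);
}
```

With this shape a caller passing format = 2 still just gets formatted
output, rather than having the 2 read as some other option bit.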

Finally, having XML_SAVE_AS_HTML makes it seem like you could save any
xml-flavoured-html document in non-xml-flavoured form, but that's not
quite the case. One thing I found was that the overloading of
XML_CDATA_SECTION_NODE to also be HTML_PRESERVE_NODE means the
contents of CDATA sections get output raw, without any escaping; see
HTMLtree.c lines 838-843 in htmlNodeDumpFormatOutput.
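
A toy model of that behaviour, as a standalone sketch rather than the
real serializer: all sketch_/SKETCH_ names are hypothetical, and the
escaping is deliberately simplified. The point is only that once the
"preserve" type is an alias for the CDATA type, an HTML-style dump
cannot tell the two apart and writes the content as-is.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node types; the alias mirrors the overloading
 * described above. */
enum sketch_node_type {
    SKETCH_TEXT_NODE = 3,
    SKETCH_CDATA_SECTION_NODE = 4
};
#define SKETCH_HTML_PRESERVE_NODE SKETCH_CDATA_SECTION_NODE

/* Copy s into out, escaping &, < and >; stops early rather than
 * overflowing the buffer. */
static void sketch_escape(char *out, size_t cap, const char *s)
{
    size_t used = 0;
    size_t n;

    if (cap == 0)
        return;
    out[0] = '\0';
    for (; *s != '\0'; s++) {
        char one[2];
        const char *rep;

        one[0] = *s;
        one[1] = '\0';
        switch (*s) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = one;     break;
        }
        n = strlen(rep);
        if (used + n >= cap)
            break;
        memcpy(out + used, rep, n + 1);
        used += n;
    }
}

/* HTML-style dump: "preserve" nodes are written raw, and because of
 * the alias every CDATA section node takes that branch too. */
static void sketch_dump_html(char *out, size_t cap, int type,
                             const char *content)
{
    if (cap == 0)
        return;
    if (type == SKETCH_HTML_PRESERVE_NODE)
        snprintf(out, cap, "%s", content);
    else
        sketch_escape(out, cap, content);
}
```

So a CDATA section holding "a < b" comes out with a bare "<", while the
same string in a text node would be escaped.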

 Basically it adds 3 save options, and for the old entry points
 (xmlDump*, not xmlSave based) it forces XML_SAVE_AS_XML, bypassing
 the doc type in case of HTML documents. That should fix Stephan's problem
 and also provide ways to do things with xmlSave when available.
 For the 'problem' of the added meta, an XML_SAVE_IMMUTABLE option could
 be added that sounds more generic, but I'm not adding this in the patch
 to not complicate things.

A don't-fiddle-with-the-tree option sounds like a possibility, though
I do find the duplication of xml:lang to lang useful.

 I hope I didn't miss any old entry point whose behaviour was modified in
 2.7.1, and that I'm not missing places where the new flags should be checked too,

This I haven't thoroughly tested, however.

Thanks for working on this,

Martin


