Re: [xml] Important: possible incompatible changes ahead for 2.9.0 !



Hi Daniel,

thanks for the heads-up. I don't care all that much about the global dict
size - 10M entries should be hard enough to reach for normal use cases.
Most users only deal with a very small number of XML formats.

But I did run into issues with the buffer changes.

Daniel Veillard, 06.08.2012 09:00:
  The new buffer structures will be ABI compatible with the old ones,
i.e. the old code as compiled will be able to work with the new one, as
the fields with the same values are in the same place in the new
structures. But the structures are now opaque and the few places where
the code was using them directly will need fixing.
  From the usage I have seen, one example is direct access to xmlOutputBuffers:

  buf = xmlAllocOutputBuffer (NULL);
  ....dump stuff to the buffer...
  use data at buf->buffer->content, of size buf->buffer->use

First, okay, that was allowed by the API, but such buffers were supposed
to be used for I/O and encoding conversion. In general, accessing
buf->buffer->content and buf->buffer->use directly was not really the
expected way to do things. The fact that xmlOutputBuffers were not
supposed to be used that way is the reason why there are no accessors for
getting the output data. This is now fixed as of commit

  http://git.gnome.org/browse/libxml2/commit/?id=e258adecd0e19a6cfe6afa232b89aa416368820e

 So where there is such use of direct access, check the LIBXML2_NEW_BUFFER
macro and if present then
   - replace buf->buffer->content with xmlOutputBufferGetContent(buf)
   - replace buf->buffer->use with xmlOutputBufferGetSize(buf)
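
To illustrate the idea of an opaque structure that is reached only through
accessors, here is a minimal self-contained sketch in C. It is a toy, not
libxml2 code: every toy_ name is hypothetical, and the real accessors are the
xmlOutputBufferGetContent() and xmlOutputBufferGetSize() calls named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Public view: callers see only an incomplete type plus accessors.
 * All "toy_" names are hypothetical and are not part of libxml2. */
typedef struct toy_buf toy_buf;

toy_buf *toy_buf_new(const char *text);
const char *toy_buf_content(const toy_buf *buf);
size_t toy_buf_size(const toy_buf *buf);
void toy_buf_free(toy_buf *buf);

/* Private view: in a real library this definition would live in a .c
 * file, so the layout could change without breaking callers' source. */
struct toy_buf {
    char *content;
    size_t use;
};

/* Create a buffer holding a copy of a NUL-terminated string. */
toy_buf *toy_buf_new(const char *text)
{
    toy_buf *buf;

    assert(text != NULL);
    buf = malloc(sizeof *buf);
    if (buf == NULL)
        return NULL;
    buf->use = strlen(text);
    buf->content = malloc(buf->use + 1);
    if (buf->content == NULL) {
        free(buf);
        return NULL;
    }
    memcpy(buf->content, text, buf->use + 1);
    return buf;
}

/* Accessors replace direct reads of buf->content and buf->use. */
const char *toy_buf_content(const toy_buf *buf)
{
    return buf->content;
}

size_t toy_buf_size(const toy_buf *buf)
{
    return buf->use;
}

void toy_buf_free(toy_buf *buf)
{
    if (buf != NULL) {
        free(buf->content);
        free(buf);
    }
}
```

Code written against the accessors keeps compiling when the private struct
changes, which is the property the quoted migration steps rely on.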

I tested it and found that lxml is affected by this. lxml currently takes
the xmlBuffer* from either the "conv" or "buffer" field of the output
buffer and then calls xmlBufferContent() and xmlBufferLength() to get at
the result. I take it that this isn't how it'll work in the future, because
xmlBufferLength() returns an int and buffers are supposed to be larger than
that, right?
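
The concern about an int return value can be shown with a small standalone
sketch in C. The helper name length_to_int is hypothetical and not part of
libxml2. It only demonstrates that a size_t length above INT_MAX cannot be
reported through an int without an explicit check.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: narrow a size_t length to int only when it fits.
 * Returns 0 on success and -1 if the value exceeds INT_MAX, in which
 * case *out is left unchanged. */
static int length_to_int(size_t len, int *out)
{
    assert(out != NULL);
    if (len > (size_t)INT_MAX)
        return -1;      /* an unchecked cast would lose information here */
    *out = (int)len;
    return 0;
}
```

A caller that has to keep an int-based interface could use such a check to
report an error instead of silently returning a wrong length.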

However, xmlOutputBufferGetContent() only reads the "buffer" field, not the
"conv" field. How should I use the "conv" field now? Can't the new
xmlOutputBufferGetContent() do "the right thing" for me?

Code that uses xmlBuffer directly is here:

https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L31

https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L123

Another issue I found: xmlDumpNotationTable() still wants an xmlBuffer
instead of the xmlBuf that outbuffer.buffer returns. Is the right fix here
to include buf.h and call xmlBufBackToBuffer()?

https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L293

(BTW, the reason why the serialisation code is doing so much stuff manually
is, IIRC, that lxml still supports a couple of libxml2 versions that lack the
newer features of the serialisation/xmlSave API. It also avoids the slight
changes to the serialised XML that an abrupt switch to native libxml2
functions could cause.)


  if in some place the xmlBufferPtr was passed independently of the
OutputBuffer, it's possible to use xmlBufGetContent(buffer) and
xmlBufUse(buffer) to achieve the same.

I assume you meant xmlBufContent()?

It seems to me that redefining xmlBufferLength and xmlBufferContent to call
the new xmlBuf functions and using a size_t (or ssize_t?) to store the
result of xmlBufLength would do the trick.

BTW, is there a reason why there's both an xmlBufLength() and an
xmlBufUse() that do the same thing? Since this is a new API that doesn't
suffer from legacy junk yet, wouldn't one be enough? (And wouldn't
xmlBufLength() be the perfect name?)


  I don't plan to make an official release with the changes before
September, so there is a bit of time to get this all cleaned up, and
possibly refine the migration strategy for the few apps affected.

There'll be a new release (3.0) of lxml quite soon, within a few weeks. It
should be doable to get this fixed up by then.

Stefan



