Re: [xml] Release of libxml2-2.7.1



Daniel Veillard wrote:
On Thu, Sep 04, 2008 at 06:16:15PM +0100, Colin Guthrie wrote:
Rob Richards wrote:
Colin Guthrie wrote:
Hi Daniel,

Daniel Veillard wrote:
  Python serialization code was broken in 2.7.0 so here is a new release
with a cleanup of that code, even more isolation of the new buffer type
from user code and a couple of fixes:

* Portability fix:
 - Borland C fix (Moritz Both)
* Bug fixes:
 - python serialization wrappers
 - XPath QName corner case handling and leaks (Martin)
* Improvement:
 - extend the xmlSave to handle HTML documents and trees
* Cleanup:
 - python serialization wrappers

  I hope that one is a good one !
Not sure if this is the right avenue to report this bug but I'm having some fairly serious regressions.

I've not tested 2.7.0, but 2.7.1 is definitely affected.

I noticed the problem in PHP's parsing of XML, and have submitted a bug report and a test case to the following bug:

https://qa.mandriva.com/show_bug.cgi?id=43486

Essentially, when using PHP's older parsing functions (which I thought were built on expat rather than libxml2, but it seems not), escaped entities in cdata are completely ignored, i.e. &gt; &lt; etc.
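
As a rough illustration of the behaviour expected here, the sketch below uses Python's standard expat binding to show escaped entities arriving decoded in the character data. It is not the test case attached to the bug, and it exercises neither PHP nor libxml2; the helper name collect_text and the sample document are made up for the example.

```python
import xml.parsers.expat


def collect_text(xml_bytes):
    """Return all character data that expat reports for a document."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent character-data chunks
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return "".join(chunks)


# Hypothetical sample document, not the one attached to the bug report.
sample = b"<msg>1 &lt; 2 &amp;&amp; 3 &gt; 2</msg>"
print(collect_text(sample))  # prints: 1 < 2 && 3 > 2
```

With the regression described above, the equivalent PHP handler would see the surrounding text but not the decoded characters.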

See the attachment on the above bug for a test case which requires PHP to be installed.

Hopefully someone can shed some light on the situation.

Can you report this as a PHP bug? It looks like some really old hack code in the PHP extension, written to mimic some specific expat functionality. The behavior change you see does result from a code change in libxml2, but it is really due to the hackish code in the extension doing things it wasn't meant to be doing. You're better off using the xmlreader extension in PHP in any case, as it's simpler, faster, more powerful and doesn't have any legacy issues like the old xml extension.
Thanks for the info Rob.

I'll report this to the PHP people.

I'm well aware there are better PHP extensions for XML processing, but sadly I'm maintaining some old code that I don't really want to rip apart unless I have to!

  The only thing I can think of is that libxml2 no longer asks
through a SAX callback when looking up entity references if they
are in the predefined set. This stems in essence from an old decision
of the XML working group stating that user definitions of those 5
entities could not override the default predefined ones. So I guess
that change is logical. Now what is done on top of SAX to result
in that bug, I don't really know :-\


The short story is that, to mimic the old behavior from when the extension used expat, entities are not replaced and no warnings are desired when an entity is external and not defined. A hack was used in the extension where wellFormed is set to 0 when the context is created. Then, when the getEntity callback is called, the extension handles the character output itself: only the entity reference is output, not its content. Once done, because the document is (supposedly) not well formed, nothing else is done with the entity.

Now that the pre-defined entities are not passed to the callback, they are no longer handled. Not modifying the wellFormed flag results in the pre-defined entities working properly, but causes the entity to be parsed, which in turn kicks off all callbacks, causing the content of the entity to be pushed through all of the extension's callbacks. I've currently been looking both at working around the change while keeping the hack in place and at completely re-writing the entity handling, but I'm not sure if either of those solutions will work. So basically the extension was using voodoo code to get the entities to work as it wanted them to, and it has finally caught up with it.
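
The interaction described above can be sketched as a toy model. This is Python written only from the description in this message, not libxml2 or PHP code. ToyParser, run, the ask_for_predefined switch and the entity "foo" with content "bar" are all invented for illustration, and the assumption that the old callback emitted the character itself for a pre-defined entity is a guess at what "handling the character output itself" meant.

```python
# The five entities predefined by XML.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Hypothetical user-defined entity, standing in for one declared in a DTD.
ENTITY_CONTENT = {"foo": "bar"}


class ToyParser:
    """Toy stand-in for the parser side; not libxml2 code."""

    def __init__(self, get_entity, characters, ask_for_predefined, well_formed):
        self.get_entity = get_entity          # models the getEntity callback
        self.characters = characters          # models the characters callback
        self.ask_for_predefined = ask_for_predefined
        self.well_formed = well_formed        # models the wellFormed flag

    def reference(self, name):
        predefined = name in PREDEFINED
        if not predefined or self.ask_for_predefined:
            self.get_entity(name)
        if not self.well_formed:
            return  # flag cleared: nothing else is done with the entity
        if predefined:
            self.characters(PREDEFINED[name])
        else:
            # Entity is parsed and its content goes through the callbacks.
            self.characters(ENTITY_CONTENT[name])


def run(ask_for_predefined, well_formed):
    """Feed the references &lt; and &foo; through the toy parser."""
    out = []

    def get_entity(name):
        # Assumption: the extension writes the output itself from here.
        if name in PREDEFINED:
            out.append(PREDEFINED[name])
        else:
            out.append("&" + name + ";")

    parser = ToyParser(get_entity, out.append, ask_for_predefined, well_formed)
    for name in ("lt", "foo"):
        parser.reference(name)
    return "".join(out)


print(run(ask_for_predefined=True, well_formed=False))   # prints: <&foo;
print(run(ask_for_predefined=False, well_formed=False))  # prints: &foo;
print(run(ask_for_predefined=False, well_formed=True))   # prints: <&foo;bar
```

The first call models the old behavior with the hack, the second the regression (the pre-defined entity disappears), and the third what happens if the flag is left alone: the pre-defined entity comes back, but the content of the other entity is pushed through as well.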

Rob



