Re: [xml] replaceEntities replacing entities twice



On Wed, Jul 26, 2006 at 10:57:22PM -0400, Eric Seidel wrote:
I wanted to follow up with this bug:

http://bugzilla.gnome.org/show_bug.cgi?id=159219

WebKit was also running into this bug.  We inherited some work-around  
code from KDOM:

static xmlEntityPtr getEntityHandler(void *closure, const xmlChar *name)
{
    xmlParserCtxtPtr ctxt = static_cast<xmlParserCtxtPtr>(closure);
    xmlEntityPtr ent = xmlGetPredefinedEntity(name);
    if (ent)
        return ent;

    ent = xmlGetDocEntity(ctxt->myDoc, name);
    if (!ent && getTokenizer(closure)->isXHTMLDocument())
        ent = getXHTMLEntity(name);

    // Work around a libxml SAX2 bug that causes charactersHandler to be called twice.
    if (ent)
        ctxt->replaceEntities = (ctxt->instate == XML_PARSER_ATTRIBUTE_VALUE)
            || (ent->etype != XML_INTERNAL_GENERAL_ENTITY);

    return ent;
}

Recently I've noticed that the above work-around code is not quite
correct and is causing trouble of its own:

  There is a big warning in red in the libxml2 documentation about
  entities and SAX:
    http://xmlsoft.org/entities.html
 
The default libxml2 SAX callbacks build the entity information associated
with the document. That document is also constructed by the SAX callbacks.
If you replace the SAX callbacks, you must also construct the document
and its associated entities yourself.
The code you pasted can only work if a number of other things are in place
to build the entities from the SAX callbacks. I really can't guess
what is going on in your framework, and correct entity support can be
extremely hard to implement right. Entities in the internal subset or the
external subset must be handled differently from entities in content, and
this can get incredibly complex.
In a nutshell: sorry, you have gotten yourself into a very hard place. The
xmlReader is a much cleaner and simpler streaming API which will take care
of those issues.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


