Re: [xml] How to reset an HTML push parser context?

On Tue, Sep 11, 2007 at 01:26:30PM +0200, Stefan Behnel wrote:

Daniel Veillard wrote:
On Mon, Sep 10, 2007 at 09:45:10AM +0200, Stefan Behnel wrote:

There is currently no API function for resetting a push parser context for
the HTML parser, and resetting one by hand for reuse doesn't seem to be trivial.
It looks like I have to run htmlCtxtReset() and then create and set up an
input stream (in a pretty ugly way, according to the Create code...). This
could well motivate an official function.

I also thought about using the xmlCtxtResetPush function, but then I stumble
over things like the spaceTab setup (which is currently a sure crasher for me).

Is there anything else I have to do to implement this functionality by hand?
And: is there an easier way?

  Honestly I don't know. I don't see why xmlCtxtResetPush() would not
work for an HTML parser context.

In case others are interested, the code below works for me (Pyrex code, but
should be readable).


cdef int _htmlCtxtResetPush(xmlparser.xmlParserCtxt* c_ctxt,
                            char* c_data, int buffer_len,
                            char* c_encoding, int parse_options) except -1:
    # libxml2 crashes if spaceTab is not initialised
    if _LIBXML_VERSION_INT < 20629 and c_ctxt.spaceTab is NULL:
        c_ctxt.spaceTab = <int*>tree.xmlMalloc(10 * sizeof(int))
        if c_ctxt.spaceTab is NULL:
            # allocation failed; "except -1" propagates the exception
            raise MemoryError()
        c_ctxt.spaceMax = 10

  xmlCtxtResetPush should instead be fixed to cope with that condition.

    # libxml2 lacks an HTML push parser setup function
    error = xmlparser.xmlCtxtResetPush(c_ctxt, NULL, 0, NULL, c_encoding)
    if error:
        return error

    # fix libxml2 setup for HTML: mark the context as an HTML parser again
    c_ctxt.html = 1
    htmlparser.htmlCtxtUseOptions(c_ctxt, parse_options)

    if c_data is not NULL and buffer_len > 0:
        return htmlparser.htmlParseChunk(c_ctxt, c_data, buffer_len, 0)
    return 0

  If you think a real C function htmlCtxtResetPush() might be useful, then
as usual I take patches! :-)
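
For readers who have not used a push parser, the reset-and-feed life cycle
discussed above can be sketched in plain Python. This is only an illustration
of the pattern, not libxml2: it uses the standard library's html.parser module
as a stand-in, and TagCollector and parse_in_chunks are hypothetical names
made up for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the start tags seen by a push-style parser."""

    def reset(self):
        # HTMLParser.__init__() calls reset(), so per-document state is
        # (re)initialised here rather than in __init__().
        super().reset()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_in_chunks(parser, chunks):
    """Reset the parser for reuse, push the chunks one by one, return the tags."""
    parser.reset()          # drop buffered input and state from a previous document
    for chunk in chunks:
        parser.feed(chunk)  # incomplete markup is buffered until more data arrives
    parser.close()          # flush whatever is still buffered
    return list(parser.tags)


# One parser object, reused for two unrelated documents.
parser = TagCollector()
first = parse_in_chunks(parser, ["<html><bo", "dy><p>one</p></body></html>"])
second = parse_in_chunks(parser, ["<div><span>two</span></div>"])
print(first)
print(second)
```

The point of the reset step is the same as in the Pyrex code: nothing from the
first document (buffered bytes, collected state) may leak into the second
parse when the same parser object is reused.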


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
