[xml] Supporting additional encodings in a push parser?



I'm looking to improve i18n support in my libxml2-based Apache
filter modules, such as mod_proxy_html.

To work well with the Apache architecture, these modules
use a push parser:

  ctxt->parser = htmlCreatePushParserCtxt(ctxt->sax, ctxt,
        buf, m->start, 0, enc);

"enc" is an xmlCharEncoding set by xmlParseCharEncoding or
xmlDetectCharEncoding.  With charset sniffing, this
automatically inherits libxml2's native charset support.

Now, libxml2's encoding module lets me register new charsets,
for example by registering iconv-based conversion functions:

  xmlCharEncodingHandlerPtr charenc
        = xmlNewCharEncodingHandler(encoding, iconv_in, iconv_out);
  xmlRegisterCharEncodingHandler(charenc);
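
For illustration, here is a minimal sketch of what an input function
such as iconv_in might look like. It assumes the callback shape that
libxml2 documents for xmlCharEncodingInputFunc (out, outlen, in, inlen)
and the usual return convention: bytes written on success, -1 when the
output buffer is too small, -2 on a transcoding failure. The fixed
charset name, the static descriptor and the handling of a truncated
trailing character are assumptions of this sketch, not anything libxml2
prescribes.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Assumption: one fixed source charset; ISO-8859-15 is only a placeholder. */
#define SRC_CHARSET "ISO-8859-15"

/* The callback receives no user-data argument, so the iconv descriptor
 * has to live in static storage. As written this is not thread-safe. */
static iconv_t to_utf8 = (iconv_t)-1;

/* Convert up to *inlen bytes from `in` into UTF-8 in `out`.
 * On return, *inlen holds the bytes consumed and *outlen the bytes written. */
int iconv_in(unsigned char *out, int *outlen,
             const unsigned char *in, int *inlen)
{
    char *src, *dst;
    size_t srcleft, dstleft;
    int rc = 0;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;

    /* Open the descriptor lazily on first use. */
    if (to_utf8 == (iconv_t)-1) {
        to_utf8 = iconv_open("UTF-8", SRC_CHARSET);
        if (to_utf8 == (iconv_t)-1) {
            *inlen = 0;
            *outlen = 0;
            return -2;
        }
    }

    src = (char *)in;       /* iconv() takes a non-const pointer */
    dst = (char *)out;
    srcleft = (size_t)*inlen;
    dstleft = (size_t)*outlen;

    if (iconv(to_utf8, &src, &srcleft, &dst, &dstleft) == (size_t)-1) {
        if (errno == E2BIG)
            rc = -1;        /* output buffer exhausted */
        else if (errno == EILSEQ)
            rc = -2;        /* invalid input sequence */
        /* EINVAL: the chunk ends mid-character. The tail is left
         * unconsumed so the next chunk can complete it. */
    }

    *inlen = (int)(src - (char *)in);
    *outlen = (int)(dst - (char *)out);
    return rc < 0 ? rc : *outlen;
}
```

The static descriptor is the awkward part: with no context pointer in
the callback, each registered charset needs its own pair of functions
and its own descriptor, and a threaded MPM would need locking or
per-thread state around it.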

But there's a missing link: xmlCharEncoding is an enum, and
registering a new encoding handler doesn't create a new value I 
can use with xmlParseCharEncoding, htmlCreatePushParserCtxt, etc.

Is there a workaround that'll enable me to register new
charsets *and* use them in a push parser, other than
just preprocessing ahead of the parser?  

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/


