On Fri, Nov 03, 2006 at 02:32:28PM +1100, Michael Day wrote:

The htmlCtxtReset() function contains the following line:

     ctxt->charset = XML_CHAR_ENCODING_UTF8;

However, htmlNewParserCtxt() and htmlInitParserCtxt() do not do this;
they leave charset set to zero.

This means that calling htmlCtxtReset() changes the behaviour of
htmlCtxtReadFile() compared to using a fresh parsing context.

Is this a bug? It certainly seems a bit awkward.

  ctxt->charset is a remnant from libxml1, where strings were stored
in the document encoding (this was a complete and total mess). Now
they are always stored as UTF-8, so whether the value is 0 or
XML_CHAR_ENCODING_UTF8 should not change anything, really.
  The encoding of the current input is stored as a string in
ctxt->input->encoding (when provided).


