On Wed, Feb 03, 2010 at 08:34:09PM -0800, Aaron Patterson wrote:
> I can't seem to pass an encoding to xmlParseInNodeContext. This is
> problematic when dealing with UTF-8 HTML documents. I can tell libxml2
> what encoding to use when originally parsing the document, but it looks
> like that is completely ignored when using xmlParseInNodeContext.
> Reference nodes in HTML documents completely ignore the original
> document encoding and use ISO-8859-1.
>
> Here is a sample program to illustrate the problem:
>
>   http://pastie.org/808860
>
> I tried putting together a patch, and it didn't seem to work:
>
>   http://pastie.org/808862
>
> Ideally, I would like a function similar to xmlParseInNodeContext, but
> one that takes an encoding as a parameter.
>
> Thanks!

Rather than add Yet Another Entry Point, I think the most logical
approach is to parse using the encoding from the document, since it's an
"in context" parse, i.e. parsing as if the fragment were coming from
that document.

The encoding switch is a bit harder than what you hoped for, but it's
not that hard. The enclosed patch seems to do it for me; please give it
a try.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

Attachment: in_context_encoding.patch (Text document)
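
For readers unfamiliar with the symptom described in the quoted message, here is a minimal sketch in plain C of what happens when UTF-8 bytes are decoded as ISO-8859-1 and then stored as UTF-8 again. It does not use libxml2 and is not the attached patch; the function name latin1_to_utf8 is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>

/*
 * Re-encode `len` bytes of ISO-8859-1 text as UTF-8.
 * Returns the number of bytes written to `out` (not counting the
 * terminating NUL), or (size_t)-1 if `out_size` is too small.
 * Hypothetical helper for illustration only; not a libxml2 function.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];

        if (b < 0x80) {
            /* ASCII range: one byte, unchanged (keep room for NUL). */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = b;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes. */
            if (n + 2 >= out_size)
                return (size_t)-1;
            out[n++] = (unsigned char)(0xC0 | (b >> 6));
            out[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }

    out[n] = '\0';
    return n;
}
```

Feeding this function the two UTF-8 bytes of "é" (0xC3 0xA9) yields four bytes (0xC3 0x83 0xC2 0xA9), which display as "Ã©". That doubling is the kind of corruption to expect when a UTF-8 fragment is parsed with an ISO-8859-1 default instead of the encoding of the document it is being parsed into.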