Re: [xml] encoding support with xmlParseInNodeContext



On Thu, Feb 4, 2010 at 12:53 AM, Daniel Veillard <veillard redhat com> wrote:
On Wed, Feb 03, 2010 at 08:34:09PM -0800, Aaron Patterson wrote:
I can't seem to pass an encoding to xmlParseInNodeContext.  This is
problematic when dealing with UTF-8 HTML documents.  I can tell
libxml2 what encoding to use when originally parsing the document, but
it looks like that is completely ignored when using
xmlParseInNodeContext.  Reference nodes in HTML documents completely
ignore the original document encoding and use ISO-8859-1.

Here is a sample program to illustrate the problem:

http://pastie.org/808860

I tried putting together a patch, and it didn't seem to work:

http://pastie.org/808862

Ideally, I would like a function similar to xmlParseInNodeContext, but
one that takes an encoding as a parameter.  Thanks!

 Rather than add Yet Another Entry Point, I think the most logical
approach is to parse using the encoding from the document, since it's
an "in context" parsing, i.e. parsing as if the fragment was coming
from that document. The encoding switch is a bit harder than what you
hoped for, but it's not that hard; the patch enclosed seems to do it
for me, please have a try.

Perfect.  It works great for me!  Thank you very much!

Any suggestions for workarounds for older versions of libxml2?  I'm
tempted to copy this function to my C code, but I'd rather not if
possible.
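
One possible stopgap for unpatched versions, sketched here as an
assumption rather than something proposed in this thread: since ASCII
bytes mean the same thing in ISO-8859-1 and UTF-8, the fragment could be
rewritten so that every non-ASCII character becomes a numeric character
reference before it is handed to xmlParseInNodeContext.  The helper name
escape_non_ascii is hypothetical and not part of libxml2, and the sketch
does only minimal UTF-8 validation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (NOT a libxml2 function): return a newly allocated,
 * ASCII-only copy of a UTF-8 string in which every non-ASCII character is
 * replaced by a decimal numeric character reference such as "&#233;".
 * Returns NULL on allocation failure or on a malformed/truncated sequence.
 * Overlong encodings and surrogates are not rejected in this sketch.
 * The caller must free() the result.
 */
char *escape_non_ascii(const char *utf8)
{
    size_t len = strlen(utf8);
    /* Worst case is a 2-byte sequence growing to 7 characters
     * ("&#2047;"), so 4 output bytes per input byte is enough. */
    char *out = malloc(len * 4 + 1);
    const unsigned char *p = (const unsigned char *)utf8;
    char *q = out;

    if (out == NULL)
        return NULL;

    while (*p) {
        unsigned long cp;
        int extra;

        if (*p < 0x80) {                    /* plain ASCII: copy as-is */
            *q++ = (char)*p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {   /* 2-byte sequence */
            cp = *p & 0x1F; extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {   /* 3-byte sequence */
            cp = *p & 0x0F; extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {   /* 4-byte sequence */
            cp = *p & 0x07; extra = 3;
        } else {                            /* stray continuation byte */
            free(out);
            return NULL;
        }
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) {      /* also catches truncation */
                free(out);
                return NULL;
            }
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
        }
        q += sprintf(q, "&#%lu;", cp);      /* emit the code point */
    }
    *q = '\0';
    return out;
}
```

The escaped buffer, with its length taken from strlen, could then be
passed to xmlParseInNodeContext in place of the original fragment.  This
would not help for content where character references are not decoded,
such as text inside script or style elements.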

-- 
Aaron Patterson
http://tenderlovemaking.com/


