Re: [xml] Libxml/Python/Unicode problem
- From: Martijn Faassen <faassen infrae com>
- To: Örjan Reinholdsen <Orjan Reinholdsen smarttrust com>
- Cc: xml gnome org
- Subject: Re: [xml] Libxml/Python/Unicode problem
- Date: Fri, 27 Feb 2004 18:44:11 +0100
Örjan Reinholdsen wrote:
What is going on here? Why is the python-binding trying to encode into ascii at all?
Or is it just me misunderstanding something here...
The libxml Python binding in fact does not accept Python unicode strings.
Libxml internally works with UTF-8 encoding, and its Python API also
expects UTF-8.
The trick here is to convert your strings to UTF-8 before you pass them
into libxml, and to convert whatever strings you get back from it into
unicode strings again. You may well know this already, but the
conversion goes like this:
utf8string = unicodestring.encode('UTF-8')
and back again
unicodestring = unicode(utf8string, 'UTF-8')
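As a minimal sketch of that round trip, the two conversions can be wrapped in small helpers. The names to_utf8 and from_utf8 are made up for illustration and are not part of libxml. The sketch uses data.decode('UTF-8'), which gives the same result as unicode(data, 'UTF-8'), and no libxml call is made here:

```python
# Hypothetical helpers (not part of libxml): convert at the API boundary.

def to_utf8(text):
    """Encode a unicode string to a UTF-8 byte string before passing it in."""
    return text.encode('UTF-8')


def from_utf8(data):
    """Decode a UTF-8 byte string that came back out into a unicode string."""
    return data.decode('UTF-8')


name = u'\xd6rjan'           # unicode string containing a non-ASCII character
encoded = to_utf8(name)      # UTF-8 bytes, the form the binding expects
decoded = from_utf8(encoded) # back to a unicode string
```

Keeping the conversion in one place like this makes it harder to forget a decode somewhere and end up mixing UTF-8 byte strings with unicode strings.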
In the past I've discussed with Daniel a global 'knob' for libxml that
would make the Python binding accept and return unicode, at the extra cost
of conversion in the binding layer (to UTF-8 and back). With this knob
*all* strings that enter the API would be treated as unicode strings (as
if Python did a 'unicode()' on them). This means that the API would
accept old-style strings in the ASCII range as well as unicode strings,
which is the default Python behavior when handling unicode.
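To illustrate what such a knob would do, here is a rough sketch of the conversion layer as a plain Python decorator. Nothing like this exists in libxml; unicode_api, coerce_in, coerce_out and fake_binding_call are all hypothetical names, and fake_binding_call only stands in for a binding function that takes and returns UTF-8:

```python
import functools

try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3


def coerce_in(value):
    """Turn an incoming string into UTF-8 bytes, as the knob would."""
    if isinstance(value, text_type):
        return value.encode('UTF-8')
    if isinstance(value, bytes):
        # Mimic an implicit unicode(): only ASCII-range byte strings pass.
        return value.decode('ascii').encode('UTF-8')
    return value


def coerce_out(value):
    """Turn UTF-8 bytes coming back out into a unicode string."""
    if isinstance(value, bytes):
        return value.decode('UTF-8')
    return value


def unicode_api(func):
    """Hypothetical wrapper: unicode in, unicode out, UTF-8 in between."""
    @functools.wraps(func)
    def wrapper(*args):
        return coerce_out(func(*[coerce_in(a) for a in args]))
    return wrapper


@unicode_api
def fake_binding_call(utf8_bytes):
    # Stand-in for a UTF-8-only binding function (an assumption, not libxml).
    return utf8_bytes + b'!'
```

With this in place, fake_binding_call(u'\xd6rjan') returns a unicode string, an ASCII-only old-style string is accepted as well, and a non-ASCII old-style string raises UnicodeDecodeError, which matches how Python itself treats mixed string types.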
It may not be a bad idea to make some progress on this, as Python
programmers trying to do the right thing with unicode in the Python
sense now get blocked by confusion in libxml, even though libxml does
the right thing too in its own terms. :)
Regards,
Martijn