Re: [xml] Encoding problems building document from scratch
- From: Daniel Veillard <veillard redhat com>
- To: "Henke, Markus" <Markus_Henke ordat com>
- Cc: "'xml gnome org'" <xml gnome org>
- Subject: Re: [xml] Encoding problems building document from scratch
- Date: Tue, 15 Jan 2002 09:44:14 -0500
On Tue, Jan 15, 2002 at 01:10:33PM +0100, Henke, Markus wrote:
When I'm parsing a document that has <...encoding="iso-8859-1">
(or any other (registered) encoding),
libxml will handle the charset conversion and build an
internal representation that is encoded in UTF-8
(and that's pretty nice and preventing... 8)
For this it uses the default encoding support or an
(application-defined) encoding handler. The raw data are an
(application-defined) character buffer and the encoding
information ("iso-8859-1") that is held in the xmlDoc node.
yes
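
To make the conversion step concrete, here is a minimal sketch of turning an ISO-8859-1 buffer into UTF-8 with POSIX iconv. It only illustrates the kind of conversion described above; libxml goes through its own encoding handlers, and the helper name latin1_to_utf8 is made up for this example.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxml function): convert a NUL-terminated
   ISO-8859-1 string to UTF-8.
   Returns the number of UTF-8 bytes written, or (size_t)-1 on failure. */
size_t latin1_to_utf8(const char *latin1, char *out, size_t out_size)
{
    assert(latin1 != NULL && out != NULL);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv wants char **, but it does not modify the input bytes. */
    char *in_ptr = (char *)latin1;
    size_t in_left = strlen(latin1);
    char *out_ptr = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;
}
```

Calling latin1_to_utf8("caf\xE9", buf, sizeof buf) should turn the single byte 0xE9 into the two bytes 0xC3 0xA9, which is the form the tree expects for text content.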
there is also a performance
issue when delegating charset conversions to libxml2.
(Performance seems OK, at least I haven't read any complaints 8)
Or have I got things completely wrong?
No, sounds right
So, is it absurd to wonder whether there's
already a way (or whether it would be useful to have one)
it can be done with existing interfaces.
to handle the above-mentioned scenario in an efficient way
It cannot be done in an efficient way. Look at the iconv interface.
Charset converters can have state. Reusing one between two independent
calls, as you suggest, would not be possible, so you would potentially
have to open and close a new converter for each API call. And that's a
serious performance problem.
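
The point about converter state can be seen with a small sketch against the POSIX iconv API. It assumes an implementation such as glibc or GNU libiconv, where the plain "UTF-16" target writes a byte order mark only on the first call through a descriptor; other implementations may behave differently. The function names are invented for the example.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Push one UTF-8 string through an already opened descriptor.
   Returns the number of output bytes produced, or (size_t)-1 on error. */
size_t convert_chunk(iconv_t cd, const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);

    char *in_ptr = (char *)text;   /* iconv does not modify the input bytes */
    size_t in_left = strlen(text);
    char *out_ptr = out;
    size_t out_left = out_size;

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;
}

/* Two strings through ONE descriptor: whatever state the first call
   left behind is still there when the second call runs. */
size_t bytes_with_shared_converter(const char *a, const char *b)
{
    char out[64];
    iconv_t cd = iconv_open("UTF-16", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    size_t n1 = convert_chunk(cd, a, out, sizeof out);
    size_t n2 = convert_chunk(cd, b, out, sizeof out);
    iconv_close(cd);

    if (n1 == (size_t)-1 || n2 == (size_t)-1)
        return (size_t)-1;
    return n1 + n2;
}

/* Two strings, each through its OWN descriptor: every call starts from
   the initial state, at the cost of an open/close pair per call. */
size_t bytes_with_fresh_converters(const char *a, const char *b)
{
    char out[64];
    const char *inputs[2] = { a, b };
    size_t total = 0;

    for (int i = 0; i < 2; i++) {
        iconv_t cd = iconv_open("UTF-16", "UTF-8");
        if (cd == (iconv_t)-1)
            return (size_t)-1;
        size_t n = convert_chunk(cd, inputs[i], out, sizeof out);
        iconv_close(cd);
        if (n == (size_t)-1)
            return (size_t)-1;
        total += n;
    }
    return total;
}
```

With glibc I would expect the two functions to return different totals for the same pair of one-letter strings: the shared descriptor emits the byte order mark once, the fresh descriptors emit it twice. So the result of a call depends on what the descriptor has already seen, and treating each API call as independent means paying for iconv_open and iconv_close every time.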
Check the iconv API,
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/