Re: [xml] Encoding problems building document from scratch

From: Daniel Veillard <veillard redhat com>
To: "Henke, Markus" <Markus_Henke ordat com>
Cc: "'xml gnome org'" <xml gnome org>
Subject: Re: [xml] Encoding problems building document from scratch
Date: Tue, 15 Jan 2002 09:44:14 -0500

On Tue, Jan 15, 2002 at 01:10:33PM +0100, Henke, Markus wrote:

When I'm parsing a document that has <...encoding="iso-8859-1">
(or any other (registered) encoding),
libxml will handle the charset conversion and build an
internal representation that is encoded in UTF-8
(and that's pretty nice and preventing... 8)
For that it uses the default encoding support or an
(application defined) encoding handler. The raw data are an
(application defined) character buffer and the encoding
information ("iso-8859-1") that is held in the xmlDoc node.

yes
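
To make the conversion concrete, here is a minimal sketch of what an
ISO-8859-1 to UTF-8 step does at the byte level. The function name
latin1_to_utf8 is hypothetical and this is not libxml's own code; it only
illustrates why the internal UTF-8 buffer can be longer than the input
(every byte >= 0x80 becomes two bytes).

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: converts inlen bytes of
 * ISO-8859-1 into NUL-terminated UTF-8. Returns the number of bytes
 * written (excluding the NUL), or (size_t)-1 if out is too small. */
static size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                             unsigned char *out, size_t outsize)
{
    size_t o = 0;
    size_t i;

    if (outsize == 0)
        return (size_t)-1;

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: one byte, unchanged */
            if (o + 1 >= outsize)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes */
            if (o + 2 >= outsize)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

ISO-8859-1 is the easy case because it is stateless and maps one byte to
one code point; other registered encodings need a real converter.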

there is also a performance
issue when delegating charset conversions to libxml2.

(Performance seems OK, at least I haven't read any complaints 8)
Or have I got things completely wrong?


  No, sounds right

So, is it far-fetched to ask whether there's
already a way (or whether it would be useful to have one)


  it can be done with existing interfaces.

to handle the above mentioned scenario in an efficient way


  it cannot be done in an efficient way. Look at the iconv interface.
Charset converters can have state. Reusing them between 2 independent
calls like you suggest would not be possible, hence potentially requiring
opening and closing a new converter for each API call. And that's a serious
performance problem.
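
The point about converter state can be sketched with the POSIX iconv calls
(iconv_open, iconv, iconv_close). The helper names convert_with and
convert_once are hypothetical, and the fixed "ISO-8859-1" source encoding is
an assumption for the example. convert_once is the shape a per-call
conversion API would be forced into: it pays for iconv_open/iconv_close on
every string. convert_with reuses a descriptor the caller keeps open, which
is only safe when the caller owns that descriptor and resets its state
between independent inputs.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert one NUL-terminated string with an
 * already-open descriptor. Returns 0 on success, -1 on failure. */
static int convert_with(iconv_t cd, const char *in, char *out, size_t outsize)
{
    char *inp = (char *)in;          /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;           /* keep one byte for the NUL */

    /* Put the converter back into its initial state, so nothing
     * left over from a previous input leaks into this one. */
    iconv(cd, NULL, NULL, NULL, NULL);

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return -1;

    /* Flush any pending shift sequence (matters for stateful encodings). */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return -1;

    *outp = '\0';
    return 0;
}

/* Hypothetical per-call variant: opens and closes a converter for every
 * string, which is the cost described above. */
static int convert_once(const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    int rc;

    if (cd == (iconv_t)-1)
        return -1;
    rc = convert_with(cd, in, out, outsize);
    iconv_close(cd);
    return rc;
}
```

An application that builds a whole document from non-UTF-8 strings can open
one descriptor up front, call convert_with for each string, and close it at
the end. On systems where iconv is not part of the C library, linking with
-liconv may be needed.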

  Check the iconv API,

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

References:
- RE: [xml] Encoding problems building document from scratch
  - From: Henke, Markus
