Re: [xml] Encoding problems building document from scratch



On Tue, Jan 15, 2002 at 01:10:33PM +0100, Henke, Markus wrote:
When I'm parsing a document that has <...encoding="iso-8859-1">
(or any other (registered) encoding),
libxml will handle the charset conversion and build an
internal representation that is encoded in UTF-8
(and that's pretty nice and preventing... 8)
To do so it uses the default encoding support or an
(application-defined) encoding handler. The raw data are an
(application-defined) character buffer plus the encoding
information ("iso-8859-1") that is held in the xmlDoc node.

  yes
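For the ISO-8859-1 case described above, the conversion to UTF-8 is simple, because every Latin-1 byte is the Unicode code point U+0000..U+00FF. The C sketch below shows what that step amounts to. It is a hand-rolled illustration, not libxml2's code: latin1_to_utf8() is a hypothetical name. As far as I know, libxml2 ships its own built-in routine for this (isolat1ToUTF8() in encoding.h) and falls back to iconv or an application-registered handler for other charsets.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert an ISO-8859-1 (Latin-1) buffer to UTF-8.
 * Each Latin-1 byte is a code point in U+0000..U+00FF, so it becomes
 * one UTF-8 byte (< 0x80) or two UTF-8 bytes (>= 0x80).
 * Returns the number of bytes written, or -1 if `out` is too small.
 */
long latin1_to_utf8(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outlen)
{
    size_t o = 0;

    assert(in != NULL || inlen == 0);
    assert(out != NULL || outlen == 0);

    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            if (o + 1 > outlen)
                return -1;
            out[o++] = c;                                  /* ASCII: copied as-is */
        } else {
            if (o + 2 > outlen)
                return -1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return (long)o;
}
```

For example, the Latin-1 bytes "caf\xe9" (4 bytes) come out as the UTF-8 bytes "caf\xc3\xa9" (5 bytes). Latin-1 needs no state between calls, which is why it is the easy case; the harder case is discussed further down.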

there is also a performance
issue when delegating charset conversions to libxml2.
(Performance seems OK, at least I haven't read any complaints 8)
Or have I got things completely wrong?

  No, sounds right

So, is it unreasonable to ask whether there's
already a way (or whether it would be useful to have one)

  it can be done with existing interfaces.

to handle the above-mentioned scenario in an efficient way

  it cannot be done in an efficient way. Look at the iconv interface.
Charset converters can have state. Reusing them between two independent
calls, as you suggest, would not be possible, hence potentially requiring
a new converter to be opened and closed for each API call. And that's a
serious performance problem.
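To make the trade-off concrete, here is a minimal C sketch against the POSIX iconv API (iconv_open(), iconv(), iconv_close()). convert_chunk() and convert_once() are hypothetical helper names, not libxml2 functions. convert_once() is the costly pattern described above: a fresh converter opened and closed for every call. convert_chunk() reuses a descriptor the caller opened once, and resets it after each chunk so no shift state leaks into the next, unrelated call.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert one self-contained chunk with an
 * already-open descriptor, then return `cd` to its initial state.
 * Returns the number of bytes written, or -1 on error.
 */
long convert_chunk(iconv_t cd, const char *in, size_t inlen,
                   char *out, size_t outlen)
{
    char *inp = (char *)in;   /* iconv() takes char **, not const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;

    assert(cd != (iconv_t)-1);

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        iconv(cd, NULL, NULL, NULL, NULL);   /* drop partial state */
        return -1;
    }
    /* NULL input: write any closing shift sequence and reset the state */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        iconv(cd, NULL, NULL, NULL, NULL);
        return -1;
    }
    return (long)(outlen - outleft);
}

/* Hypothetical helper: the costly pattern, one open/close per call. */
long convert_once(const char *tocode, const char *fromcode,
                  const char *in, size_t inlen, char *out, size_t outlen)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    long n;

    if (cd == (iconv_t)-1)
        return -1;
    n = convert_chunk(cd, in, inlen, out, outlen);
    iconv_close(cd);
    return n;
}
```

The reset in convert_chunk() only works when each chunk is complete on its own. A multibyte sequence or shift state split across two independent calls is exactly the state a reset throws away (iconv() reports such truncated input as an error), so a general API cannot safely share one converter between unrelated calls. That leaves convert_once() and its per-call iconv_open()/iconv_close() overhead.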

  Check the iconv API,

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


