RE: [xml] Encoding problems building document from scratch




-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Tuesday, January 15, 2002 3:44 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Encoding problems building document from scratch


On Tue, Jan 15, 2002 at 01:10:33PM +0100, Henke, Markus wrote:
When I'm parsing a document that has <...encoding="iso-8859-1">
(or any other (registered) encoding),
libxml will handle the charset conversion and build an
internal representation that is encoded in UTF-8
(and that's pretty nice and preventing... 8)
For this it uses the default encoding support or an
(application-defined) encoding handler. The raw data are an
(application-defined) character buffer and the encoding
information ("iso-8859-1") that is held in the xmlDoc node.

  yes

there is also a performance
issue when delegating charset conversions to libxml2.
(Performance seems OK, at least I haven't read any complaints 8)
Or have I got things completely wrong?

  No, sounds right

Whew, I'm relieved to hear that... 8]

So, is it far-fetched to ask whether there's
already a way (or whether it would be useful to have one)

  it can be done with existing interfaces.

Fine, I'll rummage through the reference again...

to handle the above-mentioned scenario in an efficient way

  it cannot be done in an efficient way. Look at the iconv interface.
Charset converters can have state. Reusing them between 2 independent
calls like you suggest would not be possible, hence potentially
requiring a new converter to be opened and closed for each API call.
And that's a serious performance problem.
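
To make the cost argument concrete, here is a minimal sketch using the
POSIX iconv calls (iconv_open, iconv, iconv_close). The helper names
convert_with() and convert_once() are hypothetical and are not part of
libxml2 or iconv. convert_once() is what a per-call encoding API would
have to do; convert_with() assumes the caller keeps the converter open
and only resets its shift state between strings.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert one NUL-terminated string with an
 * already opened converter. Returns the number of bytes written
 * (excluding the terminator), or (size_t)-1 on error. */
size_t convert_with(iconv_t cd, const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);

    char *inptr = (char *)in;      /* iconv() takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;  /* keep room for the terminator */

    /* Put the converter back into its initial state before reuse. */
    iconv(cd, NULL, NULL, NULL, NULL);

    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1)
        return (size_t)-1;

    /* Flush any pending shift sequence (matters for stateful targets). */
    if (iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t)-1)
        return (size_t)-1;

    *outptr = '\0';
    return (size_t)(outptr - out);
}

/* Hypothetical per-call variant: pays for iconv_open()/iconv_close()
 * on every single string. */
size_t convert_once(const char *from, const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    size_t n = convert_with(cd, in, out, outsize);
    iconv_close(cd);
    return n;
}
```

An application building many nodes would open the converter once with
iconv_open("UTF-8", "ISO-8859-1"), call convert_with() for each string,
and close it at the end. A per-node API taking an encoding name has no
natural place to keep that handle.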

Hum, I'm not sure if I understand that right.
You're talking about subsequent calls to our "virtual"
xmlNewDocNodeEnc(), and each call has to initialize/destroy the
appropriate encoding handler?
Well, I can imagine that this could be relevant in terms of
performance (in general).
BTW: Is the libxml default conversion of iso-latin-1 stateful?

  Check the iconv API,

Daniel
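
On the narrower iso-latin-1 question: as a mapping, ISO-8859-1 to UTF-8
needs no state, since every input byte is translated on its own. The
sketch below illustrates that with a hypothetical function,
latin1_to_utf8(); it is not libxml2's own converter (libxml2 declares
isolat1ToUTF8() in encoding.h for that), and it says nothing about
multi-byte or shift-based encodings, where state does matter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen ISO-8859-1 bytes to UTF-8.
 * Returns the number of bytes written, or (size_t)-1 if out is too
 * small. No state is carried from one byte to the next. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);

    size_t o = 0;
    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: copied unchanged. */
            if (o + 1 > outsize)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF: always exactly two UTF-8 bytes. */
            if (o + 2 > outsize)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Because each byte is handled independently, two unrelated calls can
share nothing and still be correct, which is not true of converters
obtained through iconv in general.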


Thanx & Ciao, Markus



Mit freundlichen Gruessen - Kind regards
Markus Henke



________________________Addressed by:________________________
 ORDAT GmbH & Co. KG  -  Serversystems / eCom 
 Dipl.-Inf. (FH) Markus Henke  Fon: +49 (641) 7941-0
 Rathenaustr. 1                Fax: +49 (641) 7941-132
 35394 Gießen                  mailto:markus henke ordat com
 See:                          http://www.ordat.com
_____________________________________________________________
              ...this behavior is by design...


