RE: [xml] Encoding problems building document from scratch
- From: "Henke, Markus" <Markus_Henke ordat com>
- To: "'veillard redhat com'" <veillard redhat com>
- Cc: "'xml gnome org'" <xml gnome org>
- Subject: RE: [xml] Encoding problems building document from scratch
- Date: Tue, 15 Jan 2002 16:18:10 +0100
-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Tuesday, January 15, 2002 3:44 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Encoding problems building document from scratch
On Tue, Jan 15, 2002 at 01:10:33PM +0100, Henke, Markus wrote:
When I'm parsing a document that has <...encoding="iso-8859-1">
(or any other (registered) encoding),
libxml will handle the charset conversion and build an
internal representation that is encoded in UTF-8
(and that's pretty nice and preventing... 8)
Therefore it uses the default encoding support or a
(application defined) encoding handler. The raw data are an
(application defined) character buffer and the encoding
information ("iso-8859-1") that is held in the xmlDoc node.
yes
there is also a performance
issue when delegating charset conversions to libxml2.
(Performance seems OK, at least I haven't read any complaints 8)
Or have i got things completely wrong?
No, sounds right
Whew, I'm relieved to hear that... 8]
So, is it far-fetched to ask whether there's
already a way (or whether it would be useful to have one)
it can be done with existing interfaces.
Fine, I'll rummage in the reference again...
to handle the above mentioned scenario in an efficient way
it cannot be done in an efficient way. Look at the iconv interface.
Charset converters can have state. Reusing them between two independent
calls like you suggest would not be possible, hence potentially requiring
to open/close a new converter for each API call. And that's a serious
performance problem.
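As a rough illustration of the two patterns being contrasted here, the sketch below uses only the POSIX iconv calls (iconv_open, iconv, iconv_close). The helper names convert_with() and convert_once() are hypothetical; they are not part of libxml2 and are not the proposed xmlNewDocNodeEnc(). It assumes the platform's iconv knows the names "ISO-8859-1" and "UTF-8".

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert one NUL-terminated string with an already
 * open converter. Returns the number of bytes written (excluding the
 * terminating NUL), or (size_t)-1 on error. */
static size_t convert_with(iconv_t cd, const char *in, char *out, size_t outsize)
{
    char *inp = (char *)in;          /* iconv() takes char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;

    if (outsize == 0)
        return (size_t)-1;
    outleft = outsize - 1;           /* keep room for the terminating NUL */

    /* Put the converter back into its initial state: the previous string
     * may have left it in a different shift state. */
    iconv(cd, NULL, NULL, NULL, NULL);

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;
    /* Flush any pending shift sequence for stateful target encodings. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}

/* Hypothetical helper: the "open/close a new converter for each call"
 * pattern that a per-node encoding argument could end up requiring. */
static size_t convert_once(const char *from, const char *in,
                           char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", from);
    size_t n;

    if (cd == (iconv_t)-1)
        return (size_t)-1;
    n = convert_with(cd, in, out, outsize);
    iconv_close(cd);
    return n;
}
```

A caller converting many strings would open one descriptor, call convert_with() for each string, and close the descriptor at the end; convert_once() pays the open/close cost on every call. How large that cost is depends on the iconv implementation and the encoding, and is not measured here.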
Hmm, I'm not sure I understand that correctly.
You're talking about subsequent calls to our "virtual"
xmlNewDocNodeEnc(), and each call has to initialize/destroy the
appropriate encoding handler?
Well, I can imagine that this could be relevant in terms of
performance (in general).
BTW: Is the libxml default conversion of iso-latin-1 stateful?
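For context on the question above: the ISO-8859-1 to UTF-8 mapping itself can be computed one byte at a time, because every Latin-1 byte is its own Unicode code point and becomes either one or two UTF-8 bytes. The sketch below shows that mapping with a hypothetical function latin1_to_utf8(). It is an illustration only and is not taken from libxml's built-in handler, whose implementation is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical, illustrative converter (not libxml2 code).
 * Converts inlen Latin-1 bytes to UTF-8. Returns the number of bytes
 * written, or (size_t)-1 if the output buffer is too small.
 * No state is carried from one input byte to the next. */
static size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                             unsigned char *out, size_t outsize)
{
    size_t o = 0;
    size_t i;

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range: copied unchanged as a single byte. */
            if (o + 1 > outsize)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx. */
            if (o + 2 > outsize)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Because each output depends only on the current input byte, two independent calls to such a function cannot interfere with each other. That does not hold in general for converters obtained through iconv, which is the case the state argument addresses.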
Check the iconv API,
Daniel
Thanx & Ciao, Markus
Kind regards
Markus Henke
________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Henke Fon: +49 (641) 7941-0
Rathenaustr. 1 Fax: +49 (641) 7941-132
35394 Gießen mailto:markus henke ordat com
See: http://www.ordat.com
_____________________________________________________________
...this behavior is by design...