[xml] Problems with xmlCharEncInFunc()
- From: "Henke, Markus" <Markus_Henke ordat com>
- To: "'xml gnome org'" <xml gnome org>
- Subject: [xml] Problems with xmlCharEncInFunc()
- Date: Thu, 7 Feb 2002 19:09:02 +0100
Hello,
I've tested my self-defined character encoding handler for HP-ROMAN8 with
the encoding API of libxml, and it works fine so far.
But I've also done some "stress" testing near the default XML buffer size
with unfavorable character combinations, and ran into problems in
xmlCharEncInFunc().
ROMAN-8 includes some characters that have to be transcoded to three
octets in UTF-8, so a worst-case ROMAN-8 string needs three times its
size in UTF-8 encoding.
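
To illustrate where the factor of three comes from, here is a minimal sketch (not taken from libxml; `utf8_len` and `worst_case_utf8` are hypothetical helper names). It relies only on the standard UTF-8 length rules: code points below U+0080 take one octet, below U+0800 two, and the rest of the 16-bit range three. U+25A0 is used purely as a sample code point above U+07FF.

```c
#include <stddef.h>

/* Number of UTF-8 octets needed for one Unicode code point.
 * Returns 0 for surrogates and for values above U+10FFFF. */
static int utf8_len(unsigned long cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return (cp >= 0xD800 && cp <= 0xDFFF) ? 0 : 3;
    if (cp < 0x110000)
        return 4;
    return 0;
}

/* Worst-case UTF-8 size for n input octets of a single-byte encoding
 * whose characters all lie in the 16-bit range (assumption of this sketch). */
static size_t worst_case_utf8(size_t n)
{
    return n * 3;
}
```

ISO-8859-1 only covers U+0000 to U+00FF, so `utf8_len` never exceeds 2 for it, whereas a single-byte encoding that maps anything to U+0800 or above can reach 3.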
In 'encoding.c' we have the following code to calculate the needed size
for the output buffer (encoding.c, lines 2056-2063):

    toconv = in->use;
    if (toconv == 0)
        return (0);
    written = out->size - out->use;
    if (toconv * 2 >= written) {
        xmlBufferGrow(out, out->size + toconv * 2);
        written = out->size - out->use - 1;
    }

So, if twice the in-buffer size (toconv * 2) reaches or exceeds the
available out-buffer space, the out-buffer is grown via
xmlBufferGrow(out, out->size + toconv * 2).
This is sufficient for ISO-8859-1 in any case, and in most cases it
will work for other encodings like HP-ROMAN8. But one can construct an
in-buffer where it fails, e.g. any buffer that is larger than 1/3 and
smaller than 1/2 of the default buffer size (provided that we use the
default out-buffer) and consists of characters that are mapped to three
UTF-8 octets. OK, it's a constructed case, but not impossible!?
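
The failing window can be checked with a little arithmetic. The sketch below is only an illustration, not libxml code: `overflow_undetected` is a hypothetical helper, and the 4096 free octets used in the note after it are an assumed figure for the default out-buffer.

```c
#include <stddef.h>

/* Models the size check "toconv * factor >= avail".
 * Returns 1 if that check would NOT grow the buffer although the
 * worst-case output (toconv * expansion octets) does not fit into
 * the avail octets that are free; returns 0 otherwise. */
static int overflow_undetected(size_t toconv, size_t avail,
                               size_t factor, size_t expansion)
{
    int grows = (toconv * factor >= avail);    /* check triggers a grow */
    int fits  = (toconv * expansion <= avail); /* worst case fits as-is */

    return !grows && !fits;
}
```

With 1500 input octets and 4096 free octets, `overflow_undetected(1500, 4096, 2, 3)` yields 1: 3000 is below 4096, so nothing is grown, yet up to 4500 octets may be produced. With 1000 or 2500 input octets the result is 0, because the output either fits or the buffer is grown.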
I guess a simple (maybe naive) solution would be to change this behavior
to

    toconv = in->use;
    if (toconv == 0)
        return (0);
    written = out->size - out->use;
    if (toconv * 3 >= written) {
        xmlBufferGrow(out, out->size + toconv * 3);
        written = out->size - out->use - 1;
    }

which should work for any encoding that is covered by Unicode (UCS-2),
but is possibly a waste of memory?
Maybe it would be better to retry when the registered
xmlCharEncodingInputFunc returns -1 (which is the correct semantics for
lack of space, if I got it right)?
xmlCharEncInFunc() also returns -1 in that case, but there it stands for
"general error", so I'm not sure we can do a reliable retry from
application level. I don't know if there's a way at all, since we would
need the number of bytes that have already been consumed.
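
As a rough sketch of what such a retry could look like at the level where the consumed byte count is still available: the code below is self-contained and does not use libxml. `toy_to_utf8` is a hypothetical converter that only mimics the argument shape of an xmlCharEncodingInputFunc (sizes in, octets written and consumed out, -1 for lack of space), and `convert_with_retry` is likewise an invented name. Whether a real handler leaves its counters usable after returning -1 is an assumption here.

```c
#include <stdlib.h>

/* Toy converter: on entry *outlen and *inlen hold the buffer sizes, on
 * return the octets written and consumed. Every input octet >= 0x80 is
 * mapped to U+25A0 (E2 96 A0) purely to force three-octet output.
 * Returns the octets written, or -1 if it stopped for lack of space. */
static int toy_to_utf8(unsigned char *out, int *outlen,
                       const unsigned char *in, int *inlen)
{
    int o = 0, i = 0, rc;

    while (i < *inlen) {
        int need = (in[i] < 0x80) ? 1 : 3;
        if (o + need > *outlen)
            break;                      /* next character does not fit */
        if (need == 1) {
            out[o++] = in[i];
        } else {
            out[o++] = 0xE2; out[o++] = 0x96; out[o++] = 0xA0;
        }
        i++;
    }
    rc = (i < *inlen) ? -1 : o;
    *inlen = i;
    *outlen = o;
    return rc;
}

/* Converts in[0..inlen) into a malloc'ed buffer that starts with cap
 * octets and is doubled whenever the converter reports lack of space.
 * Returns the buffer (caller frees) and stores its length in *reslen;
 * returns NULL on allocation failure. */
static unsigned char *convert_with_retry(const unsigned char *in, int inlen,
                                         int cap, int *reslen)
{
    unsigned char *out;
    int used = 0, done = 0;

    if (cap < 1)
        cap = 1;
    out = malloc((size_t)cap);
    if (out == NULL)
        return NULL;
    while (done < inlen) {
        int olen = cap - used;
        int ilen = inlen - done;
        int rc = toy_to_utf8(out + used, &olen, in + done, &ilen);

        used += olen;                   /* keep what was already written */
        done += ilen;                   /* resume after the consumed input */
        if (rc == -1) {                 /* lack of space: grow and retry */
            unsigned char *tmp = realloc(out, (size_t)cap * 2);
            if (tmp == NULL) {
                free(out);
                return NULL;
            }
            out = tmp;
            cap *= 2;
        }
    }
    *reslen = used;
    return out;
}
```

The point of the sketch is that the retry only works because the loop knows how many input octets were consumed before the converter ran out of space; a bare -1 without that count would not be enough to resume.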
Kind regards
Markus Henke
________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Henke Fon: +49 (641) 7941-0
Rathenaustr. 1 Fax: +49 (641) 7941-132
35394 Gießen mailto:markus henke ordat com
See: http://www.ordat.com
_____________________________________________________________
...this behavior is by design...