RE: [xml] UTF8Toisolat1() usage
- From: "Christopher R. Maden" <crism maden org>
- To: xml gnome org
- Subject: RE: [xml] UTF8Toisolat1() usage
- Date: Wed, 05 Jun 2002 01:08:54 -0700
At 01:06 5/6/02, Morus Walter wrote:
> A conversion from UTF8 to Latin1 may only shorten the text (down to
> half of the UTF8 byte length in extreme cases).
> So allocating a buffer of the size of the UTF8 text will be sufficient.
No - the Latin 1 string may be between 0.5 and 1 times as many bytes as the
UTF-8 string. For U+0000-U+007F, the UTF-8 and Latin 1 characters will
both be one byte; for U+0080-U+00FF, the UTF-8 character will be two bytes to
the Latin 1 character's one byte. It would be wise to allocate a buffer just
as long as the UTF-8 string, since any language that uses Latin 1 tends to
use primarily the characters in the ASCII range.
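
To make the size bound concrete, here is a minimal sketch in C. It does
not call libxml2; utf8_to_latin1() is a hypothetical helper written only
for illustration, and it assumes a NUL-terminated input and an output
buffer of at least strlen(in) + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2.
 * Converts a NUL-terminated UTF-8 string to Latin 1.
 * 'out' must have room for strlen(in) + 1 bytes: each input character
 * takes one or two bytes in UTF-8 and exactly one byte in Latin 1.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * input is malformed or contains a character above U+00FF. */
long utf8_to_latin1(const unsigned char *in, unsigned char *out)
{
    size_t n = 0;

    while (*in) {
        if (*in < 0x80) {
            /* U+0000-U+007F: one byte in both encodings */
            out[n++] = *in++;
        } else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* U+0080-U+00FF: two UTF-8 bytes become one Latin 1 byte */
            out[n++] = (unsigned char)(((in[0] & 0x03) << 6) | (in[1] & 0x3F));
            in += 2;
        } else {
            /* anything else cannot be represented in Latin 1 */
            return -1;
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

A caller would allocate strlen(utf8) + 1 bytes for the output and check
for a negative return before using the result.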
> If you convert Latin1 to UTF8 the text might need up to twice the space.
That's true, though only if the string contains only accented characters
and less-common punctuation (which is difficult for a meaningful string of
any size in any European language).
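
The reverse direction can be sketched the same way. Again this is only
an illustration: latin1_to_utf8() is a hypothetical helper, not a
libxml2 function, and it assumes a NUL-terminated input and an output
buffer of at least 2 * strlen(in) + 1 bytes, which covers the worst case
where every byte is in the 0x80-0xFF range.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2.
 * Converts a NUL-terminated Latin 1 string to UTF-8.
 * 'out' must have room for 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written (excluding the NUL). */
size_t latin1_to_utf8(const unsigned char *in, unsigned char *out)
{
    size_t n = 0;

    for (; *in; in++) {
        if (*in < 0x80) {
            /* ASCII range: copied unchanged, one byte */
            out[n++] = *in;
        } else {
            /* 0x80-0xFF: lead byte 0xC2 or 0xC3, then one continuation byte */
            out[n++] = (unsigned char)(0xC0 | (*in >> 6));
            out[n++] = (unsigned char)(0x80 | (*in & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For typical European text the result is only slightly longer than the
input, but the buffer still has to be sized for the worst case.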
~Chris
--
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA