RE: [xml] UTF8Toisolat1() usage
- From: "Christopher R. Maden" <crism maden org>
- To: xml gnome org
- Subject: RE: [xml] UTF8Toisolat1() usage
- Date: Wed, 05 Jun 2002 01:08:54 -0700
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
At 01:06 5/6/02, Morus Walter wrote:
>A conversion from UTF8 to Latin1 may only shorten the text (down to
>half of the UTF8 byte length in extreme cases).
>So allocating a buffer of the size of the UTF8 text will be sufficient.
No - the Latin 1 string may be between 0.5 and 1 times as many bytes as the
UTF-8 string. For U+0000-U+007F, the UTF-8 and Latin 1 characters will
both be one byte; for U+0080-U+00FF, the UTF-8 string will be two bytes to
the Latin 1 string's one byte. It would be wise to allocate a buffer just
as long as the UTF-8 string, since any language that uses Latin 1 tends to
use primarily the characters in the ASCII range.
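
To make the byte counts above concrete, here is a minimal sketch in plain C. The helper name latin1_len_from_utf8 is hypothetical and is not part of libxml2; it assumes a NUL-terminated input and treats anything outside U+0000-U+00FF as an error, since such characters have no Latin 1 form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): counts the bytes the
 * Latin 1 form of a NUL-terminated UTF-8 string would need.
 * Returns (size_t)-1 if the input is malformed or contains a
 * character above U+00FF. */
size_t latin1_len_from_utf8(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s) {
        if (*s < 0x80) {
            /* U+0000-U+007F: one byte in both encodings */
            s += 1;
        } else if ((*s == 0xC2 || *s == 0xC3) && (s[1] & 0xC0) == 0x80) {
            /* U+0080-U+00FF: two UTF-8 bytes become one Latin 1 byte */
            s += 2;
        } else {
            return (size_t)-1;
        }
        n++;
    }
    return n;
}
```

The result is never larger than the UTF-8 byte length, so a buffer of that length (plus one byte if a terminator is wanted) should always be enough.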
>If you convert Latin1 to UTF8 the text might need up to twice the space.
That's true, though the full factor of two is reached only if every
character in the string is an accented letter or less-common punctuation
mark (U+0080-U+00FF), which is difficult for a meaningful string of any
size in any European language.
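
For the other direction, a sketch of the same arithmetic, again with a hypothetical helper name (utf8_len_from_latin1) that is not taken from libxml2. It takes an explicit length so that it does not depend on a terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): counts the bytes the
 * UTF-8 form of a Latin 1 buffer of `len` bytes would need. */
size_t utf8_len_from_latin1(const unsigned char *s, size_t len)
{
    size_t n = 0;

    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        /* 0x00-0x7F stays one byte; 0x80-0xFF becomes two bytes */
        n += (s[i] < 0x80) ? 1 : 2;
    }
    return n;
}
```

Allocating 2 * len bytes covers the worst case without a counting pass; counting first trades a second pass over the input for a tighter allocation.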
~Chris
- --
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.5.8
iQA/AwUBPP3HFqxS+CWv7FjaEQJhOwCfeFJWPk2HEJGRuGLCkgdeRNCwA2sAn2wV
3auodVRhGSo807dfFn7SkSOc
=qK7l
-----END PGP SIGNATURE-----