Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII

From: tomas tuxteam de
To: Daniel Yek <dyek real com>
Cc: gtk-app-devel-list gnome org
Subject: Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
Date: Thu, 30 Nov 2006 11:54:51 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Nov 29, 2006 at 05:25:25PM -0800, Daniel Yek wrote:

> Hi,
>
> I am attempting to handle raw filenames (which may be encoded differently
> than the character set used by the filesystem) gracefully.
>
> [...]
>
> with a raw character outside of UTF-8 character set):
>
> Character:  P  r  e  s  e  n  t  a  c  i  ó  n  ó     .  s  x  i
> Hex code:   50 72 65 73 65 6e 74 61 63 69 f3 6e c3 b3 2e 73 78 69
>
> To be converted to this:
> Character:  P  r  e  s  e  n  t  a  c  i  %  f  3  n  ó     .  s  x  i
> Hex code:   50 72 65 73 65 6e 74 61 63 69 25 66 33 6e c3 b3 2e 73 78 69


And how is the converter supposed to guess that this "raw character"
(here 0xf3, and perhaps many of the following bytes) has to be
interpreted as something iso-8859-1 (or iso-8859-2) encoded, as you
seem to imply here? It could just as well be a Greek "σ" (iso-8859-7)
or a Cyrillic "у" (cp1251), to cite just some unibyte encodings. Going
multibyte might be even more fun.

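That means you'll have to make those decisions yourself. The libc
routines iconv_open()/iconv()/iconv_close() may help you with that:
iconv() converts up to the first illegal sequence, stops there and tells
you (it returns (size_t)-1 with errno set to EILSEQ and leaves the input
pointer at the start of the offending bytes).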

HTH
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFFbsaLBcgs9XrR2kYRAiFzAJwNeWk05WeekRO/xpy5SVizz0bRaACfZDYD
iL+hcmhMt4McadFzU4R3oSI=
=bSJY
-----END PGP SIGNATURE-----

Follow-Ups:
- Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
  - From: David Nečas (Yeti)
- Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
  - From: Daniel Yek

References:
- How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
  - From: Daniel Yek
