Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- From: David Nečas (Yeti) <yeti physics muni cz>
- To: gtk-app-devel-list gnome org
- Subject: Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- Date: Thu, 30 Nov 2006 13:17:41 +0100
On Thu, Nov 30, 2006 at 11:54:51AM +0000, tomas tuxteam de wrote:
On Wed, Nov 29, 2006 at 05:25:25PM -0800, Daniel Yek wrote:
I am attempting to handle raw filenames (which may be encoded differently
than the character set used by the filesystem) gracefully.
[...]
with a raw character outside of UTF-8 character set):
Character: P r e s e n t a c i ó n ó . s x i
Hex code: 50 72 65 73 65 6e 74 61 63 69 f3 6e c3 b3 2e 73 78 69
To be converted to this:
Character: P r e s e n t a c i % f 3 n ó . s x i
Hex code: 50 72 65 73 65 6e 74 61 63 69 25 66 33 6e c3 b3 2e 73 78 69
And how is the converter supposed to guess that this "raw character"
(here 0xf3 and perhaps lots of following bytes) has to be interpreted as
an iso-8859-1 (or iso-8859-2) encoded thing (what you seem to imply
here)? This could as well be an "??" or an "??" (to cite some unibyte
encodings...)
I suppose the goal is to preserve information about the
bytes in a situation where their interpretation (i.e. which
characters they represent) is already lost, and in that case
your question is moot. Whether or not this can actually be
helpful I will not judge.
OP: I doubt there is any function doing this, but UTF-8
validation is very simple so you can write the function
easily yourself.
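
To illustrate how little is needed, here is a minimal sketch in plain C
(no GLib), not a tested library routine. The names utf8_seq_len() and
escape_invalid_utf8() are made up for this example. It assumes the
lowercase %xx form shown in the original question, and it assumes that a
literal '%' in the input is copied through unchanged, so the result
cannot always be unambiguously reversed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the length (1-4) of the valid UTF-8 sequence starting at s,
 * or 0 if the bytes at s do not form one.  n is the number of bytes left. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                               /* plain ASCII */
    else if ((s[0] & 0xe0) == 0xc0) { len = 2; cp = s[0] & 0x1f; min = 0x80; }
    else if ((s[0] & 0xf0) == 0xe0) { len = 3; cp = s[0] & 0x0f; min = 0x800; }
    else if ((s[0] & 0xf8) == 0xf0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;            /* stray continuation byte or invalid lead byte */

    if (n < len)
        return 0;                               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;                           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    if (cp < min)
        return 0;                               /* overlong encoding */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;                               /* UTF-16 surrogate */
    if (cp > 0x10ffff)
        return 0;                               /* beyond Unicode range */
    return len;
}

/* Copy valid UTF-8 sequences as they are and replace every other byte
 * with "%xx".  Returns a newly allocated NUL-terminated string that the
 * caller must free(), or NULL if allocation fails. */
char *escape_invalid_utf8(const char *in, size_t n)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;
    char *out;

    assert(in != NULL);
    out = malloc(3 * n + 1);        /* worst case: every byte is escaped */
    if (out == NULL)
        return NULL;

    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) {
            sprintf(out + o, "%%%02x", (unsigned int)s[i]);
            o += 3;
            i += 1;
        } else {
            memcpy(out + o, s + i, len);
            o += len;
            i += len;
        }
    }
    out[o] = '\0';
    return out;
}
```

Called on the 18 bytes from the original question
(50 72 65 73 65 6e 74 61 63 69 f3 6e c3 b3 2e 73 78 69), this should
escape the lone f3 as "%f3" and keep the valid c3 b3 pair as it is.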
Yeti
--
Whatever.