How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- From: Daniel Yek <dyek real com>
- To: gtk-app-devel-list gnome org
- Subject: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- Date: Wed, 29 Nov 2006 17:25:25 -0800
Hi,
I am trying to handle raw filenames gracefully. These filenames may be in a
different encoding from the one the filesystem is assumed to use.
I am looking for a function similar to g_filename_display_name(), but
instead of converting each illegal byte sequence to the Unicode replacement
character (0xef 0xbf 0xbd in UTF-8), I would like each illegal byte to be
escaped to ASCII, as in a URI.
To be clear, I want valid UTF-8 characters to remain UTF-8, and only the
non-UTF-8 byte sequences to be escaped. Is there a function that does that?
That is, I would like this string (a mostly UTF-8 filename containing one raw
byte, 0xf3, that is not valid UTF-8 in this position):
Character: P  r  e  s  e  n  t  a  c  i  ó  n  ó     .  s  x  i
Hex code:  50 72 65 73 65 6e 74 61 63 69 f3 6e c3 b3 2e 73 78 69
to be converted to this:
Character: P  r  e  s  e  n  t  a  c  i  %  f  3  n  ó     .  s  x  i
Hex code:  50 72 65 73 65 6e 74 61 63 69 25 66 33 6e c3 b3 2e 73 78 69
I tried g_convert_with_fallback(str, -1, "UTF-8", "UTF-8" /* whatever
codeset the filesystem uses */, NULL, NULL, NULL, NULL), but this function
does not accept non-UTF-8 input.
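To make the desired behaviour concrete, here is a minimal sketch in plain C of the conversion described above. It is not an existing GLib function: the names escape_invalid_utf8() and utf8_seq_len() are made up for illustration, and the UTF-8 check is hand-written so the example has no dependencies. In a GLib program, g_utf8_validate() with its `end` out-parameter could presumably take the place of the hand-written check, by copying everything up to `end`, escaping one byte, and repeating.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length (1 to 4) of the well-formed UTF-8
 * sequence starting at s, or 0 if the bytes at s do not form one.
 * n is the number of bytes available at s. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp, min;
    unsigned char c;

    if (n == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
        return 1;                                   /* plain ASCII */
    if (c >= 0xc2 && c <= 0xdf) { len = 2; cp = c & 0x1f; min = 0x80; }
    else if ((c & 0xf0) == 0xe0) { len = 3; cp = c & 0x0f; min = 0x800; }
    else if (c >= 0xf0 && c <= 0xf4) { len = 4; cp = c & 0x07; min = 0x10000; }
    else
        return 0;                                   /* stray or invalid lead byte */
    if (n < len)
        return 0;                                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    if (cp < min || cp > 0x10ffff)
        return 0;                                   /* overlong or out of range */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;                                   /* UTF-16 surrogate */
    return len;
}

/* Hypothetical function: copies valid UTF-8 sequences unchanged and replaces
 * every byte that is not part of one with "%xx" (lowercase hex).
 * Returns a newly allocated string that the caller must free(), or NULL if
 * allocation fails. */
char *escape_invalid_utf8(const char *name)
{
    const unsigned char *s = (const unsigned char *)name;
    size_t n, i = 0, o = 0;
    char *out;

    assert(name != NULL);
    n = strlen(name);
    out = malloc(3 * n + 1);            /* worst case: every byte becomes %xx */
    if (out == NULL)
        return NULL;
    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len > 0) {
            memcpy(out + o, s + i, len);            /* keep valid UTF-8 as-is */
            o += len;
            i += len;
        } else {
            snprintf(out + o, 4, "%%%02x", (unsigned)s[i]);
            o += 3;
            i += 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

Calling escape_invalid_utf8() on the example bytes above should yield the second byte sequence, with f3 turned into the three ASCII characters "%f3" and c3 b3 left alone. Note that this sketch leaves a literal '%' in the input untouched, so the result cannot be unambiguously reversed; escaping '%' as "%25" as well would address that, at the cost of altering some valid filenames.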
Thanks much for any hint.
--
Daniel Yek