Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- From: Peter Lund <firefly vax64 dk>
- To: Daniel Yek <dyek real com>
- Cc: gtk-app-devel-list gnome org
- Subject: Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII
- Date: Fri, 01 Dec 2006 10:57:49 +0100
On Wed, 2006-11-29 at 17:25 -0800, Daniel Yek wrote:
> To be clear, I want UTF-8 characters remain UTF-8 and only escape non-UTF-8
> byte sequence. Is there a function that does that?
I don't think so. I haven't actually looked, but I don't recall
seeing one.
Luckily, it is easy to write one yourself.
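g_utf8_get_char_validate() will tell you whether a byte sequence is a
valid UTF-8 char or not: it returns the decoded character, or
(gunichar)-1 / (gunichar)-2 for an invalid or truncated sequence.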
Sketch code, not compile tested:
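#include <glib.h>

GString *convert_fname(GString *s)
{
    gsize idx = 0;                   /* byte index into s */
    GString *tmp = g_string_new(""); /* collects the converted string */

    while (idx < s->len) {
        gunichar ch;

        ch = g_utf8_get_char_validate(s->str + idx, s->len - idx);
        /* gunichar is unsigned, so "ch >= 0" would always be true;
           compare against the two error values instead. */
        if (ch != (gunichar)-1 && ch != (gunichar)-2) {
            /* valid character: copy it and skip its encoded length */
            g_string_append_unichar(tmp, ch);
            idx += g_unichar_to_utf8(ch, NULL);
        } else {
            /* invalid byte: emit it as %XX and resynchronise one byte on */
            g_string_append_c(tmp, '%');
            g_string_append_printf(tmp, "%02X", (guchar)s->str[idx]);
            idx++;
        }
    }
    return tmp;
}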
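
For anyone who wants to try the idea without GLib at hand, here is a rough plain-C sketch of the same loop. It is not the GLib implementation: utf8_seq_len() and escape_invalid_utf8() are hypothetical helper names made up for this example, the validity rules (no overlong forms, no surrogates, nothing above U+10FFFF) are my reading of what a strict validator checks, and the caller is assumed to supply an output buffer of at least 3 * n + 1 bytes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the valid UTF-8 sequence at p
   (n bytes available), or 0 if the bytes there are not valid UTF-8. */
static size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0) return 0;
    if (p[0] < 0x80) return 1;                      /* plain ASCII */
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; min = 0x10000; }
    else return 0;            /* stray continuation or invalid lead byte */

    if (len > n) return 0;    /* sequence is cut off by the end of input */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;        /* not a continuation */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* reject overlong encodings, surrogates and out-of-range values */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

/* Hypothetical helper: copy valid UTF-8 through unchanged and write every
   other byte as %XX. out must hold at least 3 * n + 1 bytes. Returns out. */
static char *escape_invalid_utf8(const char *in, size_t n, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t i = 0, o = 0;

    while (i < n) {
        size_t len = utf8_seq_len(p + i, n - i);
        if (len > 0) {
            memcpy(out + o, p + i, len);            /* keep the character */
            o += len;
            i += len;
        } else {
            o += (size_t)sprintf(out + o, "%%%02X", (unsigned)p[i]);
            i++;                                    /* retry at next byte */
        }
    }
    out[o] = '\0';
    return out;
}
```

Called as escape_invalid_utf8(name, strlen(name), buf), a lone 0xFF byte between "a" and "b" should come out as "a%FFb", while a correctly encoded "é" is passed through untouched. As in the GLib version, a literal '%' in the input is not escaped, so the result is meant for display rather than for decoding back.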
-Peter