Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII



On Wed, 2006-11-29 at 17:25 -0800, Daniel Yek wrote:


> To be clear, I want UTF-8 characters to remain UTF-8 and only non-UTF-8
> byte sequences to be escaped. Is there a function that does that?


I don't think so. I didn't actually look, but I don't recall
seeing one.

Luckily, it is easy to write one yourself.

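g_utf8_get_char_validate() will tell you whether the bytes at a given
position form a valid UTF-8 character: it returns the decoded character,
(gunichar)-2 for a sequence that is cut off at the end of the buffer, or
(gunichar)-1 for anything else that is invalid.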


Sketch code, not compile tested:


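#include <glib.h>

/* Return a new GString in which valid UTF-8 characters from s are copied
 * unchanged and every byte that does not start a valid UTF-8 character
 * is replaced by '%' followed by its value as two uppercase hex digits. */
GString *convert_fname(GString *s)
{
        gsize    idx; /* byte index into s        */
        GString *tmp; /* collect converted string */

        idx = 0;

        tmp = g_string_new("");

        while (idx < s->len) {
                gunichar ch;

                /* gunichar is unsigned, so "ch >= 0" would always be true;
                 * errors come back as (gunichar)-1 or (gunichar)-2. */
                ch = g_utf8_get_char_validate(s->str + idx,
                                              (gssize) (s->len - idx));
                if (ch != (gunichar) -1 && ch != (gunichar) -2) {
                        /* valid: copy the character and skip its bytes */
                        g_string_append_unichar(tmp, ch);
                        idx += g_unichar_to_utf8(ch, NULL);
                } else {
                        /* invalid: escape this one byte, resync at the next */
                        g_string_append_c(tmp, '%');
                        g_string_append_printf(tmp, "%02X",
                                               (guchar) s->str[idx]);
                        idx++;
                }
        }

        return tmp;
}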



-Peter


