Re: How to keep UTF-8 characters, but escape non-UTF-8 byte sequence to hex codes in ASCII



At 01:57 AM 12/1/2006, Peter Lund wrote:
On Wed, 2006-11-29 at 17:25 -0800, Daniel Yek wrote:

To be clear, I want UTF-8 characters to remain UTF-8 and only escape
non-UTF-8 byte sequences. Is there a function that does that?

I don't think so.  I didn't actually have a look but I don't recall seeing one.

Luckily, it is easy to write one yourself.

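g_utf8_get_char_validate() will tell you whether a byte sequence is a valid UTF-8 char or not: it returns the character on success, and (gunichar)-1 or (gunichar)-2 for an invalid or truncated sequence.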


Sketch code, not compile tested:

Thanks so much for the sample code. That is yet another way to accomplish it.

In my second email, I mentioned that I had implemented it quite simply using g_utf8_validate(). I likewise used GString.

Thanks.

--
Daniel Yek

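GString *convert_fname(GString *s)
{
        gsize    idx; /* byte index into s        */
        GString *tmp; /* collect converted string */

        idx = 0;

        tmp = g_string_new("");

        while (idx < s->len) {
                gunichar ch;

                ch = g_utf8_get_char_validate(s->str + idx, s->len - idx);
                /* gunichar is unsigned, so "ch >= 0" would always be true.
                 * Failure is reported as (gunichar)-1 (invalid) or
                 * (gunichar)-2 (truncated sequence). */
                if (ch != (gunichar)-1 && ch != (gunichar)-2) {
                        /* valid character: copy it through unchanged */
                        g_string_append_unichar(tmp, ch);
                        idx += g_unichar_to_utf8(ch, NULL);
                } else {
                        /* invalid byte: emit it as %XX, then retry at the next byte */
                        g_string_append_c(tmp, '%');
                        g_string_append_printf(tmp, "%02X", (guint)(guchar)s->str[idx]);
                        idx++;
                }
        }

        return tmp;
}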


-Peter
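
For anyone who wants to try the escaping rule from the loop above without linking GLib, here is a minimal plain C sketch of the same idea. It is not code from this thread: utf8_seq_len() and escape_invalid_utf8() are hypothetical names, and the hand-written validator only stands in for what g_utf8_get_char_validate() or g_utf8_validate() would do in GLib code. It assumes that overlong encodings, surrogates and code points above U+10FFFF should be treated as invalid, and it makes no attempt to escape a literal '%' in the input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the valid UTF-8 sequence that
 * starts at p (n bytes available), or 0 if the bytes are invalid or cut off. */
static size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    size_t len, i;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                                   /* plain ASCII */
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else
        return 0;                                   /* stray continuation or 0xF8..0xFF */

    if (n < len)
        return 0;                                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* reject overlong forms, UTF-16 surrogates and out-of-range values */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000))
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return len;
}

/* Hypothetical helper: copy valid UTF-8 through unchanged and replace every
 * other byte with %XX. Returns a malloc'd, NUL-terminated string (or NULL
 * if allocation fails); the caller frees it. */
char *escape_invalid_utf8(const char *in, size_t in_len)
{
    const unsigned char *p = (const unsigned char *)in;
    char *out;
    size_t i = 0, o = 0;

    assert(in != NULL);
    out = malloc(in_len * 3 + 1);                   /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    while (i < in_len) {
        size_t len = utf8_seq_len(p + i, in_len - i);
        if (len > 0) {
            memcpy(out + o, p + i, len);            /* valid character */
            o += len;
            i += len;
        } else {
            sprintf(out + o, "%%%02X", (unsigned)p[i]); /* escape one byte */
            o += 3;
            i++;
        }
    }
    out[o] = '\0';
    return out;
}
```

As in the GLib version, an invalid byte is escaped on its own and scanning resumes at the very next byte, so a valid character that follows a stray byte is still recognised.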





