Unicode question...
- From: Derek Simkowiak <dereks kd-dev com>
- To: gtk-devel-list gnome org
- Subject: Unicode question...
- Date: Thu, 6 Jul 2000 13:20:03 -0700 (PDT)
This is really more of a Unicode question than a Gtk question, but
I want to understand the answer in the context of Owen's new gunicode.h,
so here goes:
How do C's escape characters relate to Unicode? I.e., the
string
"Hello World.\n"
has 13 ASCII characters, the last of which is \n. What does
that look like as a wide character? What does \t look like? Does it
matter?
Basically, I need to split UTF-8 string input on the newline
character. So would I do something like this:
    while ( *utf8_input_string != '\0' )  /* stop at the terminating NUL */
    {
        if ( *utf8_input_string == '\n' )
            total_lines_detected++;
        utf8_input_string = g_utf8_next_char( utf8_input_string );
    }
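For comparison, a minimal sketch that skips g_utf8_next_char() and scans raw bytes. It relies on a property of UTF-8: every byte of a multi-byte sequence has its high bit set, so the byte 0x0A can only ever be U+000A. The helper name count_newlines_utf8 is hypothetical (it is not something from gunicode.h), and an ASCII-compatible execution character set is assumed.

```c
#include <stddef.h>

/* Count line feeds (U+000A) in a NUL-terminated UTF-8 string.
 * Hypothetical helper for illustration; not part of gunicode.h.
 * A plain byte scan is enough because bytes below 0x80 never
 * occur inside a multi-byte UTF-8 sequence. */
size_t count_newlines_utf8(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (*s == '\n')
            n++;
    }
    return n;
}
```

Called as count_newlines_utf8(utf8_input_string), this should give the same count as walking the string one character at a time, since the skipped continuation bytes can never equal '\n'.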
Or would I need to do this:
    gint char_count;
    gunichar *ucs4_input_string;
    gunichar wide_newline;

    char_count = g_utf8_strlen(utf8_input_string);
    ucs4_input_string = g_utf8_to_ucs4(utf8_input_string, char_count);
    /* g_utf8_to_ucs4() returns a pointer, so take the first character */
    wide_newline = *g_utf8_to_ucs4("\n", 1);
    while ( *ucs4_input_string != 0 )  /* stop at the terminating 0 */
    {
        if ( *ucs4_input_string == wide_newline )
            total_lines_detected++;
        ucs4_input_string++;
    }
I'm assuming that C converts '\n' into an 8-bit ASCII value, so
things like
    if ( *ucs4_input_string == '\n' )
        total_lines_detected++;
will not work. Or is there some kind of hidden typecasting that
will let the one-byte \n compare directly to a 4-byte UCS-4 character?
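On that last question, a small sketch of how such a comparison behaves in plain C. A character constant such as '\n' has type int (value 10 on ASCII systems), and the usual arithmetic conversions bring both operands of == to a common type, so the values are compared rather than the storage widths. Here uint32_t stands in for gunichar, which is an assumption made to keep the sketch free of GLib, and count_newlines_ucs4 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Count line feeds in a 0-terminated array of 32-bit code points.
 * uint32_t is a stand-in for gunichar (assumption for this sketch). */
size_t count_newlines_ucs4(const uint32_t *s)
{
    size_t n = 0;

    for (; *s != 0; s++) {
        /* '\n' is an int with value 10; it is converted to the
         * operand's unsigned type before the comparison. */
        if (*s == '\n')
            n++;
    }
    return n;
}
```

Since U+000A is the same code point as ASCII line feed, no separate "wide newline" variable appears to be needed for this test.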
Any help is greatly appreciated...
Thanks,
Derek Simkowiak
dereks@kd-dev.com
P.S.> It would be helpful if, in gunicode.h, every instance of "gint len"
were replaced with one of these:
gint char_count [...or...]
gint byte_count