Unicode question...
- From: Derek Simkowiak <dereks kd-dev com>
- To: gtk-devel-list gnome org
- Subject: Unicode question...
- Date: Thu, 6 Jul 2000 13:20:03 -0700 (PDT)
	This is really more of a Unicode question than a Gtk question, but
I want to understand the answer in the context of Owen's new gunicode.h,
so here goes:
	How do C's escape characters relate to Unicode?  I.e., the
string
"Hello World.\n"
	Has 13 ASCII characters, the last one of which is \n.  What does
that look like as a wide character?  What does \t look like?  Does it
matter?
	Basically, I need to split UTF-8 string input on the newline
character (\n).  So would I do something like this:
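  /* Walk the string until the terminating NUL byte, not until the
     pointer itself becomes NULL (which it never does). */
  while ( *utf8_input_string != '\0' )
    {
      if ( *utf8_input_string == '\n' )
          total_lines_detected++;
      utf8_input_string = g_utf8_next_char( utf8_input_string );
    }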
	Or would I need to do this:
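  gint char_count;
  gunichar *ucs4_input_string;
  gunichar wide_newline;

  char_count = g_utf8_strlen(utf8_input_string);
  ucs4_input_string = g_utf8_to_ucs4(utf8_input_string, char_count);
  /* g_utf8_to_ucs4() returns a pointer, so take the first character. */
  wide_newline = *g_utf8_to_ucs4("\n", 1);
  /* Stop at the terminating 0 character (assuming the converted
     string is 0-terminated), not when the pointer becomes NULL. */
  while ( *ucs4_input_string != 0 )
    {
      if ( *ucs4_input_string == wide_newline )
          total_lines_detected++;
      ucs4_input_string++;
    }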
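	For what it's worth, the first loop may not need to step character
by character at all.  UTF-8 encodes every code point above U+007F using
only bytes with the high bit set, so a byte equal to 0x0A can only be a
newline.  A minimal sketch in plain C, assuming NUL-terminated input;
count_newlines_utf8 is a hypothetical name, not part of gunicode.h:

```c
#include <stddef.h>

/* Count newline characters in a NUL-terminated UTF-8 string.
 * Hypothetical helper for illustration; not a GLib function.
 * Scanning byte by byte is safe because no byte of a multi-byte
 * UTF-8 sequence can have the value 0x0A. */
size_t count_newlines_utf8(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (*s == '\n')
            count++;
    }
    return count;
}
```

	Calling count_newlines_utf8("Hello World.\n") on the example string
above would count one newline.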
	I'm assuming that C converts '\n' into an 8-bit ASCII value, so
things like
      if ( *ucs4_input_string == '\n' )
          total_lines_detected++;
	will not work.  Or is there some kind of hidden typecasting that
will let the one-byte \n compare directly to a 4-byte ucs4 character?
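	On the hidden-typecasting question: in C a character constant such
as '\n' has type int, and the usual arithmetic conversions widen it
before the comparison, so it can be compared directly with a 32-bit
value.  A minimal sketch, assuming uint32_t as a stand-in for gunichar
and an ASCII-based execution character set (where '\n' is 10, the same
value as U+000A); count_newlines_ucs4 is a hypothetical helper, not a
GLib function:

```c
#include <stddef.h>
#include <stdint.h>

/* Count U+000A in a 0-terminated array of 32-bit code points.
 * uint32_t stands in for gunichar here; that is an assumption. */
size_t count_newlines_ucs4(const uint32_t *s)
{
    size_t count = 0;

    for (; *s != 0; s++) {
        /* '\n' is an int; it is converted to the wider unsigned
         * type before the comparison, so no explicit cast is needed. */
        if (*s == '\n')
            count++;
    }
    return count;
}
```

	This only holds where the compiler's character set agrees with
Unicode for '\n', which is the case on ASCII-based systems.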
	Any help is greatly appreciated...
Thanks,
Derek Simkowiak
dereks@kd-dev.com
P.S.> It would be helpful if, in gunicode.h, every instance of "gint len"
were replaced with one of these:
gint char_count    [...or...]
gint byte_count