Unicode question...

From: Derek Simkowiak <dereks kd-dev com>
To: gtk-devel-list gnome org
Subject: Unicode question...
Date: Thu, 6 Jul 2000 13:20:03 -0700 (PDT)

	This is really more of a Unicode question than a Gtk question, but
I want to understand the answer in the context of Owen's new gunicode.h,
so here goes:


	How do C's escape characters relate to Unicode?  I.e., the
string

"Hello World.\n"

	Has 13 ASCII characters, the last one of which is \n.  What does
that look like as a wide character?  What does \t look like?  Does it
matter?
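
	As a point of reference for the question above: Unicode assigns
the C escapes the same values ASCII does ('\n' is U+000A, '\t' is U+0009),
and UTF-8 encodes every code point below U+0080 as a single byte with that
same value. The sketch below is plain C and not part of gunicode.h;
`utf8_encode` is a hypothetical helper name and `uint32_t` stands in for
gunichar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GLib function): encode one Unicode code
 * point as UTF-8 into out[].  Returns the number of bytes written
 * (1 to 4), or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* ASCII range, including '\n' (0x0A) and '\t' (0x09):
         * one byte, identical to the ASCII value. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

	Every byte of a multi-byte sequence has its high bit set, so a
byte equal to 0x0A in valid UTF-8 can only be a newline.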


	Basically, I need to split UTF-8 string input on the newline
('\n') character.  So would I do something like this:

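  /* Walk the string one UTF-8 character at a time.  Stop at the
     terminating NUL byte; the pointer itself never becomes NULL. */
  while ( *utf8_input_string != '\0' )
    {
      if ( *utf8_input_string == '\n' )
          total_lines_detected++;

      utf8_input_string = g_utf8_next_char( utf8_input_string );
    }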
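
	A GLib-free sketch of the same idea, assuming the input is valid,
NUL-terminated UTF-8: since no byte inside a multi-byte sequence can equal
0x0A, the scan can go byte by byte without decoding characters at all.
The function name `count_newlines_utf8` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: count '\n' bytes in a NUL-terminated UTF-8
 * string.  Lead and continuation bytes of multi-byte sequences are all
 * >= 0x80, so they can never be mistaken for '\n'. */
size_t count_newlines_utf8(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (*s == '\n')
            count++;
    }
    return count;
}
```

	For "Hello World.\n" this counts one newline; the non-ASCII bytes
in a string such as "caf\xC3\xA9\n" are simply skipped over.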


	Or would I need to do this:

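  gint i;
  gint char_count;
  gunichar *ucs4_input_string;
  gunichar *ucs4_newline;
  gunichar wide_newline;
  
  char_count = g_utf8_strlen(utf8_input_string);
  ucs4_input_string = g_utf8_to_ucs4(utf8_input_string, char_count);

  /* g_utf8_to_ucs4() returns a gunichar *, so take its first element. */
  ucs4_newline = g_utf8_to_ucs4("\n", 1);
  wide_newline = ucs4_newline[0];

  /* Iterate by character count; the pointer itself never becomes NULL. */
  for ( i = 0; i < char_count; i++ )
    {
      if ( ucs4_input_string[i] == wide_newline )
          total_lines_detected++;
    }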


	I'm assuming that C converts '\n' into an 8-bit ASCII value, so
things like 

      if ( *ucs4_input_string == '\n' )
          total_lines_detected++;

	will not work.  Or is there some kind of hidden typecasting that
will let the one-byte \n compare directly to a 4-byte ucs4 character?
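
	On the "hidden typecasting" point: in C a character constant such
as '\n' has type int, and the usual arithmetic conversions bring both
operands of == to a common type before comparing. A sketch under two
assumptions: `uint32_t` stands in for gunichar, and the execution character
set is ASCII-compatible, so '\n' has the value 10. `count_newlines_ucs4`
is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count newlines in a 0-terminated array of
 * 32-bit code points.  '\n' is an int with value 10 here; it is
 * converted to the unsigned 32-bit type before the comparison, so it
 * matches U+000A directly. */
size_t count_newlines_ucs4(const uint32_t *s)
{
    size_t count = 0;

    for (; *s != 0; s++) {
        if (*s == '\n')
            count++;
    }
    return count;
}
```

	This is only safe for ASCII constants. Where plain char is signed,
a constant such as '\xE9' is negative and would not compare equal to
U+00E9.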

	Any help is greatly appreciated...

Thanks,
Derek Simkowiak
dereks@kd-dev.com

P.S.> It would be helpful if, in gunicode.h, every instance of "gint len"
were replaced with one of these:

gint char_count    [...or...]
gint byte_count

Follow-Ups:
- Re: Unicode question...
  - From: Michael Meeks
- Re: Unicode question...
  - From: Robert Brady
