Re: Unicode and C++

From: Derek Simkowiak <dereks kd-dev com>
To: Owen Taylor <otaylor redhat com>
Cc: gtk-i18n-list redhat com, libstdc++ sourceware cygnus com, gtk-i18n-list gnome org, Nathan Myers <ncm cantrip org>
Subject: Re: Unicode and C++
Date: Wed, 5 Jul 2000 10:02:02 -0700 (PDT)

->    while (*p)
->      {
->        if (g_utf8_get_char (p) == wc)
->          n++;
->        p = g_utf8_next_char (p);
->      }

	Where are these g_utf8_*() functions defined, and will they be a
standard part of Gtk+ 2.0?  (And is Gtk2 the official new name for Gtk+
2.x?)
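
	For illustration only, here is a minimal standalone sketch of what
the quoted loop does: walk a UTF-8 string one character at a time and count
how often a given code point occurs.  The helper names (utf8_seq_len,
utf8_decode, utf8_count_char) are hypothetical and are not the g_utf8_*()
functions themselves; the sketch assumes valid, NUL-terminated UTF-8 and
does no error checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence whose
 * lead byte is b.  Assumes b really is a valid lead byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;              /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;    /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;    /* 1110xxxx */
    return 4;                            /* 11110xxx */
}

/* Hypothetical helper: decode the code point that starts at p. */
static uint32_t utf8_decode(const char *p)
{
    const unsigned char *s = (const unsigned char *)p;
    size_t len = utf8_seq_len(s[0]);
    uint32_t c;
    size_t i;

    if (len == 1)
        return s[0];
    c = s[0] & (0xFFu >> (len + 1));     /* payload bits of the lead byte */
    for (i = 1; i < len; i++)
        c = (c << 6) | (s[i] & 0x3Fu);   /* 6 payload bits per continuation byte */
    return c;
}

/* Count how many times code point wc occurs in the UTF-8 string p. */
size_t utf8_count_char(const char *p, uint32_t wc)
{
    size_t n = 0;

    assert(p != NULL);
    while (*p) {
        if (utf8_decode(p) == wc)
            n++;
        p += utf8_seq_len((unsigned char)*p);   /* step to the next character */
    }
    return n;
}
```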

-> It's conceivably possible that in the few cases where conversion
-> could _possibly_ be an overhead, like the Text widget, we might
-> want to have dual interfaces.

	The text widget I'm working on uses a gapped text buffer to store
the text internally.  The gapped buffer can hold 8-bit chars (ASCII/ANSI),
16-bit chars (UCS-2), and 32-bit chars (UCS-4).  The API will let the
programmer choose how to store the characters internally in the buffer,
so that Pango will be able to render non-western fonts.
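
	As a rough sketch of the idea (not the actual widget code), a gapped
buffer whose element width is chosen at creation time might look like the
following.  Everything here is an assumption made for illustration: the
GapBuf type, the gapbuf_* names, and the fixed capacity (a real buffer would
grow when the gap fills up).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer; all positions and sizes count characters. */
typedef struct {
    unsigned char *data;   /* cap * width bytes of storage */
    size_t width;          /* bytes per character: 1, 2 or 4 */
    size_t cap;            /* capacity in characters */
    size_t gap_start;      /* first slot of the gap */
    size_t gap_end;        /* one past the last slot of the gap */
} GapBuf;

int gapbuf_init(GapBuf *b, size_t width, size_t cap)
{
    assert(width == 1 || width == 2 || width == 4);
    b->data = malloc(width * cap);
    if (b->data == NULL)
        return -1;
    b->width = width;
    b->cap = cap;
    b->gap_start = 0;
    b->gap_end = cap;      /* the whole buffer is gap at first */
    return 0;
}

size_t gapbuf_length(const GapBuf *b)
{
    return b->cap - (b->gap_end - b->gap_start);
}

/* Slide the gap so that it starts at character index pos. */
static void gapbuf_move_gap(GapBuf *b, size_t pos)
{
    size_t w = b->width;

    if (pos < b->gap_start) {            /* shift text right, across the gap */
        size_t n = b->gap_start - pos;
        memmove(b->data + (b->gap_end - n) * w, b->data + pos * w, n * w);
        b->gap_start -= n;
        b->gap_end -= n;
    } else if (pos > b->gap_start) {     /* shift text left, across the gap */
        size_t n = pos - b->gap_start;
        memmove(b->data + b->gap_start * w, b->data + b->gap_end * w, n * w);
        b->gap_start += n;
        b->gap_end += n;
    }
}

/* Insert n characters (each b->width bytes wide) at character index pos. */
int gapbuf_insert(GapBuf *b, size_t pos, const void *chars, size_t n)
{
    if (pos > gapbuf_length(b) || n > b->gap_end - b->gap_start)
        return -1;                       /* bad position or gap too small */
    gapbuf_move_gap(b, pos);
    memcpy(b->data + b->gap_start * b->width, chars, n * b->width);
    b->gap_start += n;
    return 0;
}

/* Delete n characters starting at character index pos. */
int gapbuf_delete(GapBuf *b, size_t pos, size_t n)
{
    size_t len = gapbuf_length(b);

    if (pos > len || n > len - pos)
        return -1;
    gapbuf_move_gap(b, pos);
    b->gap_end += n;                     /* the gap swallows the deleted text */
    return 0;
}

/* Read the character at index i, widened to 32 bits. */
uint32_t gapbuf_get(const GapBuf *b, size_t i)
{
    uint8_t c8;
    uint16_t c16;
    uint32_t c32;
    const unsigned char *p;

    assert(i < gapbuf_length(b));
    if (i >= b->gap_start)
        i += b->gap_end - b->gap_start;  /* skip over the gap */
    p = b->data + i * b->width;
    switch (b->width) {
    case 1:  memcpy(&c8, p, 1);  return c8;
    case 2:  memcpy(&c16, p, 2); return c16;
    default: memcpy(&c32, p, 4); return c32;
    }
}
```

	Because positions are counted in characters and every character has
the same width, indexing stays O(1) whichever of the three widths is chosen.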

	However, the public API for inserting text will only take UTF-8
encoded strings.  One interface for text insertion.  Deletions are based
on the number of *characters*, not the number of bytes, so with UTF-8 you
might insert 7 bytes but delete only 2 characters to undo the
insertion.
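
	To make the byte/character distinction concrete, here is a small
sketch, assuming valid NUL-terminated UTF-8.  The function names
utf8_char_count and utf8_byte_offset are hypothetical.  In the example after
the code, a 3-byte character followed by a 4-byte character gives 7 bytes
but only 2 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters: every byte that is not a continuation byte
 * (10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte offset of the character with index `chars`, or the byte length
 * of the string if it has fewer characters than that. */
size_t utf8_byte_offset(const char *s, size_t chars)
{
    const char *p = s;

    assert(s != NULL);
    while (*p && chars > 0) {
        p++;                                      /* skip the lead byte */
        while (((unsigned char)*p & 0xC0) == 0x80)
            p++;                                  /* skip continuation bytes */
        chars--;
    }
    return (size_t)(p - s);
}
```

	For the string "\xE4\xB8\xAD\xF0\x9F\x98\x80" (U+4E2D followed by
U+1F600), utf8_char_count() gives 2 while the string is 7 bytes long, and
utf8_byte_offset(s, 1) gives 3, the byte at which the second character
starts.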

-> this point I could see some point in making Pango use UCS-4
-> internally, and providing dual entry points. (I'd like to see Pango
-> used outside of GTK+, and providing UCS-4 interfaces along with the
-> UTF-8 ones interfaces might help in this. Not that converting between
-> UCS-4 and UTF-16 with surrogates is significantly nicer than
-> converting between UTF-8 and UTF-16 with surrogates. In either case,
-> converting an index is a O(n) operation.)

	Having a UCS-4 (and UCS-2!) interface would make my text widget
render faster, because I wouldn't have to convert from UCS-4 to UTF-8 only
so that Pango can internally convert it back again.  Since Pango's job is
rendering Unicode (at least, that's how I see it), having UCS-2/4
interfaces would seem like a good idea to me.
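
	The per-character conversion that a UTF-8-only interface would force
on a UCS-4 buffer looks roughly like the sketch below.  The function name
ucs4_to_utf8 is hypothetical, and the sketch assumes the present-day Unicode
range (up to U+10FFFF, at most 4 bytes per character, surrogate values
rejected) rather than the longer 5- and 6-byte forms of the original UTF-8
definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-4 code point as UTF-8 into out[].  Returns the number
 * of bytes written (1 to 4), or 0 if c is not an encodable code point. */
size_t ucs4_to_utf8(uint32_t c, unsigned char out[4])
{
    assert(out != NULL);
    if (c < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {                    /* 3 bytes */
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                             /* out of range */
}
```

	Running this over every character before each render, only for the
renderer to decode the bytes again, is the round trip that a direct UCS-4
entry point would avoid.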

-> I very much doubt I'll have a chance to do this for Pango-1.0. 

	I do have a question about UCS-4 encoding: I thought that the
32-bit encoding was only used with special-purpose, "non-standard" fonts
and characters.  I also thought that UCS-2 could encode any (registered)
natural script.  (Otherwise, how could Java get away with a 16-bit char?)

	So, what benefit is there to giving Pango a 32-bit interface?
Wouldn't any characters not in UCS-2 need custom fonts and/or glyphs for
rendering, thus making Pango unusable as the layout/rendering engine?


Thanks,
Derek Simkowiak
dereks@kd-dev.com
