Re: Unicode and C++
- From: Derek Simkowiak <dereks kd-dev com>
- To: Owen Taylor <otaylor redhat com>
- Cc: gtk-i18n-list redhat com, libstdc++ sourceware cygnus com, gtk-i18n-list gnome org, Nathan Myers <ncm cantrip org>
- Subject: Re: Unicode and C++
- Date: Wed, 5 Jul 2000 10:02:02 -0700 (PDT)
-> while (*p)
-> {
-> if (g_utf8_get_char (p) == wc)
-> n++;
-> p = g_utf8_next_char (p);
-> }
Where are these g_utf8_*() functions defined, and will they be a
standard part of Gtk+ 2.0? (And is Gtk2 the official new name for Gtk+
2.x?)
-> It's conceivably possible that in the few cases where conversion
-> could _possibly_ be an overhead, like the Text widget, we might
-> want to have dual interfaces.
The text widget I'm working on uses a gapped text buffer to store
the text internally. The gapped buffer can hold 8-bit chars (ASCII/ANSI),
16-bit chars (UCS-2), and 32-bit chars (UCS-4). The API will let the
programmer choose how to store the characters internally in the buffer,
so that Pango will be able to render non-western fonts.
However, the public API for inserting text will only take UTF-8
encoded strings. One interface for text insertion. Deletions are based
on the number of *characters*, not the number of bytes, so using UTF-8 you
might insert 7 bytes but delete only 2 characters to "undelete" the
insertion.
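
To make the bytes-versus-characters distinction concrete, here is a rough sketch in plain C. It is not code from my widget or from GLib; the names utf8_char_count() and utf8_offset_of_char() are made up for illustration, and both functions assume the input is already valid UTF-8.

```c
#include <stddef.h>

/* Count the characters in a NUL-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Hypothetical helper; assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Return the number of bytes occupied by the first `chars` characters,
 * i.e. the byte offset at which character number `chars` begins.
 * Stops early at the terminating NUL. Hypothetical helper. */
size_t utf8_offset_of_char(const char *s, size_t chars)
{
    const char *p = s;
    while (chars > 0 && *p) {
        p++;                                   /* step over the lead byte */
        while (((unsigned char)*p & 0xC0) == 0x80)
            p++;                               /* skip continuation bytes */
        chars--;
    }
    return (size_t)(p - s);
}
```

For example, the string "\xE2\x82\xAC\xF0\x9D\x84\x9E" (U+20AC followed by U+1D11E) is 7 bytes long but only 2 characters, so a deletion of 2 characters has to map to 7 bytes. Note that finding the byte offset of a character is a linear scan.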
-> this point I could see some point in making Pango use UCS-4
-> internally, and providing dual entry points. (I'd like to see Pango
-> used outside of GTK+, and providing UCS-4 interfaces along with the
-> UTF-8 ones interfaces might help in this. Not that converting between
-> UCS-4 and UTF-16 with surrogates is significantly nicer than
-> converting between UTF-8 and UTF-16 with surrogates. In either case,
-> converting an index is a O(n) operation.)
Having a UCS-4 (and UCS-2!) interface would make my text widget
render faster, because I wouldn't have to convert from UCS-4 to UTF-8,
only so Pango can internally convert it back again. Since Pango's job is
rendering Unicode (at least, that's how I see it), having UCS-2/4
interfaces would seem like a good idea to me.
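
As a rough idea of the work involved in the "convert it back" step, here is a minimal UTF-8 to UCS-4 decoder in plain C. This is only a sketch and not how Pango does it; utf8_to_ucs4() is a made-up name, and the code assumes well-formed UTF-8 of at most 4 bytes per character and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a NUL-terminated UTF-8 string into UCS-4 code points.
 * Writes at most `cap` code points to `out` and returns the number
 * written. Hypothetical helper; assumes well-formed input. */
size_t utf8_to_ucs4(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p && n < cap) {
        uint32_t c;
        int extra;   /* number of continuation bytes expected */

        if (*p < 0x80)      { c = *p;        extra = 0; }  /* 0xxxxxxx */
        else if (*p < 0xE0) { c = *p & 0x1F; extra = 1; }  /* 110xxxxx */
        else if (*p < 0xF0) { c = *p & 0x0F; extra = 2; }  /* 1110xxxx */
        else                { c = *p & 0x07; extra = 3; }  /* 11110xxx */
        p++;

        /* Each continuation byte contributes its low 6 bits. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            c = (c << 6) | (*p++ & 0x3F);

        out[n++] = c;
    }
    return n;
}
```

Every byte of the input gets touched once per call, so a buffer that already holds UCS-4 pays for an encode and then this decode on every render if only a UTF-8 entry point exists.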
-> I very much doubt I'll have a chance to do this for Pango-1.0.
I do have a question about UCS-4 encoding: I thought that the
32-bit encoding was only used with special-purpose, "non-standard" fonts
and characters. I also thought that UCS-2 could encode any (registered)
natural script. (Otherwise, how could Java get away with a 16-bit char?)
So, what benefit is there to giving Pango a 32-bit interface?
Wouldn't any characters not in UCS-2 need custom fonts and/or glyphs for
rendering, thus making Pango unusable as the layout/rendering engine?
Thanks,
Derek Simkowiak
dereks@kd-dev.com