On Tue, Mar 23, 2004 at 08:57:21PM +0100, Murray Cumming wrote:
> On Tue, 2004-03-23 at 18:49, Paul Elliott wrote:
> > On Tue, Mar 23, 2004 at 10:18:06AM +0100, Murray Cumming wrote:
> > > On Tue, 2004-03-23 at 03:37, Paul Elliott wrote:
> > > > I am told that there are some systems where sizeof(wchar_t) == 2.
> > > > On these systems the C locale can only support UCS-2 not UCS-4!
> > > >
> > > > Has Gtkmm or Glibmm been ported to any of these systems?
> > >
> > > I have no idea. I think linux and GTK+ usually use UTF8 rather than UCS2
> > > or UCS4, and there is a working Windows port.
> >
> > My understanding is that the buffers in Glib::ustring and Gtk::TextBuffer
> > contain UTF8, but the iterators iterate over gunichar which are UCS-4.
>
> The TextBuffer iterators iterate over characters.

Yes, and those characters are UCS-4 even though the characters in the
buffer are UTF-8.

> Please point me to
> exactly the API reference (or a code example) for what you mean.
> Iterating over bytes would be almost useless.

Proof: ustring_Iterator is defined in ustring.h. When we look there we
see that operator*() returns gunichar:

>  typedef gunichar value_type;
>  inline value_type operator*() const;

The implementation of operator*() is further down in the file:

>template <class T> inline
>typename ustring_Iterator<T>::value_type ustring_Iterator<T>::operator*() const
>{
>  return Glib::get_unichar_from_std_iterator(pos_);
>}

If we look at the documentation for get_unichar_from_std_iterator in:

http://www.gtkmm.org/gtkmm2/docs/reference/html/namespaceGlib.html

we find that it returns a UCS-4 character:

>gunichar get_unichar_from_std_iterator(std::string::const_iterator pos)
>Extract a UCS-4 character from UTF-8 data.
>
>Convert a single UTF-8 (multibyte) character starting at pos to a
>UCS-4 wide character. This may read up to 6 bytes after the start
>position, depending on the UTF-8 character width. You have to make
>sure the source contains at least one valid UTF-8 character.
>
>This is mainly used by the implementation of Glib::ustring::iterator,
>but it might be useful as utility function if you prefer using
>std::string even for UTF-8 encoding.

Thus even though the buffer contains variable-width UTF-8 "characters",
the iterator returns fixed-size 32-bit UCS-4 characters.

My point is that these are too big to fit in a wchar_t on those systems
where sizeof(wchar_t) == 2.

My original question is:
"Does gtkmm, glibmm run on any systems where sizeof(wchar_t) == 2?"

(I want my code to use another library (boost::regex) that needs
these characters to be in wchar_t.)

--
Paul Elliott                       1(512)837-1096
pelliott io com                    PMB 181, 11900 Metric Blvd Suite J
http://www.io.com/~pelliott/pme/   Austin TX 78758-3117
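
To make the size problem concrete, here is a minimal sketch of the
conversion I have in mind. It does not use glibmm at all: utf8_to_ucs4
and ucs4_to_wstring are hypothetical helper names of my own, and the
decoder assumes well-formed UTF-8 (the same precondition the
get_unichar_from_std_iterator documentation states). Where
sizeof(wchar_t) == 2, a code point above U+FFFF cannot be stored in one
wchar_t, so the sketch falls back to a UTF-16 surrogate pair. Whether a
library that consumes wchar_t treats such a pair as one character is a
separate question that this sketch does not answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 into UCS-4 code points.
// No validation is done; malformed input gives unspecified code points.
std::vector<std::uint32_t> utf8_to_ucs4(const std::string& utf8)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;          // bytes in this sequence
        std::uint32_t cp = lead;      // plain ASCII by default
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: store code points in a std::wstring.
// With a 4-byte wchar_t each code point fits directly; with a 2-byte
// wchar_t anything above U+FFFF becomes a UTF-16 surrogate pair.
std::wstring ucs4_to_wstring(const std::vector<std::uint32_t>& code_points)
{
    std::wstring out;
    for (std::uint32_t cp : code_points) {
        assert(cp <= 0x10FFFF);       // outside the Unicode range otherwise
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Calling ucs4_to_wstring(utf8_to_ucs4(text)) on a string containing
U+1D11E yields one wchar_t where wchar_t is 32 bits wide and two where
it is 16 bits wide, which is exactly the mismatch I am worried about.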