Re: [gtkmm] Does Gtkmm, Glibmm work on systems where sizeof(wchar_t) < sizeof(gunichar)?



On Tue, Mar 23, 2004 at 08:57:21PM +0100, Murray Cumming wrote:
> On Tue, 2004-03-23 at 18:49, Paul Elliott wrote:
> > On Tue, Mar 23, 2004 at 10:18:06AM +0100, Murray Cumming wrote:
> > > On Tue, 2004-03-23 at 03:37, Paul Elliott wrote:
> > > > I am told that there are some systems where sizeof(wchar_t) == 2.
> > > > On these systems the c locale can only support UCS-2 not UCS-4!
> > > > 
> > > > Has Gtkmm or Glibmm been ported to any of these systems?
> > > 
> > > I have no idea. I think linux and GTK+ usually use UTF8 rather than UCS2
> > > or UCS4, and there is a working Windows port.
> > > 
> > 
> > 
> > My understanding is that the buffers in Glib::ustring and Gtk::TextBuffer
> > contain UTF-8, but the iterators iterate over gunichar values, which are UCS-4.
> 
> The TextBuffer iterators iterate over characters. 

Yes, and those characters are returned as UCS-4 even though the
buffer stores them as UTF-8.

>                                                    Please point me to
> exactly the API reference (or a code example) for what you mean.
> Iterating over bytes would be almost useless.
> 

Proof:

ustring_Iterator is defined in ustring.h. Looking there, we see
that operator*() returns a gunichar:

>  typedef gunichar                          value_type;
>  inline value_type operator*() const;
>
The implementation of operator*() is further down in the file:


>template <class T> inline
>typename ustring_Iterator<T>::value_type ustring_Iterator<T>::operator*() const
>{
>  return Glib::get_unichar_from_std_iterator(pos_);
>}

If we look at the documentation for get_unichar_from_std_iterator in:
http://www.gtkmm.org/gtkmm2/docs/reference/html/namespaceGlib.html
we find that it returns a UCS-4 character:

>gunichar get_unichar_from_std_iterator(std::string::const_iterator pos )
>Extract a UCS-4 character from UTF-8 data. 
>
>
>Convert a single UTF-8 (multibyte) character starting at pos to a
>UCS-4 wide character. This may read up to 6 bytes after the start
>position, depending on the UTF-8 character width. You have to make
>sure the source contains at least one valid UTF-8 character.
>
>
>This is mainly used by the implementation of Glib::ustring::iterator,
>but it might be useful as utility function if you prefer using
>std::string even for UTF-8 encoding.

Thus, even though the buffer contains variable-length UTF-8 "characters",
the iterator returns fixed-size 32-bit UCS-4 characters.
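
To make the variable-length versus fixed-size distinction concrete, here is
a minimal sketch in standard C++ only. decode_utf8_at is a hypothetical
stand-in written for illustration; it is not the glibmm implementation of
get_unichar_from_std_iterator. It assumes valid input and handles only
sequences of 1 to 4 bytes (the documentation quoted above mentions up to 6).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for illustration only; not glibmm code.
// Decodes the UTF-8 sequence starting at pos into one 32-bit code point.
// Assumes pos points at the first byte of a valid 1..4 byte sequence.
std::uint32_t decode_utf8_at(std::string::const_iterator pos)
{
  const unsigned char lead = static_cast<unsigned char>(*pos);
  if (lead < 0x80)
    return lead;  // single byte (ASCII)

  // The lead byte tells us how many continuation bytes follow.
  int extra = 0;
  std::uint32_t result = 0;
  if ((lead & 0xE0) == 0xC0)      { extra = 1; result = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { extra = 2; result = lead & 0x0F; }
  else                            { extra = 3; result = lead & 0x07; }

  for (int i = 0; i < extra; ++i) {
    ++pos;
    const unsigned char byte = static_cast<unsigned char>(*pos);
    assert((byte & 0xC0) == 0x80);  // continuation bytes look like 10xxxxxx
    result = (result << 6) | (byte & 0x3F);
  }
  return result;
}
```

Whether the input character occupies one byte ("A"), two bytes (U+00E9) or
four bytes (U+1D11E), the caller always gets back a single 32-bit value.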

My point is that these values are 32 bits wide, so they do not all fit
in a wchar_t on those systems where sizeof(wchar_t) == 2.
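
The concern can be sketched as a range check, again in standard C++ only.
The names fits_in_wchar and to_wstring_checked are hypothetical helpers
invented for this illustration; they are not part of glibmm or Boost, and
std::uint32_t is used here in place of gunichar.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: true when code point c can be stored in a single
// wchar_t on this platform without truncation.
bool fits_in_wchar(std::uint32_t c)
{
  return c <= static_cast<std::uintmax_t>(std::numeric_limits<wchar_t>::max());
}

// Hypothetical helper: copies code points into out, one wchar_t each.
// Returns false at the first code point that does not fit.
bool to_wstring_checked(const std::vector<std::uint32_t>& in, std::wstring& out)
{
  out.clear();
  for (std::uint32_t c : in) {
    if (!fits_in_wchar(c))
      return false;
    out.push_back(static_cast<wchar_t>(c));
  }
  return true;
}
```

Where wchar_t is 32 bits wide, every code point passes the check; where it
is 16 bits wide, anything above U+FFFF (for example U+1D11E) is rejected
instead of being silently truncated.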

My original question is: "Do gtkmm and glibmm run on any systems
where sizeof(wchar_t) == 2?" (I want my code to use another
library (boost::regex) that needs these characters to be in wchar_t.)

-- 
Paul Elliott                       1(512)837-1096
pelliott io com                    PMB 181, 11900 Metric Blvd Suite J
http://www.io.com/~pelliott/pme/   Austin TX 78758-3117



