Gtkmm-forge Digest, Vol 36, Issue 16

From: gtkmm-forge-request lists sourceforge net
To: gtkmm-forge lists sourceforge net
Subject: Gtkmm-forge Digest, Vol 36, Issue 16
Date: Fri, 29 May 2009 00:46:57 +0000
Send Gtkmm-forge mailing list submissions to
	gtkmm-forge lists sourceforge net

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.sourceforge.net/lists/listinfo/gtkmm-forge
or, via email, send a message with subject or body 'help' to
	gtkmm-forge-request lists sourceforge net

You can reach the person managing the list at
	gtkmm-forge-owner lists sourceforge net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gtkmm-forge digest..."


gtkmm-forge is the mailing list that receives gtkmm bug reports from bugzilla.  A daily digest is sent to gtkmm-main, to encourage people to help fix the bugs. Do not try to unsubscribe gtkmm-forge from gtkmm-list.


Today's Topics:

   1. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))
   2. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))
   3. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))
   4. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))
   5. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))
   6. [Bug 583992] Provide conversion between wide and	narrow
      unicode strings in glibmm (glibmm (bugzilla.gnome.org))


----------------------------------------------------------------------

Message: 1
Date: Wed, 27 May 2009 21:47:02 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090527214702 8DC6C23EF7F label gnome org>
Content-Type: text/plain; charset=utf-8

If you have any questions why you received this email, please see the text at
the end of this email. Replies to this email are NOT read, please see the text
at the end of this email. You can add comments to this bug at:
  http://bugzilla.gnome.org/show_bug.cgi?id=583992

  glibmm | strings | Ver: 2.20.x




------- Comment #6 from Chris Vine  2009-05-27 21:47 UTC -------
gcc will warn about unreachable code on out of range integer comparisons (such
as comparing an unsigned integer with -1) but I have never seen it complain
about a comparison on a compile-time constant.  However, I suppose you could
assume that, if config.h is absent, the size of wchar_t is 2, but as I say
why guess when everything you need is there with sizeof?  I suggest using
sizeof as a fall-back in cases where config.h is absent or SIZEOF_WCHAR_T is
defined as 0 (on looking through the configure macros it seemed to me that that
was a possible outcome).  In the unlikely event of a compiler choosing to
complain, the build system can be changed to provide config.h.

I had seen what you had done in ustring.cc.  On reflection I agree with what
you say about catering for non-Unicode wide character sets, since it can be
done.  Its use in non-unicode cases however will be next to 0.  That would
mean putting it in convert.h/convert.cc, so the issue on unicode.h dependencies
ceases to be relevant.
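
A minimal sketch of the sizeof fall-back suggested above, assuming
SIZEOF_WCHAR_T is the macro that an autoconf-generated config.h would
provide.  The names wchar_size and wide_unit_description() are hypothetical
and used only for illustration; the strings returned merely describe the
unit size and do not claim that wchar_t actually holds Unicode.

```cpp
#include <cassert>
#include <cstddef>

// SIZEOF_WCHAR_T would normally come from config.h.  If it is missing, or
// was defined as 0, fall back to what the compiler itself reports.
#if defined(SIZEOF_WCHAR_T) && SIZEOF_WCHAR_T > 0
constexpr std::size_t wchar_size = SIZEOF_WCHAR_T;
#else
constexpr std::size_t wchar_size = sizeof(wchar_t);
#endif

// Catch a stale or wrong config.h at compile time.
static_assert(wchar_size == sizeof(wchar_t),
              "config.h disagrees with the compiler about sizeof(wchar_t)");

// Hypothetical helper: the comparisons are on a compile-time constant, so
// the compiler is free to discard the branches that can never be taken.
inline const char* wide_unit_description()
{
  if (wchar_size == 2)
    return "16-bit units";
  if (wchar_size == 4)
    return "32-bit units";
  return "other";
}
```

With this arrangement a build without config.h still compiles, and a build
with a mismatching config.h fails early instead of miscounting bytes.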


-- 
See http://bugzilla.gnome.org/page.cgi?id=email.html for more info about why you received
this email, why you can't respond via email, how to stop receiving
emails (or reduce the number you receive), and how to contact someone
if you are having problems with the system.

You can add comments to this bug at http://bugzilla.gnome.org/show_bug.cgi?id=583992.



------------------------------

Message: 2
Date: Wed, 27 May 2009 21:58:54 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090527215854 5263223F59D label gnome org>
Content-Type: text/plain; charset=utf-8

  glibmm | strings | Ver: 2.20.x




------- Comment #7 from Chris Vine  2009-05-27 21:58 UTC -------
Actually, if g_iconv() is used as a fall-back, any test for the size of wchar_t
for conditional compilation can probably be made unnecessary.  You will end up
dealing with wchar_t, whatever it may be.  I suspect sizeof can just be used as
a multiplier to calculate the number of bytes to convert.
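
A rough sketch of that fall-back, under some assumptions: it uses POSIX
iconv() directly rather than g_iconv() so that it stays self-contained, it
relies on the iconv implementation accepting the "WCHAR_T" encoding alias,
and the function name wide_to_utf8_sketch() is hypothetical.  Note that no
test on the size of wchar_t appears; sizeof is only the multiplier for the
input byte count.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 through iconv.
std::string wide_to_utf8_sketch(const std::wstring& wide)
{
  // Assumption: "WCHAR_T" names the platform's wchar_t encoding.
  iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
  if (cd == reinterpret_cast<iconv_t>(-1))
    throw std::runtime_error("iconv_open failed");

  // iconv works on bytes, so the element count is scaled by sizeof(wchar_t).
  char* in = reinterpret_cast<char*>(const_cast<wchar_t*>(wide.data()));
  std::size_t in_left = wide.size() * sizeof(wchar_t);

  std::string out;
  char buf[256];
  while (in_left > 0)
  {
    char* out_ptr = buf;
    std::size_t out_left = sizeof buf;
    const std::size_t rc = iconv(cd, &in, &in_left, &out_ptr, &out_left);
    const int err = errno;  // save before anything else can change it
    out.append(buf, sizeof buf - out_left);

    // E2BIG only means the output chunk is full; loop again with a fresh one.
    if (rc == static_cast<std::size_t>(-1) && err != E2BIG)
    {
      iconv_close(cd);
      throw std::runtime_error("wide to UTF-8 conversion failed");
    }
  }
  iconv_close(cd);
  return out;
}
```

The conversion can fail (the exception path), which matters whenever wchar_t
is not known to hold Unicode.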




------------------------------

Message: 3
Date: Thu, 28 May 2009 00:33:41 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090528003341 8F77323F59E label gnome org>
Content-Type: text/plain; charset=utf-8

  glibmm | strings | Ver: 2.20.x




------- Comment #8 from Daniel Elstner  2009-05-28 00:33 UTC -------
(In reply to comment #6)
> I had seen what you had done in ustring.cc.  On reflection I agree with what
> you say about catering for non-Unicode wide character sets, since it can be
> done.  Its use in non-unicode cases however will be next to 0.

There is still the option of wchar_t having exactly the same representation as
char, which is allowed by the standard and could be reasonable e.g. on an
embedded system.

> That would mean putting it in convert.h/convert.cc, so the issue on unicode.h
> dependencies ceases to be relevant.

Yes, I was going to suggest that.  Another option would be to add a constructor
to Glib::ustring for the conversion from std::wstring, and a wide() method
which returns an std::wstring.  I'm not sure though, what do you think?

(In reply to comment #7)
> Actually, if g_iconv() is used as a fall-back, any test for the size of
> wchar_t for conditional compilation can probably be made unnecessary.  You
> will end up dealing with wchar_t, whatever it may be.  I suspect sizeof can
> just be used as a multiplier to calculate the number of bytes to convert.

Yes, that's true.  I remember now that I added the sizeof check only as an
additional safety measure, mainly for the WIN32 case. I was a bit worried that
some compilers might actually have an option to change the size of wchar_t. 
Well, the sizeof check isn't really airtight either.  I guess we can live
without it.

Sigh.  Curse you, MSVC++!  The sizeof check of autoconf is really neat in that
it is able to get the value without actually executing any code.  Thus it works
even when cross-compiling for the mingw32 target.  Ah, well, I guess Armin will
appreciate the reduced maintenance burden for his MSVC builds...
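
To illustrate how a size can be established without executing any code, here
is a sketch of the compile-only idea: a declaration that is valid only when a
condition on sizeof holds.  A configure script can try a series of such
conditions and note which ones compile.  The macro name COMPILE_TIME_CHECK
and the typedef names are hypothetical, not taken from autoconf itself.

```cpp
#include <cassert>

// The array size is 1 when cond is true and -1 when it is false.  A negative
// array size is a hard compile error, so no test program ever has to run.
#define COMPILE_TIME_CHECK(name, cond) typedef char name[1 - 2 * !(cond)]

// Each line either compiles or does not; together they bracket the size.
COMPILE_TIME_CHECK(wchar_at_least_one_byte, sizeof(wchar_t) >= 1);
COMPILE_TIME_CHECK(wchar_at_most_eight_bytes, sizeof(wchar_t) <= 8);
```

Because only the compiler is involved, the same probe works when the target
binary cannot be run on the build machine.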




------------------------------

Message: 4
Date: Thu, 28 May 2009 09:05:21 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090528090524 2FBA523F5A7 label gnome org>
Content-Type: text/plain; charset=utf-8

  glibmm | strings | Ver: 2.20.x




------- Comment #9 from Chris Vine  2009-05-28 09:02 UTC -------
Gosh, systems with the same size of char and wchar_t but with different locale
encodings.  That's getting esoteric.

When I said I had seen your format function in ustring.cc I had seen your
stream formatting.  On rereading this I now think you were talking about
something different.  Is this something in git?  The new functions cannot be
called wide_to_narrow() and narrow_to_wide() any more, because they will not be
concerned only with converting between wide and narrow unicode representations.
 I suggest either your approach or new functions in convert.h called
wide_to_utf8() and wide_from_utf8().  I think I prefer the second approach
because I suspect it is where people will look and it is closer to the existing
narrow encoding conversion functions - they comprise locale_to/from_utf8()
rather than being implemented through a conversion constructor taking a
std::string argument and a locale() method returning a std::string object. 
However, to be honest I would go where your instincts take you.  Either seems
fine to me.




------------------------------

Message: 5
Date: Thu, 28 May 2009 22:57:39 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090528225739 D3A2423F59E label gnome org>
Content-Type: text/plain; charset=utf-8

  glibmm | strings | Ver: 2.20.x




------- Comment #10 from Daniel Elstner  2009-05-28 22:57 UTC -------
(In reply to comment #9)
> Gosh, systems with the same size of char and wchar_t but with different locale
> encodings.  That's getting esoteric.

That would indeed be esoteric.  I haven't actually considered that possibility.
 I'd expect the standard to require that char is always a subset of wchar_t, so
that c == narrow(widen(c)).  At least I hope it does. :-)  That wouldn't rule
out ASCII char and ISO-8859-1 wchar_t, though.
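
The round trip c == narrow(widen(c)) can be checked with the standard ctype
facet.  A small sketch, where the function name round_trips() is
hypothetical and the classic "C" locale is used by default:

```cpp
#include <cassert>
#include <locale>

// Returns true if widening c and narrowing it again yields c in the locale.
bool round_trips(char c, const std::locale& loc = std::locale::classic())
{
  const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(loc);
  const wchar_t wide = facet.widen(c);
  // The second argument is the default returned when no narrow form exists.
  return facet.narrow(wide, '\0') == c;
}
```

For characters of the basic source character set this holds in the classic
locale; for other characters and locales it is exactly the property in
question.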

But since we can have a catch-all fallback, all is fine.  What's important is
that the API works.  __STDC_ISO_10646__ and WIN32 are special-cased mainly to
improve performance, and because we *know* what a wchar_t is when either of
these is defined.  Well, and also because the iconv WCHAR_T alias is probably
not something one should rely on for a major target platform. ;-)
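
A sketch of what such a special-cased fast path might look like when
__STDC_ISO_10646__ is defined, assuming wchar_t then holds one Unicode code
point per element.  The names append_utf8() and wide_to_utf8_fast() are
hypothetical; on other platforms the sketch simply reports failure so that a
caller could use the iconv-based fall-back instead.

```cpp
#include <cassert>
#include <string>

// Append one Unicode code point as UTF-8.  Returns false for surrogates and
// for values above U+10FFFF, which are not valid scalar values.
bool append_utf8(std::string& out, char32_t cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return false;
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    return false;
  }
  return true;
}

// Fast path: only taken when the platform promises ISO 10646 in wchar_t.
bool wide_to_utf8_fast(const std::wstring& wide, std::string& out)
{
#if defined(__STDC_ISO_10646__)
  for (wchar_t wc : wide)
    if (!append_utf8(out, static_cast<char32_t>(wc)))
      return false;
  return true;
#else
  (void)wide;
  (void)out;
  return false;  // unknown wchar_t encoding: use the generic fall-back
#endif
}
```

A WIN32 branch would differ, since wchar_t holds UTF-16 there and surrogate
pairs have to be combined before encoding.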

> When I said I had seen your format function in ustring.cc I had seen your
> stream formatting.  On rereading this I now think you were talking about
> something different.  Is this something in git?

No, it's indeed the wide stream formatting I was talking about:
http://git.gnome.org/cgit/glibmm/tree/glib/glibmm/ustring.cc#n1366

Hm, actually it's possible already to use an std::wstringstream to convert
between ustring and wstring by using these operators. ;-)  Of course that
doesn't mean that we shouldn't add a more convenient and straightforward
interface.  After all std::wstring is part of the STL and not all that exotic.

> The new functions cannot be
> called wide_to_narrow() and narrow_to_wide() any more, because they will not
> be concerned only with converting between wide and narrow unicode
> representations.

Actually, the term "narrow" wasn't appropriate for UTF-8 anyway.  The character
set doesn't change and is still wide.  A UTF-8 byte is not a character at all. 
Back then, when these terms and the corresponding API were standardized, no-one
seems to have thought of the possibility that using multi-byte encodings as
internal representation would some day become popular and even the norm.  They
haven't thought about i18n hard enough to realize that random access indexing
of code points is not all that useful in reality.

> I suggest either your approach or new functions in convert.h called
> wide_to_utf8() and wide_from_utf8().  I think I prefer the second approach
> because I suspect it is where people will look and it is closer to the
> existing narrow encoding conversion functions

Funny, I'd have thought it's the other way around and people look at the
ustring class first. :-)

Anyway, I think you are right that the functionality is closer to what
locale_to_utf8() and locale_from_utf8() do.  Most importantly, these
conversions can fail.  Well, actually they cannot fail if the input is valid
and wchar_t holds UCS-4 or UTF-16, but unfortunately we cannot assume that
Unicode is being used.




------------------------------

Message: 6
Date: Fri, 29 May 2009 00:18:20 +0000 (UTC)
From: "glibmm (bugzilla.gnome.org)"
	<bugzilla-daemon bugzilla gnome org>
Subject: [gtkmm bugzilla] [Bug 583992] Provide conversion between wide
	and	narrow unicode strings in glibmm
To: gtkmm-forge lists sourceforge net
Message-ID: <20090529001820 0751323F59F label gnome org>
Content-Type: text/plain; charset=utf-8

  glibmm | strings | Ver: 2.20.x




------- Comment #11 from Chris Vine  2009-05-29 00:18 UTC -------
My point about char and wchar_t being of the same size but containing bytes
with different character encodings was only that if they were the same
encodings and the same size then locale_to/from_utf8() would do the trick for
either (with a reinterpret_cast for wchar_t since in C++ wchar_t is not a
typedef).

When using "narrow" and "wide", I was not using them in the ctype facet sense,
which indeed assumes fixed size characters.  The codecvt facet for streams does
of course cater for multibyte encodings.  On your reading, "wide" is just as
unacceptable as "narrow", as Unicode is just a series of values between 0 and
0x10ffff.  How they are encoded into a particular series of memory locations is
a separate matter.  Where the representation is in units of bytes (utf-8)
having the same size as char I think "narrow" is a reasonable description. 
Likewise where the representation is in units of wchar_t, I think "wide" is a
reasonable description.  The natural size of Unicode, if you want to view it
that way, is 21 bits, which on most (all?) platforms is the size of neither
char nor wchar_t.  But let's not argue about semantics.
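
The 21-bit figure follows from the largest code point, 0x10ffff.  A small
compile-time check, with the helper name bits_needed() being hypothetical:

```cpp
#include <cassert>

// Number of bits needed to represent value (0 needs none).
constexpr unsigned bits_needed(unsigned long value)
{
  return value == 0 ? 0u : 1u + bits_needed(value >> 1);
}

static_assert(bits_needed(0x10FFFF) == 21, "U+10FFFF needs 21 bits");
static_assert(bits_needed(0xFFFF) == 16, "the BMP fits in 16 bits");
```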

Are you going to do this or were you expecting me to have a go at it?




------------------------------


_______________________________________________
Gtkmm-forge mailing list
Gtkmm-forge lists sourceforge net
https://lists.sourceforge.net/lists/listinfo/gtkmm-forge


End of Gtkmm-forge Digest, Vol 36, Issue 16
*******************************************