Re: Glib::ustring's operator<< doing a conversion to locale, why?
- From: "Milosz Derezynski" <internalerror gmail com>
- To: "Daniel Elstner" <daniel kitta googlemail com>
- Cc: Chris Vine <chris cvine freeserve co uk>, Murray Cumming <murrayc murrayc com>, Gtkmm Mailing List <gtkmm-list gnome org>
- Subject: Re: Glib::ustring's operator<< doing a conversion to locale, why?
- Date: Tue, 8 May 2007 11:10:33 +0200
Hey _again_ (yes again, sorry),
We're currently working on a ustring-derived class (in testing) to work around this issue. We know ustring is intended as a final class, but we see no reasonable way to fix this other than taping Post-Its with "use ::raw()!" to our screens. We just came up with an idea.
Basically, an additional member function of ustring, ::set_output_type() or ::set_flags(), would modify a static member variable (so it would apply to all ustrings in use), making it possible to switch operator<<() to a different behaviour (in our case the one already mentioned: just use os << raw()). The overhead would be one static member variable and one switch() in operator<<().
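To make the idea concrete, here is a rough sketch of what such a switch could look like. It is not glibmm code: Utf8String is a hypothetical stand-in for ustring, set_output_mode() and OutputMode are made-up names, and convert_to_locale_stub() only imitates a lossy locale conversion by replacing each non-ASCII character with '?'.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for Glib::ustring; not the glibmm class.
class Utf8String {
public:
    // Illustrative names: how operator<< should write the string.
    enum class OutputMode { ConvertToLocale, Raw };

    explicit Utf8String(std::string utf8) : bytes_(std::move(utf8)) {}

    // The stored UTF-8 bytes, unchanged.
    const std::string& raw() const { return bytes_; }

    // Process-wide switch, as in the proposal: it affects every instance.
    static void set_output_mode(OutputMode mode) { output_mode_ = mode; }
    static OutputMode output_mode() { return output_mode_; }

private:
    std::string bytes_;
    static OutputMode output_mode_;
};

// Default keeps the existing behaviour (convert on output).
Utf8String::OutputMode Utf8String::output_mode_ =
    Utf8String::OutputMode::ConvertToLocale;

// Placeholder for the real UTF-8 -> locale conversion (an assumption made
// for this sketch): ASCII passes through, every other character becomes '?'.
inline std::string convert_to_locale_stub(const std::string& utf8) {
    std::string out;
    for (unsigned char c : utf8) {
        if (c < 0x80)
            out += static_cast<char>(c);   // plain ASCII
        else if ((c & 0xC0) != 0x80)
            out += '?';                    // lead byte of a multi-byte sequence
        // continuation bytes (10xxxxxx) are dropped
    }
    return out;
}

// One switch() in operator<<, as described in the proposal.
inline std::ostream& operator<<(std::ostream& os, const Utf8String& s) {
    switch (Utf8String::output_mode()) {
    case Utf8String::OutputMode::Raw:
        os << s.raw();
        break;
    case Utf8String::OutputMode::ConvertToLocale:
        os << convert_to_locale_stub(s.raw());
        break;
    }
    return os;
}

// Small self-check showing both modes; restores the default afterwards.
inline void demo_output_modes() {
    const Utf8String s("caf\xC3\xA9");     // "cafe" with e-acute, in UTF-8

    Utf8String::set_output_mode(Utf8String::OutputMode::ConvertToLocale);
    std::ostringstream converted;
    converted << s;
    assert(converted.str() == "caf?");

    Utf8String::set_output_mode(Utf8String::OutputMode::Raw);
    std::ostringstream raw;
    raw << s;
    assert(raw.str() == "caf\xC3\xA9");

    Utf8String::set_output_mode(Utf8String::OutputMode::ConvertToLocale);
}
```

One caveat with this design: the static flag is global mutable state, so a library that flips it changes the output of every other user of the class in the same process, and the sketch makes no attempt at thread safety.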
Would this be an acceptable code change?
On 5/5/07, Daniel Elstner <daniel kitta googlemail com> wrote:
On Saturday, 05.05.2007, at 17:04 +0200, Milosz Derezynski wrote:
> A small follow-up to the previous mail wrt "I see no reason": A
> std::string can hold text in a different encoding than the current locale
> as well. ustring makes the assumption that since it always holds
> UTF-8, and I think cannot even sanely hold anything else (save for
> valid subsets of UTF-8), it is fully rational to perform this
> conversion in the operators, but the part that bugs me is that it uses
> LANG or LC_* or LOCALE (etc, as stated) as basis for the conversion,
> for which there is no reason to believe that people actually always
> want this.
> If it would use the current global C++ locale (if there can be a
> global locale setting, sorry for my newbishness again), it would be
> all right really, but this way, it's just beyond odd.
> Daniel can you maybe shed some light on this please?
Yes, I agree that it was basically a mistake to make operator<<()
convert to the locale encoding. I implemented this at a time when GCC's
libstdc++ didn't support the C++ locale scheme and the global C locale
was always used. Now I find myself writing .raw() all the time.
As you write above, doing the conversion is not entirely unreasonable
though, since ustring always uses UTF-8 and C++ streams may use
different encodings. The problem is, though, that the intended
facilities for stream codeset conversion -- that is, codecvt -- are next
to useless. There's scarce documentation on the subject but from what I
gathered there's no public interface to get the name of the encoding
used by a stream.
It is quite obvious that the C++ standard library API simply wasn't
designed for using multi-byte encodings internally. The code conversion
facilities of streams seem to exist mainly for conversion between wide
characters (internal) and multi-byte (external) when using wide streams.
By the way, my patch adding the compose() and format() features to
glibmm also introduces operator<<() and operator>>() conversions to
std::wostream and from std::wistream, respectively:
These conversions are actually sensible to do, and even independent of
the locale on many (most?) systems -- at least on modern glibc systems
(always UCS-4) and Windows (always UTF-16).
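As a rough illustration of why this direction needs no locale: decoding UTF-8 into wide characters is pure arithmetic on the bytes. The sketch below is not the code from the patch; utf8_to_wide() and write_utf8() are hypothetical names, and the decoder is deliberately minimal (it rejects malformed lead and continuation bytes but does not check for overlong forms or encoded surrogates). It emits one wchar_t per code point where wchar_t is 32 bits wide, and UTF-16 surrogate pairs where it is 16 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into wchar_t units without consulting any locale.
inline std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");
        i += len;

        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out += static_cast<wchar_t>(cp);              // fits in one unit
        } else {
            cp -= 0x10000;                                // 16-bit wchar_t:
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));    // high surrogate
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        }
    }
    return out;
}

// Hypothetical helper: write UTF-8 text to a wide stream.
inline std::wostream& write_utf8(std::wostream& os, const std::string& utf8) {
    return os << utf8_to_wide(utf8);
}

// Small self-check with a two-byte and a three-byte sequence.
inline void demo_utf8_to_wide() {
    const std::wstring cafe = utf8_to_wide("caf\xC3\xA9");   // e-acute, U+00E9
    assert(cafe.size() == 4);
    assert(cafe[3] == static_cast<wchar_t>(0xE9));

    const std::wstring euro = utf8_to_wide("\xE2\x82\xAC");  // euro sign, U+20AC
    assert(euro.size() == 1);
    assert(euro[0] == static_cast<wchar_t>(0x20AC));
}
```

What the wide stream then does with those characters on output (its own conversion to an external encoding) still depends on the stream's locale; only the UTF-8 to wide step shown here is locale-independent.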
> > I think I remember someone (Daniel Elstner?) mentioning that they seemed
> > like a good idea at the time but turned out to be a mistake which it is
> > too late to change.
Indeed. I think I said this in some bugzilla comment.