Re: Unicode and C++
- From: Havoc Pennington <hp redhat com>
- To: gtk-i18n-list gnome org
- Cc: libstdc++ sourceware cygnus com, otaylor redhat com
- Subject: Re: Unicode and C++
- Date: 03 Jul 2000 11:59:24 -0400
Nathan Myers <ncm@cantrip.org> writes:
> Manipulating UTF-8 in memory is pathetic. UTF-8 is compact and
> convenient as a network and file format representation, but it sucks
> rocks for string manipulations, or in general for in-memory operations.
> Things that are naturally O(1) become O(n) for no reason better than
> sheer obstinacy and stubbornness.
>
Or, in the GTK+ case, massive quantities of legacy code that have to
keep working. UTF-8 is pretty easy to port to; UCS-4 requires
duplicating the whole API and then porting every app to it. Without
the nice C++ trick you've outlined here, it's also quite inefficient
to use UCS-4 internally but UTF-8 in the interfaces.
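
To make the O(1)-versus-O(n) point concrete, here is a minimal sketch
of what indexing by character costs in UTF-8. The function name
utf8_offset is made up for illustration (it is not a GTK+ or libstdc++
API), and the code assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the n-th code point (0-based) in
// a UTF-8 string, or s.size() if the string has fewer code points.
// Assumes well-formed UTF-8. Every call walks the string from the
// start, so "character n" costs O(n) instead of the O(1) array lookup
// available with a fixed-width encoding.
std::size_t utf8_offset(const std::string& s, std::size_t n)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; anything else starts
        // a new code point.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (n == 0)
                return i;
            --n;
        }
    }
    return s.size();
}
```

With a UCS-4 buffer the same lookup is just buf[n], which is the
trade-off being argued about here.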
> Ideally, we would plan to add wide-character interfaces to the
> GTK/GNOME components. A new-generation component system does nobody
> any favors by forcing them to stick with using 8-bit chars to hold
> things that are intrinsically bigger.
Sadly (well, partially sadly), GTK+ isn't new generation; it already
supports millions of lines of code.
My Inti C++ wrapper is new generation, however, so I can use your
suggestion there.
> For cases where you want an efficient addressable container object
> (e.g. for operator[]()), you can make an object that keeps both
> representations. Flags indicate that the char[] or wchar_t[] form
> has been invalidated, and must be (lazily) regenerated after mutative
> operations on the other form. Then conversions happen invisibly and
> only as necessary.
>
Excellent, this is the perfect solution.
> The following is just a sketch.
>
> class Unicode_string
> {
> // constructors
> explicit Unicode_string(char const* p)
> : narrow(p), wide(), flags(narrow_ok) {}
> explicit Unicode_string(std::string const& s)
> : narrow(s), wide(), flags(narrow_ok) {}
> explicit Unicode_string(std::wstring const& s)
> : narrow(), wide(s), flags(wide_ok) {}
>
If this string goes in libstdc++ as an extension, could it share the
refcounted guts of std::string and std::wstring to avoid copies for
these constructors (and for the conversion operators)?
(I don't even know if you are using refcounting in the latest lib, but
thought I'd ask.)
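
For reference, the two-representation idea quoted above might be
fleshed out roughly as follows. This is only a sketch under several
assumptions: the class and member names are hypothetical, it uses
std::u32string (C++11) in place of std::wstring so that each element
holds a full code point regardless of the platform's wchar_t size, it
replaces the single flags word with two bools, it assumes well-formed
UTF-8 input, and it copies rather than sharing refcounted storage.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch: keeps a UTF-8 form and a UCS-4 form, and
// regenerates whichever one is stale only when it is asked for.
class LazyUnicodeString
{
public:
    explicit LazyUnicodeString(std::string utf8)
        : narrow_(std::move(utf8)), narrow_ok_(true), wide_ok_(false) {}
    explicit LazyUnicodeString(std::u32string ucs4)
        : wide_(std::move(ucs4)), narrow_ok_(false), wide_ok_(true) {}

    // Read access converts lazily, then serves from the cached form.
    const std::string& utf8() const
    { if (!narrow_ok_) encode(); return narrow_; }
    const std::u32string& ucs4() const
    { if (!wide_ok_) decode(); return wide_; }

    std::size_t length() const { return ucs4().size(); }
    char32_t operator[](std::size_t i) const { return ucs4()[i]; }

    // Mutating the wide form invalidates the narrow form.
    void set(std::size_t i, char32_t c)
    {
        if (!wide_ok_) decode();
        wide_[i] = c;
        narrow_ok_ = false;
    }

private:
    // UTF-8 -> UCS-4; assumes well-formed input, no validation.
    void decode() const
    {
        wide_.clear();
        for (std::size_t i = 0; i < narrow_.size(); ) {
            unsigned char b = static_cast<unsigned char>(narrow_[i]);
            // Number of continuation bytes implied by the lead byte.
            std::size_t extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
            char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
            for (std::size_t k = 1; k <= extra && i + k < narrow_.size(); ++k)
                cp = (cp << 6)
                   | (static_cast<unsigned char>(narrow_[i + k]) & 0x3F);
            wide_.push_back(cp);
            i += extra + 1;
        }
        wide_ok_ = true;
    }

    // UCS-4 -> UTF-8; assumes valid code points.
    void encode() const
    {
        narrow_.clear();
        for (char32_t cp : wide_) {
            if (cp < 0x80) {
                narrow_ += static_cast<char>(cp);
            } else if (cp < 0x800) {
                narrow_ += static_cast<char>(0xC0 | (cp >> 6));
                narrow_ += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                narrow_ += static_cast<char>(0xE0 | (cp >> 12));
                narrow_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                narrow_ += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                narrow_ += static_cast<char>(0xF0 | (cp >> 18));
                narrow_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                narrow_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                narrow_ += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        narrow_ok_ = true;
    }

    // mutable: the caches are refreshed from const accessors.
    mutable std::string narrow_;
    mutable std::u32string wide_;
    mutable bool narrow_ok_, wide_ok_;
};
```

A caller that only passes strings through the UTF-8 interfaces never
pays for a conversion; the first operator[] triggers one decode, and
later reads are O(1) until the UTF-8 form is needed again after a
mutation.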
Havoc