Re: unicode string
- From: Nathan Myers <ncm nospam cantrip org>
- To: Havoc Pennington <hp redhat com>
- Cc: gtk-i18n-list gnome org, libstdc++ sourceware cygnus com
- Subject: Re: unicode string
- Date: Thu, 6 Jul 2000 01:10:36 -0700
On Thu, Jul 06, 2000 at 03:25:43AM -0400, Havoc Pennington wrote:
>
> I'll try putting together an implementation of utf8_string, and send
> it along to see what you think. I'd appreciate any comments.
>
> One question about the sketch you sent: your conversion from const
> char* is explicit, unlike std::string. Is there a rationale for that?

I don't think you would ever want this thing to be constructed
invisibly. The point is debatable for regular strings, but for
this one it seems pretty clear. Invisible conversions are a
real hazard in the best of circumstances, and rarely justified.
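
To make the hazard concrete, here is a minimal sketch that is not part of the original message. The types implicit_text and explicit_text and the two counting functions are hypothetical stand-ins, not the real utf8_string; they only show what the explicit keyword rules out.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in with a converting (non-explicit) constructor.
struct implicit_text {
    implicit_text(char const* p) : value(p) {}
    std::string value;
};

// Hypothetical stand-in with an explicit constructor, as in the sketch
// of utf8_string.
struct explicit_text {
    explicit explicit_text(char const* p) : value(p) {}
    std::string value;
};

// A string literal passed here is silently turned into a temporary
// implicit_text, so the copy happens without the caller asking for it.
inline std::size_t count_implicit(implicit_text const& t)
{ return t.value.size(); }

// Callers of this function must spell out the construction themselves.
inline std::size_t count_explicit(explicit_text const& t)
{ return t.value.size(); }

// The compiler confirms the difference: only the non-explicit type can
// be created from a char const* behind the caller's back.
static_assert(std::is_convertible<char const*, implicit_text>::value,
              "implicit_text can be constructed invisibly");
static_assert(!std::is_convertible<char const*, explicit_text>::value,
              "explicit_text cannot be constructed invisibly");
```

With these definitions, count_implicit("abc") compiles and builds a hidden temporary, while count_explicit("abc") is rejected and has to be written as count_explicit(explicit_text("abc")).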

The operator conversions are sort-of-OK here because there really
is (e.g.) a wstring in there, but even then there's the lifetime
issue. References into other objects are always perilous.
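
The lifetime issue can be sketched as follows. This is an illustration under assumptions, not code from the original message: wide_holder is a hypothetical type that hands out a reference to its internal buffer the way the conversion operators in the sketch do, and copy_out is a hypothetical caller that avoids the problem by copying.

```cpp
#include <string>

// Hypothetical holder that exposes its internal wide buffer by
// reference, mirroring the shape of the conversion operators below.
struct wide_holder {
    explicit wide_holder(std::wstring const& s) : m_wide(s) {}
    operator std::wstring const&() const { return m_wide; }
    std::wstring m_wide;
};

inline std::wstring copy_out()
{
    wide_holder h(std::wstring(L"abc"));
    std::wstring const& ref = h;  // refers into h; valid only while h lives
    std::wstring copy = ref;      // independent copy, owns its own storage
    return copy;                  // safe: the copy outlives h
    // Handing `ref` itself to the caller would leave a reference to a
    // destroyed object, which is the peril described above.
}
```

The copy costs an allocation, which is the price of not tying the result to the lifetime of the object it came from.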

I'm still debating whether the operator conversions really should
just be ordinary named member functions. If this is a temporary
conversion hack, maybe not. But if it's a permanent part of the
library, then probably so. (Is anything ever really as temporary
as it should be?)

Note that in any case no function should take one of these monsters
as an argument.

Nathan Myers
ncm at cantrip dot org
---------
#include <cstddef>
#include <string>

class utf8_string
{
public:
    // constructors
    explicit utf8_string(char const* p)
      : m_narrow(p), m_wide(), m_flags(utf8_string::narrow_ok) {}
    explicit utf8_string(std::string const& s)
      : m_narrow(s), m_wide(), m_flags(utf8_string::narrow_ok) {}
    explicit utf8_string(std::wstring const& s)
      : m_narrow(), m_wide(s), m_flags(utf8_string::wide_ok) {}

    // conversions
    operator std::wstring const&() const
      { this->widen(); return this->m_wide; }
    operator std::string const&() const
      { this->narrowen(); return this->m_narrow; }
    char const* c_str() const
      { this->narrowen(); return this->m_narrow.c_str(); }

    // utility operations
    wchar_t& operator[](std::size_t i);
    wchar_t const& operator[](std::size_t i) const
      { this->widen(); return this->m_wide[i]; }
    bool equal(utf8_string const& s) const;
    bool less(utf8_string const& s) const;

private:
    // make sure the requested representation is up to date
    void widen() const
      { if (!(this->m_flags & wide_ok)) this->make_wide(); }
    void narrowen() const
      { if (!(this->m_flags & narrow_ok)) this->make_narrow(); }
    void make_wide() const;   // do UTF-8 to UCS4 conversion
    void make_narrow() const; // do UCS4 to UTF-8 conversion

    // data members
    mutable std::string m_narrow;
    mutable std::wstring m_wide;
    enum { neither_ok = 0, narrow_ok = 0x1, wide_ok = 0x2, both_ok = 0x3 };
    mutable char m_flags;
};

inline bool
operator==(utf8_string const& a, utf8_string const& b)
  { return a.equal(b); }

// utf8_string.cc:
bool
utf8_string::equal(utf8_string const& s) const
{
    // the bits set in both operands name the representations they share
    switch (this->m_flags & s.m_flags)
    {
    case utf8_string::narrow_ok:
        return this->m_narrow == s.m_narrow;
    case utf8_string::neither_ok:
        this->widen(); s.widen();
        // fall through
    case utf8_string::wide_ok:
    case utf8_string::both_ok:
        return this->m_wide == s.m_wide;
    }
    return false; // not reached: all four flag combinations are handled
}

wchar_t&
utf8_string::operator[](std::size_t i) // non-const
{
    this->widen();
    // the caller may write through the reference, so the narrow copy
    // can no longer be trusted
    this->m_flags &= ~utf8_string::narrow_ok;
    return this->m_wide[i];
}
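
The sketch above declares make_wide() and make_narrow() without bodies. As an illustration only, the two conversions might look like the following. The free functions utf8_to_ucs4 and ucs4_to_utf8 are hypothetical names, not part of the original message. The code assumes C++11 and uses char32_t and std::u32string rather than wchar_t, since wchar_t is only 16 bits wide on some platforms. It also assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UCS-4 code points (hypothetical helper).
// Assumes well-formed input: no stray continuation bytes, overlong
// forms, or truncated sequences are detected.
inline std::u32string utf8_to_ucs4(std::string const& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        // each continuation byte contributes its low six bits
        for (; extra > 0 && i < in.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Encode UCS-4 code points as UTF-8 (hypothetical helper).
// Assumes every code point is at most 0x10FFFF.
inline std::string ucs4_to_utf8(std::u32string const& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Inside the class, make_wide() would presumably run the first conversion on m_narrow and then set wide_ok in m_flags, and make_narrow() would do the reverse, so that each conversion happens at most once until the string is modified.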