Re: [Vala] how can I get the number of unicode points in a string?
- From: 琉璃井 <pharaoh456 163 com>
- To: vala-list <vala-list gnome org>
- Subject: Re: [Vala] how can I get the number of unicode points in a string?
- Date: Mon, 4 Apr 2011 11:09:30 +0800 (CST)
From: 琉璃井 <pharaoh456 163 com>
Sent: 2011-04-04 11:00
Subject: Re: [Vala] how can I get the number of unicode points in a string?
To: vala-list gnome org
On 2011/4/3 21:30, Adam Dingle wrote:
On 04/03/2011 06:08 AM, 琉璃井 wrote:
From: "琉璃井"<pharaoh456 163 com>
Date: 2011-04-03 18:15:12
To: "Luca Bruno"<lethalman88 gmail com>
Subject: Re:Re: [Vala] how can I get the number of unicode points in
a string?
At 2011-04-03 16:06:32,"Luca Bruno"<lethalman88 gmail com> wrote:
On Sun, Apr 03, 2011 at 03:59:23PM +0800, 琉璃井 wrote:
I see that since Vala 0.11.0, string.length returns the number of bytes
rather than the number of Unicode characters, and string[i] returns only
one byte. I wonder how to deal with East Asian character strings.
There are other methods in string that deal with utf8. For example
char_count() and next_char().
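To make the difference between bytes and characters concrete, here is a minimal sketch. It assumes a Vala version (0.11.0 or later) in which string.length counts bytes, as described above; the sample string is only an illustration.

```vala
void main () {
    string s = "琉璃井";

    // length is the size of the UTF-8 encoding in bytes.
    // Each of these three characters takes 3 bytes in UTF-8.
    print ("%d bytes\n", s.length);

    // char_count() walks the string and counts Unicode characters.
    print ("%d characters\n", s.char_count ());
}
```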
Thank you.
I found char_count(), get_char() and next_char() in the GTK+ documentation.
It looks like these methods are not covered in the Vala tutorial or documentation.
Is there something like string[i] for indexed access to UTF-8 characters? I
couldn't find it in the docs.
To get the i-th character, you could do this:
    str.get_char(str.index_of_nth_char(i));
But the current string methods are designed for iteration by offsets,
not characters. So you should *not* do this, which will be inefficient:
    for (int i = 0; i < str.char_count(); ++i)  // don't do this
        str.get_char(str.index_of_nth_char(i));
Instead, you want to iterate over the string using get_char() and
next_char(). This is slightly inconvenient since these functions use
pointers rather than integer offsets. In Vala trunk, Jürg has just
committed a new method string.get_next_char() which will make it
easier to iterate over strings:
    // in class string
    public bool get_next_char (ref int index, out unichar c);
That isn't in any Vala release yet, though. (In the meantime, you
might be able to copy and paste his implementation from glib-2.0.vapi
in Vala trunk.)
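As a rough sketch of the two iteration styles discussed above: the first function assumes a Vala version whose glib-2.0.vapi already provides get_next_char() with the signature quoted above, and the second uses only get_char() and next_char(). The function names code_points and count_with_next_char are made up for this example.

```vala
// Collect the code points of a UTF-8 string in one linear pass.
unichar[] code_points (string str) {
    unichar[] result = {};
    int index = 0;   // byte offset, advanced by get_next_char()
    unichar c;
    while (str.get_next_char (ref index, out c)) {
        result += c;
    }
    return result;
}

// The same walk using get_char() and next_char(): p always refers
// to the remainder of the string, starting at the current character.
int count_with_next_char (string str) {
    int n = 0;
    for (unowned string p = str; p.get_char () != 0; p = p.next_char ()) {
        n++;
    }
    return n;
}

void main () {
    foreach (unichar c in code_points ("a琉b")) {
        print ("U+%04X\n", (uint) c);
    }
    print ("%d\n", count_with_next_char ("a琉b"));
}
```

Both loops visit each byte of the string once, unlike the index_of_nth_char() loop, which rescans from the start of the string on every step.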
adam
I know get_char and next_char are used to reduce iteration overhead,
but there may be another convenient way to access a UTF-8 string
efficiently. After all, getting a byte from a string by offset is not
very reasonable, because people seldom need a single byte out of a whole
character.
Is it possible to design the string like this:
    class string
    {
        private unichar* buffer;
        private int* offset_array;
        ... ...
        public unichar operator [](const int i)
        {
            int offset = offset_array[i];
            return buffer[offset];
        }
    }
offset_array stores the byte offset of each UTF-8 character, by character
index. It would be initialized in the constructor or somewhere similar.
Then we can use string[index] with no iteration overhead.
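The idea can be tried out without changing the string class itself. The following is only a sketch under some assumptions: IndexedString is a hypothetical wrapper (not part of Vala or GLib), it relies on get_next_char() being available, and it uses Vala's convention that a get() method enables the obj[i] syntax.

```vala
// Hypothetical wrapper class, for illustration only.
class IndexedString {
    private string text;
    // offsets[i] is the byte offset at which the i-th character starts.
    private int[] offsets = {};

    public IndexedString (string text) {
        this.text = text;
        int index = 0;   // byte offset just past the character last read
        int start = 0;   // byte offset where that character began
        unichar c;
        while (text.get_next_char (ref index, out c)) {
            offsets += start;
            start = index;
        }
    }

    // Number of characters, not bytes.
    public int char_count {
        get { return offsets.length; }
    }

    // Enables s[i]: look up the byte offset, then decode one character.
    public unichar get (int i) {
        return text.get_char (offsets[i]);
    }
}

void main () {
    var s = new IndexedString ("琉璃井");
    print ("%d characters\n", s.char_count);
    print ("%s\n", s[1].to_string ());   // the second character
}
```

The trade-off is one linear pass to build the table and one int of extra memory per character; in exchange each s[i] lookup no longer scans from the start of the string.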