Re: [Vala] how can I get the number of unicode points in a string?
- From: Xu Ming <xm xming gmail com>
- To: Adam Dingle <adam yorba org>
- Cc: valalist <vala-list gnome org>
- Subject: Re: [Vala] how can I get the number of unicode points in a string?
- Date: Mon, 04 Apr 2011 19:10:47 +0800
The idea behind the API isn't to fetch the byte at an offset - instead,
you'll typically fetch the *character* at an offset, and then advance
the offset by the number of bytes in the character. In other words, you
could iterate over the characters in a string using the new
get_next_char() method like this:
    // s is the string being iterated
    int i = 0;
    unichar c;
    while (s.get_next_char(ref i, out c))
        handle_character(c);
On each loop iteration, the offset (i) will increment by the size of
the character c.
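For the question in the subject line, the same loop can simply count iterations. This is only a minimal sketch, assuming Vala 0.12 or later; count_code_points is a made-up helper name, not part of the Vala API:

```vala
// count_code_points is a hypothetical helper name, not part of the Vala API.
int count_code_points (string s) {
    int i = 0;      // byte offset into s, advanced by get_next_char()
    unichar c;      // receives each decoded character
    int n = 0;      // number of characters seen so far
    while (s.get_next_char (ref i, out c)) {
        n++;
    }
    return n;
}
```

For valid UTF-8 this should agree with s.char_count(), which wraps GLib's g_utf8_strlen(), so the loop is mainly useful when each character also needs to be handled along the way.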
Is it possible to design the string like this:
    class string
    {
        private unichar* buffer;
        private int* offset_array;
        ... ...
        public unichar operator [](const int i)
        {
            int offset=offset_array[i];
            return buffer[offset];
        }
    }
offset_array stores the offset of each UTF-8 character by index. It is
initialized in the constructor or something.
Then we can use string[index] with no iteration overhead.
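To make the idea concrete, here is a rough sketch of it as a separate wrapper class rather than a change to string itself. IndexedString and its members are hypothetical names; the sketch assumes Vala 0.12 or later and relies on Vala mapping obj[n] to a method named get():

```vala
// IndexedString is a hypothetical wrapper, not an existing Vala or GLib type.
class IndexedString {
    private string text;
    private int[] offsets = {};   // offsets[n] = byte offset of the n-th character

    public IndexedString (string s) {
        text = s;
        int i = 0;        // byte offset just past the current character
        int start = 0;    // byte offset where the current character begins
        unichar c;
        while (s.get_next_char (ref i, out c)) {
            offsets += start;
            start = i;
        }
    }

    // Number of characters, not bytes.
    public int length {
        get { return offsets.length; }
    }

    // A method named get() lets callers write indexed[n].
    public unichar get (int n) {
        return text.get_char (offsets[n]);
    }
}
```

Building the table takes one pass over the string and one int per character, after which indexed[n] is a constant-time lookup.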
That would add lots of overhead (both in time and space) for every
string, and would have limited benefit. Iterating over characters in
a string is a common operation, and is both easy and efficient with
the current API. Fetching the n-th character in a string is less
often necessary, so it's OK for it to be less efficient. In the rare
case where you really do need random access to characters by index,
you could always iterate over all characters in a string and store
them in a unichar[] array for that purpose, or you could construct a
data structure similar to the one you've outlined above.
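The unichar[] approach might look roughly like this. It is a sketch only, assuming Vala 0.12 or later; to_unichars is a hypothetical helper name:

```vala
// to_unichars is a hypothetical helper name, not part of the Vala API.
unichar[] to_unichars (string s) {
    unichar[] chars = {};
    int i = 0;      // byte offset into s
    unichar c;
    while (s.get_next_char (ref i, out c)) {
        chars += c; // one array slot per character
    }
    return chars;
}
```

After var chars = to_unichars (s), chars[n] is the n-th character and chars.length is the character count. The cost is paid once per string, and only for the strings that actually need random access.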
adam
That's good. I just installed 0.12.0. I'll test some code and add this
new method to the tutorial page.
Now string[i] returns the byte at offset i, while string.get_char(i)
returns the character at byte offset i, provided i falls on a character
boundary. I really wish that the convenient [] operator could be used in
place of get_char().