Re: [Vala] how can I get the number of unicode points in a string?



The idea behind the API isn't to fetch the byte at an offset - instead, you'll typically fetch the *character* at an offset, and then advance the offset by the number of bytes in the character. In other words, you could iterate over the characters in a string using the new get_next_char() method like this:

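// str is the string whose characters you want to visit
int i = 0;
unichar c;
// get_next_char() decodes the character at byte offset i into c,
// advances i past it, and returns false at the end of the string.
while (str.get_next_char(ref i, out c))
  handle_character(c);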

On each loop iteration, the offset (i) is advanced by the size in bytes of the character c.
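Since the original question was how to count the Unicode code points in a string, here is a minimal sketch built on the same loop. It assumes the get_next_char(ref int, out unichar) signature from the GLib bindings shipped with 0.12; count_code_points() is a hypothetical helper name, not part of the library. The bindings also appear to provide string.char_count() (a wrapper around GLib's g_utf8_strlen()), which should give the same answer without writing the loop yourself.

```vala
// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// get_next_char() decodes the character at byte offset i into c and
// moves i past that character's bytes; it returns false at the end.
long count_code_points (string s) {
    long count = 0;
    int i = 0;
    unichar c;
    while (s.get_next_char (ref i, out c)) {
        count++;
    }
    return count;
}
```

For example, count_code_points ("héllo") gives 5, while "héllo".length gives 6, because length counts bytes and 'é' takes two bytes in UTF-8.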

    Is it possible to design the string like this:
    class string
    {
        private char* buffer;      // UTF-8 encoded bytes
        private int* offset_array; // byte offset of each character, by index
        ... ...
        // Vala maps obj[i] onto a get() method
        public unichar get(int i)
        {
            int offset = offset_array[i];
            return ((string) buffer).get_char(offset);
        }
    }
    offset_array stores the byte offset of each UTF-8 character by index.
    It is initialized in the constructor or something similar.
    Then we can use string[index] with no iteration overhead.


That would add a lot of overhead, in both time and space, to every string (each one would need an offset table built when it is created and kept alongside its bytes), and the benefit would be limited. Iterating over the characters in a string is a common operation, and it is both easy and efficient with the current API. Fetching the n-th character of a string is needed far less often, so it's acceptable for that to be less efficient. In the rare case where you really do need random access to characters by index, you can always iterate over all the characters once and store them in a unichar[] array, or build a data structure similar to the one you've outlined above.
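The unichar[] approach could look roughly like this. It is a sketch that assumes Vala's dynamic array append (+=) on a local array; to_unichar_array() is a hypothetical name. The decoding cost is paid once, after which chars[n] is a constant-time lookup, at the price of 4 bytes per character.

```vala
// Hypothetical helper: decodes every character of a UTF-8 string into
// a unichar[] so characters can later be fetched by index in O(1).
unichar[] to_unichar_array (string s) {
    unichar[] chars = {};
    int i = 0;
    unichar c;
    while (s.get_next_char (ref i, out c)) {
        chars += c; // append; the array grows as needed
    }
    return chars;
}
```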

adam
That's good. I just installed 0.12.0. I'll test some code and add this new method to the tutorial page. Currently, string[i] returns the byte at byte offset i, whereas string.get_char(i) returns the character starting at byte offset i (provided i falls on a character boundary). I really wish the convenient [] operator could be used in place of get_char().
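To make the distinction concrete, the sketch below contrasts the two, assuming a UTF-8 source file so that "héllo" stores 'é' as the two bytes 0xC3 0xA9. nth_char() is a hypothetical helper that takes a character index, which [] does not do, by walking the string with get_next_char(), so each call costs O(n).

```vala
// s[i] yields the *byte* at byte offset i; s.get_char (i) decodes the
// character that *starts* at byte offset i. Neither takes a character
// index. This hypothetical helper does, by walking the string until it
// reaches the n-th character (O(n) per call).
unichar nth_char (string s, int n) {
    int i = 0; // byte offset
    int k = 0; // character index
    unichar c;
    while (s.get_next_char (ref i, out c)) {
        if (k == n) {
            return c;
        }
        k++;
    }
    return 0; // n is past the end of the string
}
```

So with s = "héllo", (uint8) s[1] is 0xC3 (the first half of 'é'), s.get_char (1) is U+00E9, and nth_char (s, 2) is 'l', whereas s[2] would be 0xA9, the second byte of 'é'.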
