Re: [Vala] how can I get the number of unicode points in a string?

From: Xu Ming <xm xming gmail com>
To: Adam Dingle <adam yorba org>
Cc: valalist <vala-list gnome org>
Subject: Re: [Vala] how can I get the number of unicode points in a string?
Date: Mon, 04 Apr 2011 19:10:47 +0800

The idea behind the API isn't to fetch the byte at an offset - instead, you'll typically fetch the *character* at an offset, and then advance the offset by the number of bytes in the character. In other words, you could iterate over the characters in a string using the new get_next_char() method like this:

// s is the string being iterated; i is a byte offset into it
int i = 0;
unichar c;
while (s.get_next_char(ref i, out c))
  handle_character(c);
On each loop iteration, the offset (i) will increment by the size of the character c.
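
For the question in the subject line, the same loop can simply count its iterations. This is only a minimal sketch: count_chars is a hypothetical helper name, not part of the Vala API, and it assumes a Vala release in which string.get_next_char() is available and string.length reports bytes.

```vala
// Hypothetical helper: count the Unicode characters in a UTF-8 string
// by walking it with string.get_next_char().
int count_chars (string s) {
    int i = 0;      // byte offset into s, advanced by get_next_char()
    unichar c;      // receives the character decoded at that offset
    int n = 0;      // number of characters seen so far
    while (s.get_next_char (ref i, out c)) {
        n++;
    }
    return n;
}

void main () {
    string s = "héllo";   // 'é' takes two bytes in UTF-8
    // s.length is the byte length, count_chars (s) the character count
    stdout.printf ("%d bytes, %d characters\n", s.length, count_chars (s));
}
```

If your Vala version provides string.char_count(), that should give the same count without writing the loop by hand.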
    Is it possible to design the string like this:
    class string
    {
    private unichar* buffer;
    private int* offset_array;
    ... ...
    public unichar operator [](const int i)
    {
    int offset=offset_array[i];
    return buffer[offset];
    }
    }
    offset_array stores the offset of each UTF-8 character by index. It is
    initialized in the constructor or something.
    Then we can use string[index] with no iteration overhead.
That would add lots of overhead (both in time and space) for every string, and would have limited benefit. Iterating over characters in a string is a common operation, and is both easy and efficient with the current API. Fetching the n-th character in a string is less often necessary, so it's OK for it to be less efficient. In the rare case where you really do need random access to characters by index, you could always iterate over all characters in a string and store them in a unichar[] array for that purpose, or you could construct a data structure similar to the one you've outlined above.
adam
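
The unichar[] approach suggested above might look roughly like this. It is a sketch under assumptions: to_unichars is a hypothetical helper name, and the array costs four bytes per character, so it only pays off when random access by character index is really needed.

```vala
// Hypothetical helper: decode a UTF-8 string into an array of code points
// so that characters can be fetched by index in constant time.
unichar[] to_unichars (string s) {
    unichar[] chars = {};
    int i = 0;      // byte offset into s
    unichar c;
    while (s.get_next_char (ref i, out c)) {
        chars += c; // append the decoded character
    }
    return chars;
}

void main () {
    unichar[] chars = to_unichars ("naïve");
    // chars.length is the character count; chars[2] is the third character
    stdout.printf ("%d characters, third is U+%04X\n",
                   chars.length, (uint) chars[2]);
}
```

The decoding pass is paid once per string; after that, chars[n] is a plain array lookup.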

That's good. I just installed 0.12.0. I'll test some code and add this new method to the tutorial page.

Now string[i] returns the byte at offset i, but string.get_char(i) returns the character at offset i, provided i falls on the first byte of a character. I really wish that the convenient [] operator could be used in place of get_char().
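
A small sketch of that difference, assuming the 0.12 behaviour described here, where both calls take a byte offset. The cast to uint8 is only there so the byte prints as an unsigned value.

```vala
void main () {
    string s = "héllo";   // bytes: 'h', then two bytes for 'é', then "llo"

    // s[1] is a single byte: the first byte of the UTF-8 encoding of 'é'
    uint8 b = (uint8) s[1];

    // s.get_char (1) decodes the whole character that starts at byte offset 1
    unichar c = s.get_char (1);

    stdout.printf ("byte 0x%02X, character U+%04X\n", (uint) b, (uint) c);
}
```

Passing an offset that lands in the middle of a multi-byte character (offset 2 here) would not give a meaningful character, which is why the offset has to sit on a character boundary.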
