Glib::Regex and unicode chars



Hello,

I'm trying to implement syntax highlighting, and it works perfectly until I input non-ASCII characters, at which point the fetch_pos method gives me offsets in bytes rather than in characters. Take this regex object, for example:

Glib::Regex::create(R"((?<word>class))", Glib::RegexCompileFlags::REGEX_OPTIMIZE);

The fetch_pos method works perfectly on ASCII text, but as soon as I prepend any multibyte Unicode character to the string, fetch_pos gives me shifted values. (The shift is equal to test_string.bytes() - test_string.length().)

I could probably fix this manually by subtracting the difference between bytes and length from each position, but I'm sure that would not be efficient, and I would constantly have to look back in the buffer.

The same goes for capturing keywords that contain Unicode characters themselves. For example, capturing "clasś" (note the special character at the end) results in fetch_pos giving a range of 6, when the word contains 5 characters (but 6 bytes).
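
To make the manual adjustment concrete, here is a minimal sketch of converting a byte offset into a character offset. It assumes the positions returned by fetch_pos are byte indices into valid UTF-8 and that they fall on character boundaries. The helper name byte_to_char_offset is hypothetical and not part of glibmm; it uses only the standard library. GLib's g_utf8_pointer_to_offset() appears to do the same job on a C string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a glibmm API): returns how many UTF-8 characters
// start within the first byte_offset bytes of s.
// Assumes s is valid UTF-8 and byte_offset lies on a character boundary.
std::size_t byte_to_char_offset(const std::string& s, std::size_t byte_offset)
{
    std::size_t chars = 0;
    for (std::size_t i = 0; i < byte_offset && i < s.size(); ++i) {
        // Continuation bytes have the form 10xxxxxx; any other byte starts a character.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++chars;
    }
    return chars;
}
```

Applied to both the start and end positions of a match, this would turn the 0..6 byte range for "clasś" into the 0..5 character range. Scanning from the start of the buffer for every match is linear each time, so when matches are processed in order it is probably cheaper to keep the last byte and character positions and count only the bytes between consecutive matches.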

Maybe I'm not using it correctly, but Glib::Regex is said to support UTF-8.

Thanks

