Re: Improve word boundaries for text widgets



On Sat, Oct 04, 2014 at 12:30:43PM -0400, Matthias Clasen wrote:
> I mean that it is the right default behavior to follow Unicode TR 29.

If we want to follow Unicode TR 29, then we should use the
is_word_boundary PangoLogAttr attribute, see:
https://bugzilla.gnome.org/show_bug.cgi?id=530405

As I explained there, I've experimented with is_word_boundary, and it
is not convenient to use directly. It probably gives better results for
whole-word search, but for word selection and word movements another
algorithm would be needed on top of it to achieve good results (at
least as good as the current GTK+ behavior, which uses is_word_start
and is_word_end).
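For whole-word search, is_word_boundary is indeed simple to use: a match is a whole word when both of its ends fall on a boundary. A minimal sketch, assuming a hypothetical simplified LogAttr struct (not the real PangoLogAttr, which has many more fields) with one entry per position, including the position after the last character, as Pango's log attrs arrays do:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one PangoLogAttr field used here.
 * attrs[i] describes the position before character i; an array for
 * n characters has n + 1 entries. */
typedef struct {
    unsigned is_word_boundary : 1;
} LogAttr;

/* A match covering characters [start, end) is a whole-word match if
 * both of its ends fall on a word boundary. */
static bool
is_whole_word_match (const LogAttr *attrs, size_t n_attrs,
                     size_t start, size_t end)
{
    if (start >= end || end >= n_attrs)
        return false;   /* empty match or out of range */
    return attrs[start].is_word_boundary && attrs[end].is_word_boundary;
}
```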

With is_word_start and is_word_end, we have two word boundary types.
They are needed for word movements: ctrl+right goes to the next word
_end_, ctrl+left goes to the previous word _start_. With
is_word_boundary, we don't know whether a boundary is at the start or
the end of a word.
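A minimal sketch of those two movements, assuming a hypothetical simplified LogAttr struct that mirrors only the is_word_start and is_word_end fields of PangoLogAttr; the function names are illustrative, not GTK+ API:

```c
#include <stddef.h>

/* Hypothetical stand-in for the PangoLogAttr fields used here.
 * attrs[i] describes the position before character i; there are
 * n_chars + 1 entries. */
typedef struct {
    unsigned is_word_start : 1;
    unsigned is_word_end   : 1;
} LogAttr;

/* ctrl+right: the next word end strictly after pos, or the end of the
 * text if there is none. */
static size_t
forward_word_end (const LogAttr *attrs, size_t n_attrs, size_t pos)
{
    for (size_t i = pos + 1; i < n_attrs; i++)
        if (attrs[i].is_word_end)
            return i;
    return n_attrs - 1;
}

/* ctrl+left: the previous word start strictly before pos, or the start
 * of the text if there is none. */
static size_t
backward_word_start (const LogAttr *attrs, size_t pos)
{
    while (pos > 0) {
        pos--;
        if (attrs[pos].is_word_start)
            return pos;
    }
    return 0;
}
```

With a single boundary type, both loops would stop at every boundary, so ctrl+right would also stop at word starts.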

For word selection (double click), only one word boundary type is
needed. Going from two boundary types to one is easy: utility functions
can be written for that. The reverse is more difficult. So, since
is_word_boundary doesn't give good results for word selection, it is
easier to use is_word_start and is_word_end (through those utility
functions).
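Such a utility function could look like the following sketch. It assumes the same kind of hypothetical simplified LogAttr struct (only is_word_start and is_word_end), and word_range_at() is an illustrative name. It returns the word containing the character at a given offset, and returns false when that character is outside any word (for example a space):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PangoLogAttr fields used here.
 * attrs[i] describes the position before character i. */
typedef struct {
    unsigned is_word_start : 1;
    unsigned is_word_end   : 1;
} LogAttr;

/* Collapse the two boundary types into one word range [*start, *end)
 * for double-click selection. */
static bool
word_range_at (const LogAttr *attrs, size_t n_attrs, size_t pos,
               size_t *start, size_t *end)
{
    if (pos + 1 >= n_attrs)
        return false;               /* no character at pos */

    /* Nearest word start at or before pos; meeting a word end first
     * means the character at pos is outside any word. */
    size_t s = pos;
    for (;;) {
        if (attrs[s].is_word_start)
            break;
        if (attrs[s].is_word_end || s == 0)
            return false;
        s--;
    }

    /* The word extends to the first word end after pos. */
    for (size_t e = pos + 1; e < n_attrs; e++) {
        if (attrs[e].is_word_end) {
            *start = s;
            *end = e;
            return true;
        }
    }
    return false;                   /* malformed attributes: no end */
}
```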

But is_word_start and is_word_end only cover natural-language words.
They don't take punctuation into account, which also occurs in normal
text. If we want to follow Unicode TR 29, we can see here:
http://www.unicode.org/reports/tr29/#Word_Boundaries

that there are word boundaries around punctuation too. The report also
explains that word boundaries can be tailored for particular features;
for example, whole-word search can use different word boundaries than
word selection.

And I think the Vim word boundaries, as used by the 'e' and 'b'
commands, are suitable for both normal text and code. They would give
consistent behavior across all GTK+ text widgets, including
GtkSourceView for code.
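A rough sketch of Vim-style 'e' and 'b' boundaries, assuming ASCII text and approximating Vim's default notion of a word: a run of letters, digits and underscores, or a run of other non-blank characters. The function names are hypothetical. Offsets here are positions between characters (like is_word_start/is_word_end), so the 'e' sketch returns the offset just after the last character, whereas Vim puts the cursor on it:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII approximation of Vim's default 'iskeyword' classes. */
enum { CLASS_BLANK, CLASS_PUNCT, CLASS_KEYWORD };

static int
char_class (char c)
{
    unsigned char u = (unsigned char) c;
    if (isspace (u))
        return CLASS_BLANK;
    if (isalnum (u) || c == '_')
        return CLASS_KEYWORD;
    return CLASS_PUNCT;
}

/* Like 'e': skip blanks, then return the offset just after the end of
 * the next run of same-class characters. */
static size_t
vim_forward_word_end (const char *text, size_t pos)
{
    size_t len = strlen (text);
    while (pos < len && char_class (text[pos]) == CLASS_BLANK)
        pos++;
    if (pos == len)
        return len;
    int cls = char_class (text[pos]);
    while (pos < len && char_class (text[pos]) == cls)
        pos++;
    return pos;
}

/* Like 'b': skip blanks backward, then return the offset of the start
 * of the previous run of same-class characters. */
static size_t
vim_backward_word_start (const char *text, size_t pos)
{
    while (pos > 0 && char_class (text[pos - 1]) == CLASS_BLANK)
        pos--;
    if (pos == 0)
        return 0;
    int cls = char_class (text[pos - 1]);
    while (pos > 0 && char_class (text[pos - 1]) == cls)
        pos--;
    return pos;
}
```

On "foo.bar(x)  baz", repeated 'e' steps stop at offsets 3, 4, 7, 8, 9, 10 and 15: the punctuation gets its own stops, which a natural-language segmentation would skip.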

> Fixing internal inconsistencies is a good idea, of course. Patches for
> that would be most welcome.

> For tailoring the text segmentation behavior for special situations,
> such as code instead of natural language, a vfunc is the right
> approach.

> Does that make clear what I would like to see?

But the Vim word boundaries also improve the behavior for normal text,
not just for code. Vim is used to write mails and documents too.

--
Sébastien

