Issues with Pango word separation



GTK+ text widgets are very useful, but GtkSourceView can be annoying to write code in, and I believe the cause is Pango's word-break algorithm.

For example, when hitting Ctrl+Backspace (delete previous word) on a line like this, with the cursor at the end,

String msg1

most text editors treat alphanumeric characters equally and will delete "msg1", leaving the cursor at the end of "String", but in gedit, Anjuta, or any other GtkSourceView-based editor, it will delete only "1" and leave the cursor at the end of "msg".
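
To pin down the behaviour I expect, here is a minimal Python sketch of a "delete previous word" boundary search that treats letters and digits alike. It is only an illustration: previous_word_start is a made-up name, not a Pango or GtkSourceView function, and it ignores punctuation entirely.

```python
def previous_word_start(text: str, cursor: int) -> int:
    """Return the index that "delete previous word" should delete back to."""
    i = cursor
    # Skip whitespace sitting directly before the cursor.
    while i > 0 and text[i - 1].isspace():
        i -= 1
    # Consume one run of letters and digits as a single word;
    # a letter-to-digit transition does not end the run.
    while i > 0 and text[i - 1].isalnum():
        i -= 1
    return i


line = "String msg1"
start = previous_word_start(line, len(line))
print(repr(line[:start]))  # 'String '
```

With this rule, "msg1" goes away in one step and the space after "String" is kept.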

Another example is when deleting the previous word from a blank line under a statement:

first statement;
<cursor>

In many programming languages, the semicolon is used as a statement separator and therefore acts as a word separator, so it's rather frustrating when the result ends up being this:

first<cursor>

The ideal result would be to delete the newline and stop at the semicolon, which has greater significance in writing code than it does writing in a natural language.

To be honest, I don't understand why Pango ever treats a transition between alphabetic and numeric characters as a word break. In a previous job I used Tomboy to help keep track of serial numbers, which very often mixed letters and numbers, and that made "delete previous word" a much more confusing action than it has any right to be. The problem is even more common, and more disruptive, when writing code.

Here are a few suggestions I have:

  1. Treat letters and numbers equally. Use whitespace and select punctuation for word separation, not alpha-numeric transitions.
  2. Create a new language engine for writing code? I feel like GtkSourceView has very different priorities than GtkTextView, and its algorithms should reflect that.
  3. Support custom word separators. Not being very familiar with Pango's existing word break algorithm, I'm not sure how easy this would be, but it would be fantastic if GtkSourceView lang files could specify what each programming language considers to be a word separator, like jEdit already does with its mode files (e.g. Java uses the list ,+-=<>/?^&*).
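
As a rough illustration of suggestions 1 and 3 together, the Python sketch below takes the separator set as a parameter, the way a lang file might supply it. All names here (delete_previous_word, JAVA_SEPARATORS) are hypothetical and nothing in it is existing Pango or GtkSourceView API. The separator characters are the jEdit Java list quoted above, with ';' added by me for the statement example.

```python
# Separators a hypothetical lang file might declare for Java.
# The list is the jEdit one quoted above; ';' is added for illustration.
JAVA_SEPARATORS = frozenset(",+-=<>/?^&*;")


def delete_previous_word(text: str, cursor: int,
                         separators: frozenset[str]) -> tuple[str, int]:
    """Return (new_text, new_cursor) after one "delete previous word"."""
    i = cursor
    # Skip whitespace (including a newline) directly before the cursor.
    while i > 0 and text[i - 1].isspace():
        i -= 1
    skipped_whitespace = i < cursor
    if i > 0 and text[i - 1] in separators:
        # A separator is a boundary: stop in front of it if whitespace was
        # already consumed, otherwise delete just the separator itself.
        if not skipped_whitespace:
            i -= 1
    else:
        # Consume one run of characters that are neither whitespace nor
        # separators, so letters and digits stay in the same word.
        while (i > 0 and not text[i - 1].isspace()
               and text[i - 1] not in separators):
            i -= 1
    return text[:i] + text[cursor:], i


text = "first statement;\n"
new_text, new_cursor = delete_previous_word(text, len(text), JAVA_SEPARATORS)
print(repr(new_text))  # 'first statement;'
```

Under these assumptions the blank-line case deletes only the newline and stops at the semicolon, and "String msg1" still loses "msg1" in a single step.
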

Let me know what you think. Any help, suggestions, or concerns would be appreciated.

Thanks,
~Damien

