Hey all,

I promise this will be the last I write on this for a while, but
people suggested I look at the bi-gram, tri-gram and up to 5-gram
distributions of words found in GNOME software messages. That is, given
the input message:

"This is a long sentence so there."

we'd get the tri-grams:

This is a
is a long
a long sentence
long sentence so
sentence so there
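
For anyone who wants to play along, here is a rough Python sketch of the
kind of extraction I mean. It is not the script I actually ran: the
function names (ngrams, ngram_counts) are made up for illustration, and
it assumes that splitting on whitespace and stripping surrounding
punctuation is good enough tokenisation, which real GNOME messages
(format strings, markup, accelerators) may well break.

```python
import string
from collections import Counter


def ngrams(message, n):
    """Return the word n-grams of a message as a list of strings."""
    # Assumption: split on whitespace, strip punctuation from each end
    # of a token, and drop anything that ends up empty.
    words = [w.strip(string.punctuation) for w in message.split()]
    words = [w for w in words if w]
    # Slide a window of n words along the message; a message shorter
    # than n words yields no n-grams at all.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(messages, n):
    """Count how often each n-gram occurs across a set of messages."""
    counts = Counter()
    for message in messages:
        counts.update(ngrams(message, n))
    return counts


if __name__ == "__main__":
    for gram in ngrams("This is a long sentence so there.", 3):
        print(gram)
```

Calling ngram_counts(messages, n) for n from 2 to 5 and then looking at
most_common() on each result gives the distributions.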

Usually people mess about with this sort of thing when trying to build
terminology lists, but I suspect my sample set is a bit small to be

Regardless, I've got results at

