Re: New developments on Caribou

Francesco Fumanti wrote:
There is work ongoing to create a word prediction service over dbus for the onscreen keyboard onboard. (onboard is the default onscreen keyboard shipping with Ubuntu.) At some point, there was also talk of sharing it with Caribou. It uses n-gram language modeling. If you want to have a look at it, you can find it in the word completion branch of onboard:

I had a look at the onboard word-completion branch, great stuff!

I think there is scope to join forces between presage and onboard.

presage is architected to merge predictions generated by a set of predictors. Each predictor uses a different language model/predictive algorithm to generate predictions.
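To make that architecture concrete, here is a minimal Python sketch of the merge-of-predictors idea. All names here (Predictor, RecencyPredictor, merge_predictions) are illustrative, not presage's actual C++ API; it only shows the shape of the design.

```python
# Illustrative sketch of a merge-of-predictors architecture.
# Class and function names are hypothetical, not presage's real API.

class Predictor:
    """Base class: each predictor maps a text context to {token: probability}."""
    def predict(self, context: str) -> dict[str, float]:
        raise NotImplementedError

class RecencyPredictor(Predictor):
    """Toy predictor: tokens seen more recently get a higher score."""
    def __init__(self, history: list[str]):
        self.history = history

    def predict(self, context: str) -> dict[str, float]:
        words = context.split()
        prefix = words[-1] if words else ""
        scores: dict[str, float] = {}
        # Walk history from newest to oldest; newer matches score higher.
        for age, token in enumerate(reversed(self.history)):
            if token.startswith(prefix) and token != prefix:
                scores.setdefault(token, 1.0 / (age + 1))
        return scores

def merge_predictions(predictors: list[Predictor], context: str, n: int = 5) -> list[str]:
    """Combine the suggestions of every predictor into one ranked list."""
    combined: dict[str, float] = {}
    for p in predictors:
        for token, prob in p.predict(context).items():
            combined[token] = combined.get(token, 0.0) + prob
    return sorted(combined, key=lambda t: combined[t], reverse=True)[:n]
```

The point of the design is that each predictor stays self-contained, so a new language model (such as onboard's) can be dropped in as one more predictor without touching the merging logic.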

Currently presage provides the following predictors:
- ARPA predictor: works with statistical language modelling data in the ARPA N-gram format
- smoothed n-gram predictor: a generalized smoothed n-gram statistical predictor that can work with n-grams of arbitrary cardinality
- recency predictor: based on the recency promotion principle
- dictionary predictor: generates a prediction by returning tokens that complete the current prefix, in alphabetical order
- abbreviation expansion predictor: maps the current prefix to a token and returns that token in a prediction with probability 1.0
- dejavu predictor: learns and then later reproduces previously seen text sequences
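Two of the simpler predictors above can be sketched in a few lines. These are toy Python illustrations of the described behaviour, not presage's actual implementation:

```python
# Toy versions of the dictionary and abbreviation expansion predictors
# described above; illustrative only, not presage's real code.

def dictionary_predict(prefix: str, dictionary: list[str], n: int = 5) -> list[str]:
    """Return up to n dictionary words that complete the prefix, alphabetically."""
    return sorted(w for w in dictionary if w.startswith(prefix) and w != prefix)[:n]

def abbreviation_predict(prefix: str, abbreviations: dict[str, str]) -> dict[str, float]:
    """Map a known abbreviation to its expansion with probability 1.0."""
    if prefix in abbreviations:
        return {abbreviations[prefix]: 1.0}
    return {}
```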

A bit more information on how these predictors work is available here:

It sounds like the combination of language model and predictive algorithm used in the onboard word-prediction branch is an ideal candidate to be integrated into presage as a new predictor class.

presage could then be the engine used to power the d-bus prediction service, offering the predictive capabilities of the onboard language model/predictor, plus all the predictors currently provided by presage (all of which can be turned on/off and configured to suit individual needs).

The presage core library itself has minimal dependencies: it pretty much only needs a C++ runtime and sqlite, which is used as the backing store for n-gram based language models (this ensures fast access, a minimal memory footprint, and no delay from loading the language model into memory).
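To illustrate why an sqlite-backed model avoids loading everything into memory, here is a small Python sketch of bigram lookup against an on-disk table. The schema (a single ngram table) is hypothetical; presage's actual table layout may differ.

```python
# Sketch of sqlite as a backing store for an n-gram model.
# The schema is an illustrative assumption, not presage's real one.
import sqlite3

conn = sqlite3.connect(":memory:")  # a real model would use a file on disk
conn.execute("CREATE TABLE ngram (w1 TEXT, w2 TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO ngram VALUES (?, ?, ?)",
    [("on", "screen", 7), ("on", "board", 12), ("on", "time", 3)],
)

def bigram_completions(w1: str, limit: int = 5) -> list[str]:
    """Most frequent successors of w1, fetched per query rather than from RAM."""
    rows = conn.execute(
        "SELECT w2 FROM ngram WHERE w1 = ? ORDER BY count DESC LIMIT ?",
        (w1, limit),
    )
    return [r[0] for r in rows]
```

Because each prediction is a single indexed query, only the matching rows are ever read, which is where the fast startup and small footprint come from.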

For details about the word prediction service, please contact marmuta, who did nearly all the work on it.

I'll follow up with marmuta to discuss the feasibility of making this happen and work out the technical details, in case there is consensus to go ahead with this.

- Matteo
