Hi Peter: Sounds like you are doing some fun work!

Robert Brewer has done some work in this space with his SpeechLion project. The model to aim for here is similar to that of a speech synthesis engine: it's a service that can be used by assistive technologies. The speech synthesis problem is a bit simpler because the interface between the engine and the assistive technology is not so complex. Speech recognition involves a bit more complexity because of the typical need to tell the engine to listen for different grammars, as well as the high degree of two-way communication between the speech application and the speech engine. It's a solvable problem, however, and emerging standards such as MRCPv2 are addressing it.
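Just to make the "engine as a service" idea a bit more concrete, here's a very rough sketch of the kind of two-way interface an assistive technology might see. The names and signatures below are purely hypothetical (they don't come from MRCPv2 or any existing engine); the point is simply that grammars flow down to the engine and recognition results flow back up:

from typing import Callable, Dict, List

class RecognitionService:
    """Hypothetical speech recognition engine exposed as a service."""

    def __init__(self) -> None:
        self._grammars: Dict[str, str] = {}
        self._active: List[str] = []
        self._listeners: List[Callable[[str, float], None]] = []

    def load_grammar(self, name: str, srgs_source: str) -> None:
        # The assistive technology tells the engine which utterances
        # it should be prepared to listen for.
        self._grammars[name] = srgs_source

    def set_active_grammars(self, names: List[str]) -> None:
        # Grammars get switched as the application's state changes.
        self._active = [n for n in names if n in self._grammars]

    def add_result_listener(self, listener: Callable[[str, float], None]) -> None:
        # The engine pushes (utterance, confidence) results back to the
        # assistive technology; that is the other half of the two-way traffic.
        self._listeners.append(listener)

An assistive technology would load a grammar per window or dialog, activate the one matching the current state, and register a listener to act on whatever the engine hears.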
I've seen various folks run down the path of turning speech into keyboard events. My personal opinion is that it's a potentially workable approach, but I believe much more compelling access can be provided via higher-level access to the application, such as through the AT-SPI. As one goes further down the speech input path, one starts to realize that speech recognition is not perfect. As such, one needs to start tuning and modifying the speech engine and the grammars it uses to squeeze the best accuracy and performance out of the engine. Really good tuning can be done by understanding just which utterances are acceptable input to the application in its current state, and that understanding is much easier to obtain through something such as the AT-SPI.

Furthermore, once users can start talking to an application, they start expecting more than just "speech buttons." For example, one might want to be able to say "change the current selection to 12 point bold helvetica." That involves several UI operations. While it might be possible to do this by injecting a sequence of well-known keyboard events, direct semantic access via something such as the AT-SPI is probably a better way to go.

In any case, it sounds like you are getting pretty interested in this space, and I'd be excited to hear more about your progress!

Will
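P.S. To make the "direct semantic access" idea a bit more concrete, here is a very rough sketch using pyatspi (the Python bindings for the AT-SPI). It walks a running application's accessible hierarchy and collects the widgets that expose actions; their names could seed a speech grammar, and a recognized utterance could then be dispatched by invoking the widget's action directly rather than by synthesizing keystrokes. The application name "gedit" is just an example, and this is only a sketch, not working dictation code:

import pyatspi

def find_actionable_widgets(acc, found=None):
    # Recursively collect accessibles that expose the Action interface.
    if found is None:
        found = []
    try:
        action = acc.queryAction()
        if action.nActions > 0 and acc.name:
            found.append((acc.name, acc))
    except NotImplementedError:
        pass  # this accessible exposes no actions
    for i in range(acc.childCount):
        find_actionable_widgets(acc.getChildAtIndex(i), found)
    return found

desktop = pyatspi.Registry.getDesktop(0)
for i in range(desktop.childCount):
    app = desktop.getChildAtIndex(i)
    if app and app.name == "gedit":  # example target application
        for name, acc in find_actionable_widgets(app):
            # Each name/role pair is a candidate utterance for a grammar.
            print(name, acc.getRoleName())
            # A recognized phrase could then be acted on semantically:
            # acc.queryAction().doAction(0)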