[orca-list] Speech in general (Re: Capital, Capital, Capital)



Hi All:

I think it's important to take a step back and look at the overall problem we're facing. We have the desire for:

* Verbalized punctuation and capitalization
* Verbalized characters (e.g., 'double you' for 'w')
* Verbalized key names (e.g., 'Left Shift')
* Phrase spelling (letter-by-letter and 'military style')
* Customized pronunciation
* Abbreviation expansion
* Homograph disambiguation (the 'live' in 'Where do you live?'
  vs. 'I live in a cave')
* Natural F0 contour and prosody handling
* Audio cues
* Progress callbacks (e.g., 'this word was just spoken')
* Voice/pitch changes
* Multilingual support
* Etc.

A lot of these can be done at the speech layer without the need for additional knowledge. Things like voice/pitch changes based upon context (e.g., it's a link, it's a pushbutton, etc.) may still need to live at the screen reader level. Locale knowledge may also need to live a little higher in the stack.

We have a variety of speech synthesis engines, each of which does all or a subset of the above in different ways, with little standardization among them. SSML is an interesting option, but only a handful of engines really support it, and it doesn't provide support for things like 'say-as=military_spelling'.

We also have at least two kinds of users: 1) those who don't necessarily care about the details of what the speech synthesis engine supports and just want all of the above to work, and 2) those who are more aware of the TTS engine's capabilities and are willing to work within its limitations. Many of these same users are willing to pay for a high-quality commercial engine and expect Orca to work perfectly with it.

This is a pretty complex problem. Orca's current solution is to handle a lot of the above itself. Based upon the discussion, I think we're agreeing there should be some delegation to the lower layers: if Orca can learn that a lower layer supports a feature, it can delegate responsibility for that feature to that layer. If a lower layer doesn't support it, then Orca needs to provide it.
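A rough sketch of what that delegation could look like in Python. Everything here is hypothetical, not an existing Orca or engine API: the `Feature` names, the `supports()` query, and the fallback handler are all made up for illustration.

```python
from enum import Enum, auto

class Feature(Enum):
    # Hypothetical feature identifiers an engine driver might advertise.
    VERBALIZED_PUNCTUATION = auto()
    CHARACTER_SPELLING = auto()

class Backend:
    """Stand-in for an engine driver; a real driver would report what
    the underlying engine actually supports."""
    def __init__(self, supported=frozenset()):
        self.supported = set(supported)

    def supports(self, feature):
        return feature in self.supported

def verbalize_punctuation(text):
    # Minimal screen-reader-side fallback: spell out a few marks.
    names = {'.': ' period ', ',': ' comma ', '!': ' exclaim '}
    return ''.join(names.get(ch, ch) for ch in text)

def prepare_utterance(text, backend):
    """Delegate to the engine when it claims support; otherwise the
    screen reader does the work itself before handing the text over."""
    if backend.supports(Feature.VERBALIZED_PUNCTUATION):
        return text  # engine will speak the punctuation natively
    return verbalize_punctuation(text)

# Engine without support: the screen reader pre-processes the string.
print(prepare_utterance("Hi, all.", Backend()))
# Engine with support: the text passes through untouched.
print(prepare_utterance("Hi, all.", Backend({Feature.VERBALIZED_PUNCTUATION})))
```

The point is only the shape of the decision: one capability query per feature, with the screen-reader-side implementation kept around as the fallback.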

The first difficult task is figuring out how to obtain the information needed to do the appropriate delegation. There's no standardization across the engines. Take, for example, obtaining locale information: the programmatic representation of a locale differs greatly from engine to engine.
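To make the locale problem concrete, here is a sketch of the kind of normalization shim that ends up being necessary. The three input shapes below ('en-US' style strings, descriptive names, and (language, country) tuples) are invented examples, not taken from any real engine's API:

```python
def normalize_locale(raw):
    """Coerce an engine-specific locale value into 'll-CC' form.
    The alias table would grow one entry per engine quirk."""
    aliases = {"english_american": "en-US", "english_british": "en-GB"}
    if isinstance(raw, tuple):                     # e.g. ("en", "US")
        return f"{raw[0].lower()}-{raw[1].upper()}"
    if raw in aliases:                             # descriptive name
        return aliases[raw]
    lang, _, country = raw.replace("_", "-").partition("-")
    return f"{lang.lower()}-{country.upper()}" if country else lang.lower()

print(normalize_locale(("en", "US")))        # en-US
print(normalize_locale("english_american"))  # en-US
print(normalize_locale("en_us"))             # en-US
```

Each engine driver would feed its own representation through something like this so the layers above only ever see one form.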

Or take verbalized punctuation. The engines that support it each have their own idea of 'none', 'some', and 'all'. I'll guarantee you that delegating verbalized punctuation to the engine will result in at least one member of this list shouting angrily that some punctuation mark was spoken at the 'some' level with one engine but not another.
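Even the simple mapping of our three levels onto each engine's own vocabulary needs a per-engine table. The engine names and their accepted values here are illustrative only; no real engine is being quoted:

```python
# Screen-reader-side punctuation levels mapped onto each engine's
# own notion. Hypothetical engines and values, for illustration.
LEVEL_MAP = {
    "engine_a": {"none": 0, "some": 1, "all": 2},
    "engine_b": {"none": "off", "some": "most", "all": "everything"},
}

def engine_punctuation_level(engine, level):
    """Translate our level to whatever the engine expects; returns
    None for an unknown engine, meaning 'do not delegate at all'."""
    table = LEVEL_MAP.get(engine, {})
    return table.get(level, table.get("all"))

print(engine_punctuation_level("engine_a", "some"))   # 1
print(engine_punctuation_level("engine_b", "none"))   # off
```

And of course the table only maps the names; it says nothing about *which* marks each engine actually speaks at its 'some' level, which is exactly where the disagreements will come from.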

As a way to move forward, I think we need to flesh out the desires above and see what can be done to address them. The TTSAPI work does some of this, but I'll admit it was done before I had a better understanding of the screen reader problem: http://www.freebsoft.org/tts-api.

Will


