Thoughts on speech [Fwd: Re: [orca-list] Speech in general (Re: Capital, Capital, Capital)]



Just forwarding this discussion as an FYI...
--- Begin Message ---
Just one more thing to add to this rather huge list, but one which comes up 
often, especially reading program source:

ThisStringShouldBeSpokenAsIfItWereSeparateWords.

Just my two cents.
-- Rich
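
A minimal sketch of the splitting Rich describes, assuming it runs in the
screen reader before text is handed to the synthesizer. The function name
split_camel_case is hypothetical and not part of Orca:

```python
import re

# Zero-width boundaries: a lowercase letter or digit followed by an uppercase
# letter, or the end of an acronym run followed by a capitalized word
# (so "HTTPServer" becomes "HTTP Server").
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel_case(token):
    """Insert a space at each detected word boundary inside a token."""
    return _WORD_BOUNDARY.sub(" ", token)


print(split_camel_case("ThisStringShouldBeSpokenAsIfItWereSeparateWords."))
# prints "This String Should Be Spoken As If It Were Separate Words."
```

Whether digits should also start a new word, and how acronyms ought to be
spoken, are choices a real implementation would have to make; this sketch
only handles the common cases.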

----- Original Message ----- 
From: "Willie Walker" <William Walker Sun COM>
To: <orca-list gnome org>
Sent: Thursday, March 20, 2008 10:22 AM
Subject: [orca-list] Speech in general (Re: Capital, Capital, Capital)


Hi All:

I think it's important to take a step back and look at the overall
problem we're facing.  We have the desire for:

* Verbalized punctuation and capitalization
* Verbalized characters (e.g., 'double you' for 'w')
* Verbalized key names (e.g., 'Left Shift')
* Phrase spelling (letter-by-letter and 'military style')
* Customized pronunciation
* Abbreviation expansion
* Homograph disambiguation (the 'live' in 'Where do you live'
   vs. 'I live in a cave')
* Natural F0 contour and prosody handling
* Audio cues
* Progress callbacks (e.g., 'this word was just spoken')
* Voice/pitch changes
* Multilingual support
* Etc.

A lot of these can be done at the speech layer without the need for
additional knowledge.  Things like voice/pitch changes based upon
context (e.g., it's a link, it's a pushbutton, etc.) may still need to
live at the screen reader level.  Locale knowledge may also need to live
a little higher in the stack.

We have a variety of speech synthesis engines with limited
standardization for any of the above, each of which does all or a subset
of the above in different ways.  SSML is an interesting thing, but only
a handful of engines really support it.  In addition, it really doesn't
provide support for things like 'say-as=military_spelling'.

We also have at least a couple kinds of users: 1) those that don't
necessarily care about the details of what the speech synthesis engine
supports and just want all of the above to work, 2) those that are more
aware of the TTS engine's capabilities and are willing to work with its
limitations.  Many of these same users are willing to pay money for a
high quality commercial engine and expect Orca to work perfectly with
that engine.

This is a pretty complex problem.  The solution we're currently working
with is that Orca itself handles a lot of the above.  Based upon the
discussion, I think we're agreeing there should be some delegation to
the lower layers.  With this delegation approach, if Orca can learn that
the lower layers support a feature, it can delegate responsibility for
that feature to the lower layer.  If a lower layer doesn't support it,
then Orca needs to provide it.
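
The delegation rule can be sketched as follows. This is an illustration
only: the Engine class, its supports() method, and the feature names are
assumptions made for the example, not an existing Orca or engine API:

```python
class Engine:
    """Hypothetical stand-in for a speech engine that can report its features."""

    def __init__(self, name, features):
        self.name = name
        self.features = frozenset(features)

    def supports(self, feature):
        return feature in self.features


def plan_delegation(engine, wanted):
    """Map each wanted feature to whoever is responsible for providing it."""
    return {
        feature: "engine" if engine.supports(feature) else "screen reader"
        for feature in wanted
    }


# Example engine that handles punctuation itself but not military spelling.
engine = Engine("example-engine", {"punctuation", "pitch"})
plan = plan_delegation(engine, ["punctuation", "military_spelling"])
print(plan)
# prints {'punctuation': 'engine', 'military_spelling': 'screen reader'}
```

The hard part, as noted below, is the supports() query itself: nothing
like it is standardized across engines today.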

The first difficult task is figuring out how to obtain the information
to do the appropriate delegation.  There's no standardization across any
of the engines.  Take, for example, obtaining locale information.  The
programmatic representation of locale differs greatly from engine to engine.

Verbalized punctuation is another example.  The engines that support it
have their own ideas of 'none, some, all'.  I'll guarantee you that
delegating verbalized punctuation to the engine will result in at least
one member of this list shouting angrily that some punctuation mark was
spoken at the 'some' level with one engine, but not another.
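
One way to keep 'none, some, all' consistent across engines is to define the
levels once, above the engine, and verbalize before the text is sent down.
The sketch below assumes that approach; the symbol names and the contents of
each level are purely illustrative, not Orca's actual tables:

```python
# Illustrative spoken names for a handful of symbols.
PUNCTUATION_NAMES = {
    ".": "dot", ",": "comma", "!": "exclaim",
    "#": "number", "$": "dollar", "@": "at",
}

# Which symbols are spoken at each level (an assumption for this example).
LEVELS = {
    "none": set(),
    "some": {"#", "$", "@"},
    "all": set(PUNCTUATION_NAMES),
}


def verbalize_punctuation(text, level):
    """Replace symbols spoken at this level with their names."""
    spoken = LEVELS[level]
    pieces = [
        " " + PUNCTUATION_NAMES[ch] + " " if ch in spoken else ch
        for ch in text
    ]
    # Collapse the extra whitespace introduced around spoken names.
    return " ".join("".join(pieces).split())


print(verbalize_punctuation("a@b.c", "some"))
# prints "a at b.c"
```

The trade-off is that the engine's own, possibly better tuned, punctuation
handling goes unused, which is the tension this message describes.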

As a way to move forward, I think we might need to fill out the desires
above and see what can be done to address them.  The TTSAPI work does
some of this, but I'll admit it was done before I had a better
understanding of the screen reader problem:
http://www.freebsoft.org/tts-api.

Will

_______________________________________________
Orca-list mailing list
Orca-list gnome org
http://mail.gnome.org/mailman/listinfo/orca-list
Visit http://live.gnome.org/Orca for more information on Orca

--- End Message ---

