Re: [orca-list] Punctuation, capital letters, substitution of characters and strings, and a general error in the design of Orca



Hi Jan:

These are all very good points. To move forward, we need to find a constructive path to get past the problems.

As mentioned before, I believe the problem we face is that the lower speech layers don't support all the desired features. As such, some work will have to be done in various layers to accommodate those speech engines that simply do not have the features a user demands. In http://mail.gnome.org/archives/orca-list/2008-March/msg00566.html, I provided a list of common desires people have for speech, and Rich Caloggero added the notion of AutomaticallySpeakCamelCaseWordsAsSeparateWords.
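
Purely to illustrate that last idea, a client-side fallback for engines that cannot do it themselves might look something like this rough Python sketch (the helper name is made up for this example):

    import re

    def split_camel_case(text):
        # Insert a space before each interior capital so the synthesizer
        # hears separate words instead of one long token.
        return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', text)

    print(split_camel_case("AutomaticallySpeakCamelCaseWordsAsSeparateWords"))
    # -> Automatically Speak Camel Case Words As Separate Words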

To move forward, I think we need to take a look at the various speech APIs and systems out there (e.g., gnome-speech, speech dispatcher, TTSAPI) as well as the various speech engines available to us (e.g., Festival, eSpeak, DECtalk, IBMTTS, Loquendo, Cepstral, etc.) and develop a matrix of which desired user features are supported where, including locale coverage. We also need to look at markup such as SSML to see how far it can take us. From there, I think we can get a better understanding of where to go.

I think a simple user requirements vs. speech engine matrix would be a great starting point. In the speech dispatcher space, have you done any such work that we could draw from?
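
Even something as lightweight as the following would do to get us started (the entries below are placeholders, not verified claims about any engine):

    # Placeholder sketch of a requirements-vs-engine matrix; every value
    # below is a stand-in (None = unknown) to be filled in from real testing.
    matrix = {
        "announce punctuation on request": {"eSpeak": None, "Festival": None, "DECtalk": None},
        "capital letter indication":       {"eSpeak": None, "Festival": None, "DECtalk": None},
        "SSML input":                      {"eSpeak": None, "Festival": None, "DECtalk": None},
        "sound icons":                     {"eSpeak": None, "Festival": None, "DECtalk": None},
    }
    for feature, engines in matrix.items():
        print(feature, engines)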

Will

On Mar 31, 2008, at 8:40 AM, Jan Buchal wrote:

Good morning,

I have been following the debates about how punctuation,
capitalization and the pronunciation of certain characters, acronyms
and words in general should work for some time now.

Because a great deal has been written and said already, I will try to
be brief but also direct. My goal is to improve Orca, not to criticize
it for its own sake.

First, I will say that Orca currently handles punctuation,
capitalization and pronunciation in the wrong way. Orca does things
that do not belong in it at all, which only complicates the whole
matter and keeps creating more and more problems.

More concretely, Orca should not substitute one string for another
before sending text to the synthesizer. Orca should only send a clear
request (via the API, SSML or some other means) saying, for example,
that if the sentence contains full stops, question marks and other
punctuation, those symbols should be announced in speech (or conveyed
in some other way). It is then up to the synthesizer to present that
information in a neutral way, not as part of the sentence. This simple
example shows why the current implementation is wrong. There is a big
difference between Orca sending the synthesizer a sentence with strings
like "comma" and "question mark" spliced into it, and Orca merely
sending a request to announce those symbols in speech or, for example,
with a sound icon.
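
To illustrate: with a speech system that supports such requests, this is a single setting rather than text rewriting. The sketch below assumes the Speech Dispatcher Python bindings; take the exact call names as an illustration, not a prescription:

    import speechd

    client = speechd.SSIPClient("example")
    # Ask the synthesis layer itself to announce punctuation; no "comma"
    # or "question mark" strings are spliced into the sentence here.
    client.set_punctuation(speechd.PunctuationMode.ALL)
    client.speak("Is this right, or not?")
    client.close()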

The current implementation completely disrupts sentence prosody: the
intonation contour, the pauses and so on, all of which are very
important for the naturalness and intelligibility of the synthesized
speech.

A different case is the pronunciation of names, words and
abbreviations. Again, it is not Orca's task to maintain a dictionary of
exceptions and rewrite the incoming string into exactly what should be
spoken. It is the application itself (via at-spi), not Orca, that can
mark such pieces of text to be read in a particular way; for example,
it can specify via SSML how to pronounce a certain name. Orca should
only take this information and pass it on to the synthesizer, which
should understand it and pronounce it accordingly. A dictionary of
general exceptions can be part of the synthesizer or the speech
synthesis system.
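
In SSML terms such a hint could look roughly like the following sketch (the IPA transcription is only one possible reading, and whether a given engine honours the markup is exactly what needs to be verified):

    # Hypothetical SSML an application could attach to a name so that the
    # synthesizer, not Orca, decides how to pronounce it. <phoneme> and
    # <sub> are standard SSML 1.0 elements; the IPA below is only a guess.
    ssml = ('<speak>'
            'My name is <phoneme alphabet="ipa" ph="ˈbuxal">Buchal</phoneme> '
            'and I write about <sub alias="text to speech">TTS</sub>.'
            '</speak>')
    print(ssml)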

The situation is very similar for lower and upper case letters. Orca
should only tell the synthesizer whether it wants a character presented
as a capital or not; it is then up to the user and their configuration
whether that is announced verbally, via a voice change, with sound
icons or in some other way.
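
Again only as a sketch against the Speech Dispatcher bindings (the set_cap_let_recogn call name is from my memory, so treat it as an assumption):

    import speechd

    client = speechd.SSIPClient("example")
    # One request expressing the user's preference for capitals:
    # 'none', 'spell' or 'icon', depending on configuration.
    client.set_cap_let_recogn('icon')
    client.speak("NASA")
    client.close()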

These are just small examples of how Orca currently works and how it
should not work. They illustrate the old and the new approach to how
the whole system of "reading the screen" should work. It is important
to keep in mind that Orca is not really a screen reader as we knew the
term in the past; it cooperates with at-spi and is entirely dependent
on whether applications present the important information well, for
disabled users too. Orca must also take the speech synthesis system
into account.

Orca is an excellent project that could be much further along if we
did not complicate it unnecessarily. Let Orca do what it can do well,
and let it not do the things that make it difficult to maintain and
that bring a lot of bugs.

It should be clear that it is not possible to describe everything in
detail in an email like this. So let me be clear that my main point is
this: it is a fundamental problem of Orca's design, or more precisely
of a part of it. If this is changed, it will bring significant
simplification and improvement for everyone, and relief to those who
work on AT.


Best regards


--

Jan Buchal
Tel: (00420) 24 24 86 008
Mob: (00420) 608023021

_______________________________________________
Orca-list mailing list
Orca-list gnome org
http://mail.gnome.org/mailman/listinfo/orca-list
Visit http://live.gnome.org/Orca for more information on Orca



