Re: Thoughts on speech



Hello all,

Good to get to a constructive discussion!

Ok, so about Speech Dispatcher... as the name may suggest, its most
important function is message dispatching -- management, synchronization
and serialization of speech requests coming simultaneously from
different sources within the system.  These "sources" are typically
different assistive technologies, but it is possible (and encouraged)
to create multiple connections even from one AT and make use of this
synchronization for speech requests coming from different components of
that AT.  Orca currently uses just one connection, but Speechd-el, for
instance, makes use of this quite heavily.  The interaction of different
messages is controlled via their classification -- each message can be
assigned a "priority" which determines how it interacts with other
messages.  Context is maintained for each client connection independently.
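
To give a concrete idea, with the Python client library this could look
roughly as follows (an untested sketch -- please check the library
documentation for the exact class and constant names):

  import speechd

  # One AT, two connections: one for continuous reading, one for
  # urgent announcements.  Each connection has its own context and
  # priority setting.
  reading = speechd.SSIPClient("myreader-text")
  alerts = speechd.SSIPClient("myreader-alerts")

  reading.set_priority(speechd.Priority.TEXT)
  alerts.set_priority(speechd.Priority.IMPORTANT)

  reading.speak("Reading a long document aloud...")
  # The "important" message takes precedence over the ongoing text
  # according to the priority interaction rules.
  alerts.speak("Battery critically low!")

  reading.close()
  alerts.close()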

The following ATs currently support Speech Dispatcher:
  * Orca,
  * LSR,
  * Emacs with Speechd-el,
  * speakup (console screen reader),
  * Yasr (terminal screen reader).

Thus when Orca is set to use Speech Dispatcher, it can coexist
peacefully with those other ATs (when using Gnome Speech, Orca conflicts
with other ATs since it assumes it has exclusive control over the synth).

Speech Dispatcher does audio output itself instead of relying on the
synthesizer.  It currently includes support for OSS, ALSA, NAS and
PulseAudio.  The output method, output device and other options are
configurable per synth.  Audio management proved to be a very important
design decision -- synthesizers often provide very limited support, and
passing the sound through the dispatcher makes other nice features
possible, such as caching, which may improve responsiveness dramatically
(think of keyboard echo).
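
The audio output method is selected in the configuration, roughly like
this (an illustrative snippet -- the exact option names are best checked
against the sample speechd.conf shipped with the package):

  # speechd.conf
  AudioOutputMethod "pulse"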

There are output drivers for:
    * Festival
    * Flite
    * Espeak
    * Cicero
    * IBM TTS
    * Epos (generic driver)
    * DecTalk software (generic driver)

The "generic driver" makes it possible to connect to any synthesizer
using its command line interface.  Driver can be written in 5 minutes
with no programming needed.  Of course, this is a quick hack, but works
surprisingly well.
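
A minimal generic driver configuration might look roughly like this
(an illustrative sketch -- the option name and the $DATA substitution
are how I recall the shipped generic configs, so please take the exact
syntax with a grain of salt and see the bundled examples):

  # hypothetical espeak-generic.conf
  # $DATA is replaced by the text to be spoken
  GenericExecuteSynth "echo '$DATA' | espeak"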

Callbacks are supported at the utterance level (begin, end, interrupt).
Callbacks inside the utterance are possible through SSML index marks
(if the engine supports them).
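
With the Python client, subscribing to these events looks roughly as
follows (a sketch only -- the callback mechanism and the CallbackType
constants are how the Python API exposes them as far as I remember,
see the library documentation):

  import speechd

  client = speechd.SSIPClient("callback-demo")

  def on_event(event_type, *args, **kwargs):
      # Called asynchronously with BEGIN, END, CANCEL or INDEX_MARK;
      # index mark events also carry the name of the mark.
      print("speech event:", event_type, args, kwargs)

  client.speak("Hello there",
               callback=on_event,
               event_types=(speechd.CallbackType.BEGIN,
                            speechd.CallbackType.END,
                            speechd.CallbackType.CANCEL))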

SSML can be used to include additional markup within the utterance.
SSML can be passed to the synth if it supports it or interpreted by the
output module.
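
A marked-up message might look like this (standard SSML; which elements
are actually honoured depends on the particular synthesizer or output
module):

  <speak>
    Weather for <emphasis>Prague</emphasis>:
    <mark name="temperature"/> minus five degrees.
  </speak>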

The primary means of communication with Speech Dispatcher is a TCP/IP
connection and the SSIP protocol (see the specification at
http://www.freebsoft.org/doc/speechd/ssip.html).  Native implementations
of this protocol exist in a variety of programming languages.
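
To give a feeling for the protocol, a raw SSIP session over a plain
socket might look roughly like this (a sketch -- the commands follow
the SSIP specification, and the default port and exact replies should
be checked against the spec linked above):

  import socket

  s = socket.create_connection(("127.0.0.1", 6560))  # default SSIP port

  def cmd(line, expect_reply=True):
      s.sendall((line + "\r\n").encode("utf-8"))
      if expect_reply:
          return s.recv(4096).decode("utf-8")

  print(cmd("SET self CLIENT_NAME joe:demo:main"))
  print(cmd("SET self PRIORITY text"))
  print(cmd("SPEAK"))            # server acknowledges and expects text
  cmd("Hello from raw SSIP.", expect_reply=False)  # message body
  print(cmd("."))                # a lone dot terminates the message
  print(cmd("QUIT"))
  s.close()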

Here is the complete list of current client interfaces:
    * C/C++ client library
    * Python client library
    * Emacs Lisp client library
    * Common Lisp client library
    * Guile client library
    * Simple command line client
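
The command line client (spd-say) makes quick testing easy, e.g.:

  spd-say "Hello from Speech Dispatcher"

(If I recall correctly, the -o option selects a particular output
module, -r the rate, and so on.)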

Speech Dispatcher is highly configurable.  It can be run as a
system-wide service or within the user's session.  Per-client and
per-output-module configurations are supported.  The configuration is
read from files, but it is not meant to be directly manipulated by end
users.  The service should typically be pre-configured by the
distribution developers.  User preferences, on the other hand, will
typically be managed by the client.

Speech Dispatcher has extensive support for logging and debugging at
many levels (client communication, message dispatching process,
configuration, output module communication, ...).

Speech Dispatcher has been under continuous development for more than
six years now.  Its development is financed by the non-profit
organization Brailcom (http://www.brailcom.org) as one of the key
projects of the Free(b)soft project (http://www.freebsoft.org).  It is
licensed under the GPL, with necessary components under the LGPL.  A
number of enhancements were contributed by the community of developers
from other projects, in the spirit of the free software development
model.

More information including very comprehensive documentation can be found
at the project website: http://www.freebsoft.org/speechd.

Best regards

Tomas Cerha

