Re: [u-a-dev] gnome-speech, and audio output, moving forward.

From: Willie Walker <William Walker Sun COM>
To: Ubuntu Accessibility development discussions <ubuntu-accessibility-devel lists ubuntu com>
Cc: GNOME Accessibility Developers <gnome-accessibility-devel gnome org>, Gnome Accessibility List <gnome-accessibility-list gnome org>, Orca screen reader developers <orca-list gnome org>
Subject: Re: [u-a-dev] gnome-speech, and audio output, moving forward.
Date: Tue, 18 Sep 2007 09:09:05 -0400

Hi Luke:

First of all, I say "Hear, hear!"  The audio windmill is something
people have been charging at for a long time.  Users who rely upon
speech synthesis working correctly and integrating well with the rest of
their environment are among those who need reliable audio support most
critically.

I see two main proposals in the message below:

1) Modify gnome-speech drivers to obtain samples from their
   speech engines and then handle the audio playing themselves.
   This is different from the current state where the
   gnome-speech driver expects the speech engine to do all the
   audio management.

   This sounds like an interesting proposal.  I can tell you
   for sure, though, that the current gnome-speech maintainer
   has his hands full with other things (e.g., leading Orca).
   So, the work would need to come from the community.

2) As part of #1, move to an API that is pervasive on the system.
   The proposed API is GStreamer.

   Moving to a pervasive API is definitely very interesting, and
   I would encourage looking at a large set of platforms:  Linux
   to Solaris, GNOME to KDE, etc.  An API of recent interest is
   PulseAudio (https://wiki.ubuntu.com/PulseAudio), which might
   be worth watching.  I believe there might be many significant
   improvements in the works for OSS as well.
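
   To make #1 a bit more concrete, here is a rough sketch of the
   shape such a driver could take.  It is not gnome-speech code:
   every name in it (Driver, BufferSink, fake_engine) is
   hypothetical, and the "engine" just emits short tones.  The only
   point is that the engine hands samples back through a callback
   and the driver decides where they go, so the sink could later be
   backed by GStreamer, PulseAudio, or anything else.

```python
import math
import struct
from typing import Callable, List


class BufferSink:
    """Hypothetical stand-in for an audio backend.

    It only collects PCM chunks in memory; a real driver would pass
    the bytes to whatever audio API the platform settles on.
    """

    def __init__(self) -> None:
        self.chunks: List[bytes] = []

    def write(self, pcm: bytes) -> None:
        self.chunks.append(pcm)


def fake_engine(text: str, on_samples: Callable[[bytes], None],
                rate: int = 8000) -> None:
    """Hypothetical speech engine.

    Instead of opening an audio device itself, it delivers one 10 ms
    chunk of 16-bit little-endian mono PCM (a 440 Hz tone) per word
    through the on_samples callback.
    """
    frames = rate // 100
    for _ in text.split():
        samples = [
            int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / rate))
            for n in range(frames)
        ]
        on_samples(struct.pack("<%dh" % frames, *samples))


class Driver:
    """Hypothetical driver: asks the engine for samples, owns playback."""

    def __init__(self, engine, sink) -> None:
        self.engine = engine
        self.sink = sink

    def say(self, text: str) -> None:
        # The engine never touches the sound card; the driver routes
        # every chunk to the sink it was configured with.
        self.engine(text, self.sink.write)


if __name__ == "__main__":
    sink = BufferSink()
    Driver(fake_engine, sink).say("hello audio world")
    print(len(sink.chunks), "chunks received")
```

   Swapping the audio backend then means swapping the sink object,
   without touching the engine-facing part of the driver.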

In the bigger scheme of things, however, there is discussion of
deprecating Bonobo.  Bonobo is used by gnome-speech to activate
gnome-speech drivers.  As such, one might consider alternatives to
gnome-speech.  For example, SpeechDispatcher
(http://www.freebsoft.org/speechd) or TTSAPI
(http://www.freebsoft.org/tts-api-provider) might be something to
consider.  They are not without issues, however, including cumbersome
configuration and reliability problems.  I believe those are all
solvable with work.  The harder issue in my mind is that they would
introduce an external dependency for things like GNOME, and I have also
not looked at what their licensing scheme is.

Will

On Tue, 2007-09-18 at 22:22 +1000, Luke Yelavich wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Greetings all.
> For a while now, it has been possible to have multiple audio streams playing at the same time, using ALSA's 
> dmix plugin under Linux. This also has meant the ability to have speech audible at the same time as other 
> audio. Users have desired the ability to do this for a while now, particularly since it has been possible in 
> other operating systems for a long time.
> 
> Since eSpeak has been developed, we have had a very usable synthesizer for speech output, which supports a 
> growing number of languages. Since this synthesizer is cross-platform, the choice was made by the author to 
> use PortAudio, thereby supporting all platforms where PortAudio is available. Since PortAudio v19, it has been 
> possible to use ALSA for audio output via PortAudio. In theory, this is good news; in practice, however, it
> has created more problems than it solves, for the following reasons, as far as I can see:
> 
> * PortAudio v19 has had no official release, and so seems to be in a rather constant state of flux, making it 
> difficult for distros to reliably support a working version.
> * PortAudio's ALSA implementation currently seems to be broken, which is evident when using eSpeak and
> attempting to speak multiple strings of text rapidly over a short period of time.
> * As far as I've seen, there is no easy way for the user to select which output device PortAudio should use.
> Added to that, if more than one app is using PortAudio, the choice will affect that application as well as eSpeak,
> which may not be what the user desires.
> * All proprietary synths support only OSS output, which currently makes simultaneous audio and speech
> impossible.
> 
> What I would like to propose is the following. Since a large portion of GNOME's multimedia framework is now
> using GStreamer, I would like to suggest that we make all gnome-speech drivers use GStreamer and, if possible,
> add another option to the sound preferences to allow the user to select which sound card they wish to use for
> speech output. This would result in GStreamer being used via ALSA on Linux, thereby allowing simultaneous
> audio and speech, which would likely be handled at the GStreamer level before it even reaches ALSA. (I don't
> really know how GStreamer works, so this is a guess on my part.)
> 
> - From what I have seen, just about all proprietary synth APIs support sending audio data from the synth back to 
> the calling application, thereby allowing the audio to be sent wherever the application wishes. I am well
> aware that gnome-speech was initially designed to not care about how the audio was played, but since its
> initial inclusion in GNOME, GStreamer has become the standard multimedia framework for GNOME, and at least in
> Ubuntu's implementation, allows the user to set different devices for several different uses, such as sound 
> events, music and movies, and audio/video conferencing.
> 
> I think we owe users the ability to use speech alongside audio, and to offer it in an easy-to-use way, thereby
> putting full control in their hands. Now that we are at the beginning of a new GNOME release, I personally
> think it's time to get serious about offering users a decent screen reader and speech experience, as good as, if
> not better than, what other operating systems offer.
> 
> I have sent this post to these lists to get as wide a range of viewpoints and as much discussion as possible. I would
> appreciate replies being sent to all lists, to ensure everybody can participate in the discussion.
> 
> I would like to invite both users and developers to express their views on a matter which I believe needs
> resolving. Input from GNOME devs, particularly those working on gnome-speech, is very much welcome.
> 
> So, let's sort something out.
> - -- 
> Luke Yelavich
> GPG key: 0xD06320CE 
> 	 (http://www.themuso.com/themuso-gpg-key.txt)
> Email & MSN: themuso themuso com
> Jabber: themuso jabber org au
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> 
> iD8DBQFG78MHjVefwtBjIM4RAmvvAKCHJH5ZlcpwSwweLV9a/1mMJMXQHQCfTdtH
> WXhAp+9KaQv85VOYyGKmtYw=
> =46d4
> -----END PGP SIGNATURE-----

Follow-Ups:
- Re: [u-a-dev] gnome-speech, and audio output, moving forward.
  - From: Bill Haneman

References:
- gnome-speech, and audio output, moving forward.
  - From: Luke Yelavich
