Re: [u-a-dev] gnome-speech, and audio output, moving forward.



Hi Luke, Will, and all:

For what it's worth, I agree with the bulk of what's been said already. It will be fantastic to get some more sanity in the speech/audio arena.

As for the first item Will identifies as a 'proposal', namely relying on the TTS engine to return digital sound samples rather than doing the output itself: I think this is a great idea, but I would suggest looking carefully at the potential latency issues there.

Also, key requirements of any speech/audio integration API(s) include two subtly different capabilities: knowing, at least roughly, what is currently in the output queue and approximately how close to completion it is, and being able to "sync up" and know, at some point in time, exactly what has actually been spoken. The second differs in that it requires information about completion rather than approximate progress, and I think it also implies at least some degree of interrupt capability in the audio output stream. Use cases include audio/voice synchronization, braille synchronization, and (perhaps more importantly) the ability to reliably break an utterance into pieces and restart output at a known point.
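To make that concrete, here is a minimal sketch of a driver-side interface exposing both kinds of information. The names and shapes are entirely hypothetical (nothing like this exists in gnome-speech today); Python only because that's what Orca is written in:

from typing import Callable, List, Optional


class SpeechOutput:
    """Hypothetical driver-side interface, for illustration only."""

    def queue(self, text: str, markers: List[int],
              on_marker: Callable[[int], None],
              on_done: Callable[[bool], None]) -> int:
        """Queue `text` and return a request id.  `markers` are character
        offsets of interest; on_marker(offset) fires when the audio up to
        that offset has actually been played, and on_done(completed)
        fires on completion or interruption."""
        raise NotImplementedError

    def progress(self, request_id: int) -> Optional[float]:
        """Approximate fraction of the request already spoken (0.0-1.0),
        or None if the request is no longer queued."""
        raise NotImplementedError

    def stop(self, request_id: int) -> int:
        """Interrupt output and return the last character offset actually
        spoken, so the caller can restart from a known point."""
        raise NotImplementedError

The important part is that stop() reports the exact offset actually spoken; that, plus the marker callbacks, is what makes reliable restart and braille/voice synchronization possible.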

As for moving away from Bonobo Activation (note: not the same as "Bonobo" in the broad sense), I think this makes sense. I also think moving away from the use of CORBA for gnome-speech IPC is a good idea; the speech APIs seem like excellent candidates for D-Bus migration, and we have very few, if any, platform binary-compatibility guarantees to deal with as long as the consumers of the speech interfaces are kept in the loop.
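For illustration only, here is a rough sketch of what a D-Bus-based speech service could look like using dbus-python; the bus name, object path, and interface name (org.gnome.Speech.*) are invented for this example and don't reflect any agreed naming:

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib


class SpeechService(dbus.service.Object):
    @dbus.service.method("org.gnome.Speech.Synthesizer",
                         in_signature="s", out_signature="i")
    def Speak(self, text):
        # A real driver would hand `text` to its engine and return a
        # request id that later progress/finished signals refer to.
        print("speaking:", text)
        return 1

    @dbus.service.signal("org.gnome.Speech.Synthesizer", signature="ib")
    def SpeechFinished(self, request_id, completed):
        pass


if __name__ == "__main__":
    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    name = dbus.service.BusName("org.gnome.Speech", bus)
    service = SpeechService(bus, "/org/gnome/Speech/Synthesizer")
    GLib.MainLoop().run()

Clients would call Speak() over the session bus and listen for SpeechFinished to get the completion information discussed above, with no CORBA or Bonobo Activation in the picture.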

Best regards,

Bill

Willie Walker wrote:
Hi Luke:

First of all, I say "Hear, hear!"  The audio windmill is something
people have been charging at for a long time.  Users who rely upon
speech synthesis working correctly and integrating well with the rest of
their environment are among those who need reliable audio support most
critically.

I see two main proposals below:

1) Modify gnome-speech drivers to obtain samples from their
   speech engines and then handle the audio playing themselves.
   This is different from the current state where the
   gnome-speech driver expects the speech engine to do all the
   audio management.

   This sounds like an interesting proposal.  I can tell you
   for sure, though, that the current gnome-speech maintainer
   has his hands full with other things (e.g., leading Orca).
   So, the work would need to come from the community.

2) As part of #1, move to an API that is pervasive on the system.
   The proposed API is GStreamer.

   Moving to a pervasive API is definitely very interesting, and
   I would encourage looking at a large set of platforms:  Linux
   to Solaris, GNOME to KDE, etc.  An API of recent interest is
   PulseAudio (https://wiki.ubuntu.com/PulseAudio), which might
   be worth watching.  I believe there might be many significant
   improvements in the works for OSS as well.
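To illustrate #1 and #2 together, here is a rough sketch of a driver that pulls raw samples from its engine and plays them through GStreamer rather than letting the engine open the audio device itself. It assumes the engine can produce 16-bit mono PCM at a known rate; get_samples() is a made-up stand-in for whatever the real engine API provides:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

RATE = 22050  # assumed engine output rate (16-bit mono PCM)

pipeline = Gst.parse_launch(
    "appsrc name=src format=time ! audioconvert ! audioresample "
    "! autoaudiosink")
src = pipeline.get_by_name("src")
src.set_property("caps", Gst.Caps.from_string(
    "audio/x-raw,format=S16LE,rate=%d,channels=1,layout=interleaved" % RATE))
pipeline.set_state(Gst.State.PLAYING)

def get_samples(text):
    # Placeholder: a real driver would call its TTS engine here and yield
    # successive chunks of raw PCM as they are synthesized.
    yield b"\x00\x00" * RATE  # one second of silence

timestamp = 0
for chunk in get_samples("hello world"):
    buf = Gst.Buffer.new_wrapped(chunk)
    duration = Gst.util_uint64_scale(len(chunk) // 2, Gst.SECOND, RATE)
    buf.pts = timestamp
    buf.duration = duration
    timestamp += duration
    src.emit("push-buffer", buf)

src.emit("end-of-stream")
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)

With the samples in hand like this, the driver also knows exactly how much audio has been pushed and played, which is what the progress and sync-up requirements above depend on.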

In the bigger scheme of things, however, there is discussion of
deprecating Bonobo.  Bonobo is used by gnome-speech to activate
gnome-speech drivers.  As such, one might consider alternatives to
gnome-speech.  For example, Speech Dispatcher
(http://www.freebsoft.org/speechd) or TTS API
(http://www.freebsoft.org/tts-api-provider) might be something to
consider.  They are not without issues, however, including cumbersome
configuration and reliability concerns.  I believe that's all solvable
with work.  The harder issue in my mind is that they would introduce an
external dependency for things like GNOME, and I've also not looked at
their licensing scheme.
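For a sense of what that external dependency looks like from the client side, here is a minimal sketch using the Python bindings shipped with speech-dispatcher (the speechd module); the choice of output module is just an assumption about the local configuration:

import speechd

client = speechd.SSIPClient("gnome-speech-test")
client.set_output_module("espeak")   # assumes an espeak module is configured
client.set_rate(0)                   # -100 .. 100; 0 is the default rate
client.speak("Hello from Speech Dispatcher")
client.close()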

Will



