Re: Fixing gnome-speech



On Tue, Jun 27, 2006 at 05:04:37PM +0200, Hynek Hanke wrote:

> * Enrico suggested we should use Festival C API instead of talking
> to it via TCP. Also Olivier mentioned the whole chain to be too long and
> source of troubles. However, I suspect the problem is not in the chain
> being too long as much as in both Festival and Gnome Speech lacking
> proper detailed logs.

The problem I've found with the Festival C API is that it offers no
reliable way to test whether it is speaking or to get an end-of-speech
notification.

Details: Festival can run in two modes: (audio_mode 'sync) or
(audio_mode 'async).

In sync mode, a (SayText "...") command blocks the entire festival
engine until the phrase has been fully spoken.  That rules out being
able to interrupt the speaking, so we don't want it.

In async mode, festival runs an audio spooler called audsp as an external
process.  It then does the TTS, converting text into waveforms, saves the
waveforms in a file under /tmp [shivers] and tells audsp to play that
file.  audsp keeps listening to the pipe while playing, and supports
commands like "wait until everything has been spoken" or "interrupt
speaking and reset the queue".

The communication protocol between festival and audsp is basically
one-way: there's currently no way for audsp to push info back to
festival.  This makes it impossible to notify that a wave has finished
playing.  There is also no way to ask audsp whether it is playing
something or not.

Festival is free software, so this is of course fixable.  Having looked
at the code, it's simple and it wouldn't break if it were stretched a
bit.  But that's not improving a driver: that's improving festival (if
the authors allow) and then having to depend on a very new version of
it.

So the proper way to implement a festival driver seems to me to be to use
the text-to-wave function and then handle playing the resulting wave
ourselves, hopefully using whatever audio playing technology is trendy
at the moment.  I looked into esd without working out whether it is
still trendy, and I looked at gstreamer without working out whether it
is a bit too complicated as a default way to play a waveform.
Also, not using audsp means that the festival driver wouldn't add
another spawned process to keep track of.
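
To make the idea concrete, here is a minimal sketch in Python of a
driver-side player that owns the waveform and can therefore answer
"is it speaking?" and report end of speech.  It is not festival code:
WavePlayer and write_frames are hypothetical names, and write_frames
stands in for whatever audio backend ends up being used.

```python
import threading
import wave


class WavePlayer:
    """Play one wave file in a background thread and track its state."""

    def __init__(self, write_frames, on_finished=None, frames_per_chunk=1024):
        # write_frames is a hypothetical audio sink: it takes raw PCM bytes
        # and blocks until the audio system has accepted them.
        self._write_frames = write_frames
        self._on_finished = on_finished
        self._frames_per_chunk = frames_per_chunk
        self._stop = threading.Event()
        self._thread = None

    def play(self, wave_path):
        """Start playing wave_path, interrupting any earlier playback."""
        self.interrupt()
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(wave_path, self._stop), daemon=True)
        self._thread.start()

    def _run(self, wave_path, stop):
        completed = True
        with wave.open(wave_path, "rb") as wav:
            while True:
                frames = wav.readframes(self._frames_per_chunk)
                if not frames:
                    break  # whole file handed to the sink
                if stop.is_set():
                    completed = False  # interrupted before the end
                    break
                self._write_frames(frames)
        if self._on_finished is not None:
            self._on_finished(completed)  # end-of-speech notification

    def is_speaking(self):
        return self._thread is not None and self._thread.is_alive()

    def interrupt(self):
        """Stop at the next chunk boundary and wait for the thread to end."""
        if self._thread is not None:
            self._stop.set()
            self._thread.join()
```

Because playback happens in chunks, interruption takes effect within one
chunk, and the callback runs in the player thread, so it should not call
interrupt() itself.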

I don't know much about the APIs of other speech engines.  If they all
have a text-to-wave function, then it could be a wise move to implement a
proper audio scheduler to share among TTS drivers, which could then
(reliably) support proper integration with the audio system of the day,
progress reports, interruption and whatever else is needed.  This would
ensure that all TTS drivers have the same (hopefully high) level of
reliability wrt audio output.
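
As a rough illustration of such a shared scheduler, the Python sketch
below queues waveforms from any driver, plays them one at a time, and
reports begin / end / interrupted events.  All names are hypothetical:
write_chunk stands in for the real audio backend and on_event for
whatever notification mechanism the drivers would use.

```python
import queue
import threading


class AudioScheduler:
    """Queue waveforms from any TTS driver and play them one at a time."""

    def __init__(self, write_chunk, on_event, chunk_size=4096):
        self._write_chunk = write_chunk  # hypothetical backend: bytes -> None
        self._on_event = on_event        # called as on_event(kind, utterance_id)
        self._chunk_size = chunk_size
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._generation = 0             # bumped by interrupt()
        self._pending = 0                # utterances queued or playing
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, utterance_id, wave_bytes):
        with self._lock:
            generation = self._generation
            self._pending += 1
        self._queue.put((generation, utterance_id, wave_bytes))

    def interrupt(self):
        """Stop the current utterance and discard everything queued so far."""
        with self._lock:
            self._generation += 1

    def is_speaking(self):
        with self._lock:
            return self._pending > 0

    def _current(self, generation):
        with self._lock:
            return generation == self._generation

    def _run(self):
        while True:
            generation, utterance_id, wave_bytes = self._queue.get()
            kind = "interrupted"  # utterances queued before interrupt() are dropped
            if self._current(generation):
                self._on_event("begin", utterance_id)
                kind = "end"
                for start in range(0, len(wave_bytes), self._chunk_size):
                    if not self._current(generation):
                        kind = "interrupted"
                        break
                    self._write_chunk(wave_bytes[start:start + self._chunk_size])
            self._on_event(kind, utterance_id)
            with self._lock:
                self._pending -= 1
```

The generation counter is just one way to make interrupt() cheap: it
invalidates the current utterance and everything queued before it
without having to drain the queue from the caller's thread.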


> Also, we have found the connection randomly crashes for no apparent
> reason. It is indeed far better if we can just detect it, log it and
> create a new connection and reset the parameters automatically (as we do
> now) than if such a crash would bring down the whole module (if we were
> using the C API) for no clear reason. (Another one: in the current
> version of Dispatcher, sometimes a very mysterious segfault happens.
> I suspect this has something to do with ALSA, but it is very hard to
> tell as we link ALSA directly and the crash is not reproducible in
> testing circumstances...)
> 
> Now, one of the big problems is that Festival doesn't offer proper logs.
> It would often refuse connection for a stupid typo in the configuration
> file and not give any clue to the user. This is something which should
> be fixed.

This can probably be fixed: festival can be told not to load any config
file, and logging can be implemented by adding a couple of printfs before
calls to the C++ API.  And something like a TTS driver, which becomes the
main form of access to the computer, should be designed to restart
properly in case of segfaults in its own code, be it festival or
whatever else.
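
One way to get that restart behaviour, sketched in Python under the
assumption that the engine runs in a child process of the driver:
supervise, worker_cmd and on_start are hypothetical names, and the
worker command is whatever wraps the speech engine.

```python
import logging
import subprocess

log = logging.getLogger("tts-supervisor")


def supervise(worker_cmd, max_restarts=3, on_start=None):
    """Run worker_cmd, restarting it whenever it dies abnormally.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(worker_cmd)
        if on_start is not None:
            on_start(proc)  # e.g. resend voice, rate and other parameters
        returncode = proc.wait()
        if returncode == 0:
            log.info("worker exited cleanly")
            return restarts
        # On POSIX a negative code is the signal number, e.g. -11 for SIGSEGV.
        log.error("worker died with status %s", returncode)
        if restarts >= max_restarts:
            raise RuntimeError("worker keeps crashing, giving up")
        restarts += 1
```

The point of the sketch is only that the crash is detected, logged and
followed by a fresh start with the parameters reapplied, rather than
taking the whole driver down.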


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico debian org>



