On Tue, Jun 27, 2006 at 05:04:37PM +0200, Hynek Hanke wrote:
> * Enrico suggested we should use Festival C API instead of talking
> to it via TCP. Also Olivier mentioned the whole chain to be too long and
> source of troubles. However, I suspect the problem is not in the chain
> being too long as much as in both Festival and Gnome Speech lacking
> proper detailed logs.

The problem that I've found in the Festival C API is that you cannot get
reliable is_speaking testing / end-of-speech notification.

Details: Festival can run in two modes: (audio_mode 'sync) or
(audio_mode 'async).

In sync mode, a (SayText "...") command blocks the entire Festival engine
until the phrase has been fully spoken. That rules out being able to
interrupt the speech, so we don't want it.

In async mode, Festival runs an audio spooler called audsp as an external
process, then does the TTS conversion of text into waveforms, saves the
waveforms in a file under /tmp [shivers] and tells audsp to play that
file. audsp keeps listening to the pipe while playing, and supports
commands like "wait until everything has been spoken" or "interrupt
speaking and reset the queue".

The communication protocol between Festival and audsp is basically
one-way, and there's currently no way for audsp to push information back
to Festival. This makes it impossible to be notified when a wave has
finished playing. There is also currently no way to ask whether audsp is
playing something or not.

Festival is free software, so this is of course fixable. Having looked at
the code, it's simple code and it wouldn't break if it were stretched a
bit. But that's not improving a driver: that's improving Festival (if the
authors allow) and then having to depend on a very new version of it.

So the proper way to implement a Festival driver seems to me to be to use
the text-to-wave function and then do proper handling of playing the
resulting wave, hopefully using whatever audio playing technology is
trendy at the moment.
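To illustrate why owning the playback fixes the notification problem, here is a minimal sketch (in Python, purely illustrative; the synthesis and audio output are stand-ins, not Festival's actual API): once the driver holds the wave itself and plays it in its own worker thread, is_speaking and interruption become trivially reliable, with no audsp in between.

```python
# Sketch of a driver-side wave player: text-to-wave happens elsewhere,
# this object owns playback, so it always knows whether it is speaking.
import threading
import time

class WavePlayer:
    def __init__(self):
        self._speaking = threading.Event()
        self._stop = threading.Event()
        self._thread = None

    def _play(self, wave_chunks):
        # Stand-in for feeding audio chunks to esd/GStreamer/ALSA:
        # "play" each chunk briefly, honouring interruption requests.
        for _ in wave_chunks:
            if self._stop.is_set():
                break
            time.sleep(0.01)
        self._speaking.clear()  # reliable end-of-speech notification

    def say(self, wave_chunks):
        self.stop()
        self._stop.clear()
        self._speaking.set()
        self._thread = threading.Thread(target=self._play,
                                        args=(wave_chunks,))
        self._thread.start()

    def is_speaking(self):
        return self._speaking.is_set()

    def stop(self):
        # Interrupt playback and wait for the worker to wind down.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

The point is not the threading details but the ownership: the same pattern works whatever audio backend ends up doing the actual output.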
I looked into esd without understanding whether it is trendy anymore, and
I looked at GStreamer without understanding whether it isn't a bit too
complicated as a default way to play a waveform. Also, not using audsp
means that the Festival driver wouldn't add another spawned process to
keep track of.

I don't know much about the APIs of other speech engines. If they all had
a text-to-wave function, then it could be a wise move to implement a
proper audio scheduler to share among TTS drivers, which could then
(reliably) support proper integration with the audio system of the day,
progress reporting, interruption and whatever else is needed. This would
ensure that all TTS drivers have the same (hopefully high) level of
reliability with respect to audio output.

> Also, we have found the connection randomly crashes for no apparent
> reason. It is indeed far better if we can just detect it, log it and
> create a new connection and reset the parameters automatically (as we do
> now) than if such a crash would bring down the whole module (if we were
> using the C API) for no clear reason. (Another one: in the current
> version of Dispatcher, sometimes a very mysterious segfault happens.
> I suspect this has something to do with ALSA, but it is very hard to
> tell as we link ALSA directly and the crash is not reproducible in
> testing circumstances...)
>
> Now, one of the big problems is that Festival doesn't offer proper logs.
> It would often refuse connection for a stupid typo in the configuration
> file and not give any clue to the user. This is something which should
> be fixed.

This can probably be fixed: Festival can be told not to load any config
file, and logging can be implemented by adding a couple of printfs before
calls to the C++ API.

And something like a TTS driver which becomes the main form of access to
the computer should be designed to properly restart in case of segfaults
in its own code, be it Festival or whatever else.
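The restart-on-segfault idea above can be sketched very simply (again in Python, purely as an illustration; the supervised command is a hypothetical stand-in for an actual TTS driver executable): a supervisor loop that re-runs the driver when it dies abnormally, within some restart budget.

```python
# Sketch of a supervisor that restarts a crashing driver process.
import subprocess
import sys

def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it on abnormal exit until it exits cleanly
    or the restart budget is exhausted. Returns the restart count."""
    restarts = 0
    while True:
        code = subprocess.call(cmd)
        if code == 0:
            return restarts      # clean exit: nothing to do
        restarts += 1            # crash (a segfault yields code < 0)
        if restarts >= max_restarts:
            return restarts      # give up so someone can notice
```

A real supervisor would also want back-off between restarts and logging of the exit status, but the shape is this.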
Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico debian org>