Re: Fixing gnome-speech



> Festival is free software, so this is of course fixable.  Having looked
> at the code, it's simple code and it wouldn't break if it'd be stretched
> a bit.  But that's not improving a driver: that's improving festival (if
> the authors allow) and then having to depend on a very new version of
> it.

Hi Enrico,

Another problem with speech engines doing their own audio output
(apart from what you said about Festival) is that the output has to
be configured in several places if several engines are used, and the
code has to be updated in just as many places whenever a new audio
technology comes along.

> [...]
> So the proper way to implement a festival driver seems to me to use the
> text-to-wave function and then do a proper handling of playing the
> resulting wave, hopefully using the audio playing technology that's
> trendy at the moment.

Yes, I agree. Actually this is what both Speech Dispatcher and KTTSD are
doing and I think I've heard Gnome Speech would also like to go this way
in the future.
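
To make the split concrete, here is a minimal sketch in Python (for
illustration only, not how Speech Dispatcher or KTTSD are actually
written). The engine only returns a WAV; playback goes through a
separate backend. The names `speak`, `AudioSink`, `fake_synth` and
`RecordingSink` are made up for this example, and the fake engine just
produces silence so the sketch runs without Festival installed.

```python
import io
import wave
from typing import Callable, List, Protocol, Tuple


class AudioSink(Protocol):
    """Hypothetical playback backend (OSS, ALSA, NAS, ...)."""

    def play(self, pcm: bytes, rate: int, channels: int, width: int) -> None: ...


def speak(text: str, synth: Callable[[str], bytes], sink: AudioSink) -> None:
    """Ask the engine for a complete WAV, then play it ourselves."""
    wav_bytes = synth(text)  # the engine only synthesizes, it opens no device
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        pcm = w.readframes(w.getnframes())
        sink.play(pcm, w.getframerate(), w.getnchannels(), w.getsampwidth())


def fake_synth(text: str) -> bytes:
    """Stand-in engine: 10 ms of silence per character, 16 kHz mono 16-bit."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * (160 * len(text)))
    return buf.getvalue()


class RecordingSink:
    """Stand-in backend that only records what it was asked to play."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, int, int, int]] = []

    def play(self, pcm: bytes, rate: int, channels: int, width: int) -> None:
        self.calls.append((len(pcm), rate, channels, width))
```

With this shape, supporting a new audio technology means writing one
new sink, not touching every engine driver.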

> I looked into esd without understanding if it is
> trendy anymore, and I look at gstreamer without understanding if it
> isn't a bit too complicated as a default way to play a waveform.

This is fairly complicated. I've investigated the possibilities for
audio output and ended up summarizing our requirements, in case such a
technology eventually appears, and writing my own small library for
output to OSS, Alsa and NAS. Please see
http://lists.freedesktop.org/archives/accessibility/2005-April/000049.html
and feel free to comment. One of the problems is the low latency we
need. That ruled out both ESD and Gstreamer at that time; I'm not sure
what the state is now with Gstreamer. Another thing is that if we are
aiming for a desktop independent speech technology, we need desktop
independent audio output.
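
As a rough illustration of why latency matters (my own arithmetic, not
a measurement of ESD or Gstreamer): any audio that is already buffered
will still be heard after we ask for a stop, so buffering alone sets a
lower bound on how quickly speech can be interrupted. The helper name
`buffer_latency_ms` is invented for this sketch.

```python
def buffer_latency_ms(frames: int, rate_hz: int) -> float:
    """Time in milliseconds needed to play out `frames` buffered frames."""
    return 1000.0 * frames / rate_hz


# A 1024-frame buffer at 44100 Hz holds roughly 23 ms of audio,
# while a full second of buffering at 16 kHz is 16000 frames.
small = buffer_latency_ms(1024, 44100)
large = buffer_latency_ms(16000, 16000)
```

Every extra layer between the synthesizer and the sound card adds its
own buffers, and these delays add up.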

> I don't know much about the APIs of other speech engines.  If they all
> had a text-to-wave function

Most of the engines do. Some don't, but that is a drawback of theirs
(what if I want to have the audio synthesized and saved to a file?).
As you said, it is very desirable to retrieve the audio from those
engines that support it.

> , then it can be a wise move to implement a
> proper audio scheduler to share among TTS drivers, which could then
> (reliably) support proper integration with the audio system of the day,
> progress report, interruption and whatever else is needed.  This would
> ensure that all TTS drivers would have the same (hopefully high) level
> of reliability wrt audio output.

Yes, that is my dream too! Would you be willing to help with this?
I think we would first have to see what is new and consider the options
again.
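
Just to have something to discuss, a very small sketch in Python of
what such a shared scheduler could look like. Everything here is an
assumption made for illustration: the class name `AudioScheduler`, the
`write_chunk` callable standing in for the real device write, and the
progress callback signature. It plays utterances one at a time on a
worker thread, reports progress after each chunk, and can cancel an
utterance at the next chunk boundary.

```python
import queue
import threading
from typing import Callable, Set


class AudioScheduler:
    """Plays queued utterances in order on a single worker thread."""

    def __init__(self, write_chunk: Callable[[bytes], None],
                 on_progress: Callable[[int, int, int], None],
                 chunk_size: int = 4096) -> None:
        self._write = write_chunk      # hypothetical blocking device write
        self._progress = on_progress   # called with (utt_id, done, total)
        self._chunk = chunk_size
        self._queue: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._cancelled: Set[int] = set()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def say(self, utt_id: int, pcm: bytes) -> None:
        """Queue raw audio for playback."""
        self._queue.put((utt_id, pcm))

    def cancel(self, utt_id: int) -> None:
        """Stop utt_id at the next chunk boundary, or skip it if still queued."""
        with self._lock:
            self._cancelled.add(utt_id)

    def close(self) -> None:
        """Finish everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

    def _is_cancelled(self, utt_id: int) -> bool:
        with self._lock:
            return utt_id in self._cancelled

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                return
            utt_id, pcm = item
            done = 0
            # Small chunks keep the reaction time to cancel() short.
            while done < len(pcm) and not self._is_cancelled(utt_id):
                chunk = pcm[done:done + self._chunk]
                self._write(chunk)
                done += len(chunk)
                self._progress(utt_id, done, len(pcm))
```

The chunk size is the trade-off here: smaller chunks mean faster
interruption and finer progress reports, at the cost of more calls
into the audio backend.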

> > Now, one of the big problems is that Festival doesn't offer proper logs.
> > It would often refuse connection for a stupid typo in the configuration
> > file and not give any clue to the user. This is something which should
> > be fixed.
> This can probably be fixed: festival can be told not to load any config
> file

This is not really useful. Configuration is really needed.

> , and log can be implemented adding a couple of printfs before calls
> to the C++ API. 

That is the log on the side of the speech API provider (Gnome Speech
etc.). This already exists in Dispatcher and, as I said, comes
automatically with a TCP API. I was talking about logs on the side of
Festival.

You will never be able to discover why a particular voice was not
loaded or doesn't work, why a sound icon is not playing, where the typo
in your configuration files is, why it is not finding a module (wrong
path) and such from just talking to Festival via its API (be it C++ or
TCP).

Currently the only way for users to fix such problems is to run
Festival from the command line and hope it will write some cryptic
message to stderr. Then what is left are guesses, past experiences with
problems and black magic. We must be able to diagnose problems.

>> [from my earlier post]
>> Now, one of the big problems is that Festival doesn't offer proper
>> logs.

You say you find the Festival C code clear and modifications not
difficult. If this could be fixed, that would be superb. I don't think
Alan would object to including the patch. And it would not introduce
a dependency for us. I don't know, however, how soon it could get
into some official release. But I think it is worth looking into.

>  And something like a TTS driver which becomes the main
> form of access to the computer should be designed to properly restart in
> case of segfaults in its own code, be it festival or whatever else.

Yes, this is something we tried in Speech Dispatcher, but it doesn't
always work. We should get this part right in the TTS API. The
objection remains, however, that with the TCP API it is easier to see
which part is crashing, and after exactly which commands.
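
For the restart part, a minimal sketch in Python of the idea (an
illustration under my own assumptions, not the Speech Dispatcher code):
the engine runs as a separate process and a small supervisor restarts
it when it dies from a signal such as SIGSEGV. This only works when the
engine lives in its own process; if it is linked in through the C++
API, a segfault takes the driver down with it. The function name
`supervise`, the restart limit and the backoff are invented here, and
the signal check is POSIX-specific.

```python
import subprocess
import sys
import time
from typing import List, Tuple

# Demo command that kills itself with SIGSEGV, standing in for a crashing engine.
CRASH_DEMO = [sys.executable, "-c",
              "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]


def supervise(cmd: List[str], max_restarts: int = 3,
              backoff_s: float = 0.5) -> Tuple[int, int]:
    """Run cmd and restart it while it keeps dying from a signal.

    Returns (number of restarts performed, final return code).
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        # On POSIX, a negative return code -N means "killed by signal N".
        crashed = code < 0
        if not crashed or restarts >= max_restarts:
            return restarts, code
        restarts += 1
        time.sleep(backoff_s)  # avoid a tight crash loop
```

A real supervisor would also have to log which request was in flight
when the child died, which is exactly the diagnostic advantage of the
TCP API mentioned above.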

With regards,
Hynek Hanke




