I messed around with MaryTTS some years back. I even wrote a little
interface at one point that would allow the generic sd module to talk
through it. But I found it to be very cumbersome on most hardware, as it
had to talk to MaryTTS by sending an http request to the speech
synthesizer and playing the wav output that was returned.
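For anyone who wants to see what that round trip involves, here is a rough sketch in Python. It assumes a MaryTTS 5.x server on its default port 59125 with the /process endpoint, and the voice name cmu-bdl-hsmm is only an example of something that may be installed. The helper names build_request_url and synthesize are made up for illustration and are not part of MaryTTS or sd.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local MaryTTS 5.x server listening on its default port.
MARY_URL = "http://localhost:59125/process"


def build_request_url(text, voice="cmu-bdl-hsmm", locale="en_US", base=MARY_URL):
    """Build the GET URL asking MaryTTS to render text as a WAV file.

    The voice and locale defaults are assumptions; use whatever is
    actually installed on the server.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    }
    return base + "?" + urlencode(params)


def synthesize(text, out_path="out.wav", **kwargs):
    """Fetch the synthesized audio and write it to out_path (hypothetical helper)."""
    with urlopen(build_request_url(text, **kwargs), timeout=30) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path
```

With a server running, synthesize("Hello from MaryTTS") should leave an out.wav that any player can handle. Every utterance pays for a full HTTP round trip plus a file write before any audio is heard, which is where the cumbersome feel comes from.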
I wrote an SSIP-subset command interpreter for it in Java, and it works very well.
I have also
found Java to be very slow overall. There was also a strange
tendency for MaryTTS voices to suddenly jump to a very high,
squeaky pitch, although I didn't hear many of the English voices do that.
IIRC, I had to use libsonic to adjust pitch for MaryTTS. I don't think they had solved the pitch issues well, at least back then.
I did hear bdl get rather tinny and old-school at random times, however,
sounding autotuned, or as if packet loss were occurring in an extremely
low-bitrate speech recording.
I had to train my own voices to get ones I liked, but they have some really nice ones now on their web demo.
Perhaps work could be done to integrate the already existing RHVoice
module into sd. I use RHVoice here every day, and it sounds
smoother and runs with fewer resources than MaryTTS.
Sounds good to me! I'd like to check it out.
Bill