[g-a-devel]Gnome Speech Architecture Proposal

From: "Draghi Puterity" <mp baum de>
To: "Marc Mulcahy" <marc mulcahy sun com>
Cc: <gnome-accessibility-devel gnome org>, <firm baum ro>
Subject: [g-a-devel]Gnome Speech Architecture Proposal
Date: Mon, 13 May 2002 11:28:44 +0200

Hi Marc, hi All,

We discussed the GS architecture at length in our team, and this is how we
think it could look:

The most important idea is that we should hide the TTS engines from the GS
clients. Instead of exposing the TTS engines with their respective "voices"
to the GS clients, we suggest introducing the concept of a "Speaker". The GS
would expose only the Speaker objects to its clients at its highest level.
Speaker examples are kal_diphone (Festival voice), Perfect Paul or
Beautiful Betty (DECTalk Express voices), the ViaVoice male voice #5, etc.
For the GS client, it shouldn't matter where or what these Speakers are.
They are just entities that can produce speech output.

A Voice is a Speaker "instantiated" with a number of parameters (I know that
"voice" is a heavily overloaded term, but I couldn't find anything better
yet). An example of a voice would be "a slowly spoken Beautiful Betty". So,
Speakers describe the properties available for instantiating a voice (e.g.
pitch range, rate range, supported languages (enum), etc). Voices can
actually speak, pause, resume, shut up, and have "current values" for the
parameters.
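
To make the Speaker/Voice split concrete, here is a minimal Python sketch. It
is only an illustration, not a proposed interface: the class layout, the
method names (create_voice, speak, shut_up), the numeric ranges and the
language codes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Speaker:
    """Describes the properties available for instantiating a voice."""
    name: str
    pitch_range: tuple   # (min, max); units are hypothetical
    rate_range: tuple    # (min, max); units are hypothetical
    languages: tuple     # supported language codes (hypothetical values)

    def create_voice(self, pitch, rate, language):
        # Reject parameters outside what this Speaker advertises.
        if not self.pitch_range[0] <= pitch <= self.pitch_range[1]:
            raise ValueError("pitch out of range for " + self.name)
        if not self.rate_range[0] <= rate <= self.rate_range[1]:
            raise ValueError("rate out of range for " + self.name)
        if language not in self.languages:
            raise ValueError("language not supported by " + self.name)
        return Voice(self, pitch, rate, language)


@dataclass
class Voice:
    """A Speaker instantiated with current parameter values."""
    speaker: Speaker
    pitch: float
    rate: float
    language: str
    paused: bool = False
    queue: list = field(default_factory=list)

    def speak(self, text):
        # Stand-in for real synthesis: just remember what was requested.
        self.queue.append(text)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def shut_up(self):
        # Drop everything that has not been spoken yet.
        self.queue.clear()


betty = Speaker("Beautiful Betty", (50, 400), (75, 600), ("en",))
slow_betty = betty.create_voice(pitch=200, rate=100, language="en")
slow_betty.speak("Hello")
```

Here the Speaker only carries the allowed ranges, while the Voice holds the
"current values" and the actions, which is the division described above.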

A typical usage scenario would be:

- ask the bonobo infrastructure for a GS object
- query the GS object for the Speakers available in the system
- pick one Speaker object and ask it to create a Voice object with some
given parameters.
- ask the Voice object to say something and receive its markers
...
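
The steps above could be sketched roughly as follows in Python. Nothing here
is a real bonobo or GS call: get_speech_service() stands in for the bonobo
activation step, and SpeechService, get_speakers, create_voice, say and the
one-marker-per-word callback are hypothetical names chosen for the example.

```python
class Voice:
    def __init__(self, speaker_name, **params):
        self.speaker_name = speaker_name
        self.params = params

    def say(self, text, on_marker):
        # Stand-in for synthesis: report one marker per word.
        for index, word in enumerate(text.split()):
            on_marker(index, word)


class Speaker:
    def __init__(self, name):
        self.name = name

    def create_voice(self, **params):
        return Voice(self.name, **params)


class SpeechService:
    def __init__(self):
        # Speaker names taken from the examples in this mail.
        self._speakers = [
            Speaker("kal_diphone"),
            Speaker("Perfect Paul"),
            Speaker("Beautiful Betty"),
        ]

    def get_speakers(self):
        return list(self._speakers)


def get_speech_service():
    """Stand-in for asking the bonobo infrastructure for a GS object."""
    return SpeechService()


# 1. ask for a GS object
gs = get_speech_service()
# 2. query it for the available Speakers
speakers = gs.get_speakers()
# 3. pick one Speaker and create a Voice with some parameters
betty = next(s for s in speakers if s.name == "Beautiful Betty")
voice = betty.create_voice(rate=100)
# 4. ask the Voice to say something and receive its markers
markers = []
voice.say("hello gnome speech", lambda i, w: markers.append((i, w)))
```

Note that the client never names a TTS engine at any point; it only deals
with the service, a Speaker and a Voice.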

All the plumbing, such as starting other servers and initializing devices
and TTS engines, should be internal to the GS. The GS clients shouldn't be
aware of these implementation details.
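
One way to keep this plumbing out of sight is to start an engine only when
one of its Speakers is first used. The Python sketch below assumes a
hypothetical Engine class whose start() stands in for launching a server or
opening a device; it says nothing about how the real engines are driven.

```python
class Engine:
    """Hypothetical TTS backend hidden behind the speech service."""

    def __init__(self, name, speaker_names):
        self.name = name
        self.speaker_names = speaker_names
        self.started = False

    def start(self):
        # Stand-in for starting a server or initializing a device.
        self.started = True

    def synthesize(self, speaker_name, text):
        if not self.started:
            raise RuntimeError(self.name + " has not been started")
        return "[" + speaker_name + "] " + text


class SpeechService:
    """Clients see speaker names only; engines stay internal."""

    def __init__(self, engines):
        self._engine_for = {
            speaker: engine
            for engine in engines
            for speaker in engine.speaker_names
        }

    def speaker_names(self):
        return sorted(self._engine_for)

    def say(self, speaker_name, text):
        engine = self._engine_for[speaker_name]
        if not engine.started:
            # Lazy startup: the client never asks for this explicitly.
            engine.start()
        return engine.synthesize(speaker_name, text)


festival = Engine("festival", ["kal_diphone"])
dectalk = Engine("dectalk", ["Perfect Paul", "Beautiful Betty"])
gs = SpeechService([festival, dectalk])
result = gs.say("kal_diphone", "hello")
```

After this call only the engine behind kal_diphone has been started; the
other one stays untouched until one of its Speakers is used.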

There are many other issues that can be discussed here (e.g. the optimal
balance between Speaker properties and Voices, concurrently speaking voices,
multi-client issues, etc), but I would leave these for later discussions if
you agree that we should follow this architecture.

Best regards,
Draghi

Follow-Ups:
- Re: [g-a-devel]Gnome Speech Architecture Proposal
  - From: David Bolter
