Re: [g-a-devel] Another draft of gnome-speech IDL



Hi All:

Thanks for the new IDL, Marc, and thanks for your comments, Michael.

I may well agree with Michael about the initialization stuff; however,
initialization *is* expensive.  The difficulty is: what useful things
(if any) can the user do with, or find out about, an uninitialized
driver?  If the answer is basically 'nothing', then I agree with Michael
that having explicit initialize/deinitialize methods doesn't serve much
purpose, and it requires almost everything to throw those annoying
DriverNotInitialized exceptions.
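
To make that cost concrete, every call site would end up wrapped in
checks something like the following (a hypothetical C sketch: I'm
guessing at the exception's repository ID, and assuming the draft puts
say() on the driver):

  GNOME_Speech_SynthesisDriver_say (driver, "hello", &ev);
  if (ev._major == CORBA_USER_EXCEPTION &&
      strcmp (CORBA_exception_id (&ev),
              "IDL:GNOME/Speech/DriverNotInitialized:1.0") == 0)
    {
      /* the driver was never initialized (or was deinitialized);
         the client must initialize it and retry */
    }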

However, I think Michael is right to wonder about the 'modal' nature of
"setCurrentVoice".  Though I agree with Marc that Draghi's original
"speaker-based" proposal seemed very hard to implement, I think the
concept of a speaker or voice within a driver makes a lot of sense.  So
my suggestion would be to add

getDefaultSpeaker ()
setDefaultSpeaker (Speaker s)

to SynthesisDriver, and define Speaker to take over many of the speech
methods, along with a createSpeaker () that takes a "voice" string.

(I omit the "raises" specification from my examples below)

interface SynthesisDriver : Bonobo::Unknown {

  ...

  Speaker getDefaultSpeaker ();
  void    setDefaultSpeaker (in Speaker s);
  Speaker createSpeaker (in string voiceName);
  void    freeSpeaker (in Speaker speaker);
}

interface Speaker : Bonobo::Unknown {
  ParameterList getSupportedParameters ();  
  string getParameterValue (in string name);
  void setParameterValue (in string name, in any value);
  string getParameterValueDescription (in string name, in any value);
  void say (in string text);
  void sayURI (in string uri);
  void stop ();
  void pause ();
  void resume ();
  boolean isSpeaking ();
  void wait ();
  void registerSpeechEventListener (in SpeechEventListener l);
}
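
Since setParameterValue takes an "any", setting a parameter from C
means building the any by hand, roughly like so (an ORBit-flavored
sketch; "rate" is just a made-up parameter name):

  CORBA_any value;
  CORBA_float rate = 180.0;

  value._type  = TC_CORBA_float;               /* typecode of the payload */
  value._value = &rate;
  CORBA_any_set_release (&value, CORBA_FALSE); /* caller owns the storage */

  GNOME_Speech_Speaker_setParameterValue (speaker, "rate", &value, &ev);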

This would still mean that clients would have to activate gnome-speech
on a "driver" basis, but they could then define and interact with
multiple "speakers" from a given driver (or from separate drivers, of
course).
Speakers would be bound to a given driver, but their settings would
persist during a session so that, for instance, if a client wished to
output two different strings with different voices it would not be
necessary to sequentially interleave calls to "setCurrentVoice",
"setParameterValue", etc.

I think this could be implemented in phases fairly easily.  For
instance, a TTS service like festival could provide its voice list, a
client could manipulate parameters on a "speaker" based on one of those
voices, and then use the "speakers" as persistent objects.  In the case
of a driver like festival, which doesn't currently persist settings
per-voice, the parameters would be cached in gnome-speech driver
structures to produce the effect of concurrently-available multiple
voices/speakers.
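
The driver-side bookkeeping could be as simple as something like the
following (purely illustrative; none of these structures would appear
in the IDL):

  /* hypothetical per-Speaker state inside the festival driver */
  typedef struct {
    char  *voice_name;   /* festival voice this speaker wraps */
    GList *parameters;   /* cached (name, value) pairs */
  } SpeakerState;

  /* before each say(), the driver selects voice_name and replays the
     cached parameters into festival, so clients see persistent,
     concurrently-available speakers */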

The simplest clients could still activate a gnome-speech service by
getting an instance of GNOME/Speech/SynthesisDriver and calling
getDefaultSpeaker () or createSpeaker ("voice"), proceeding thus
(using C++/Java-like shorthand rather than the C bindings):

  speaker = driver.getDefaultSpeaker ();

  speaker.say ("hello");

  /* lots more calls, until client is done */

  speaker.unref ();


Of course in C the code would actually look more like

  GNOME_Speech_Speaker speaker =
    GNOME_Speech_SynthesisDriver_createSpeaker (driver,
	"kaldiphone", &ev);

  GNOME_Speech_Speaker_say (speaker, 
	"hello", &ev);

  /* many more calls to the Speech service */

  GNOME_Speech_Speaker_unref (speaker, &ev);
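
In practice the client would also want to check the CORBA_Environment
between calls; this is just the standard CORBA C mapping, nothing
gnome-speech-specific:

  CORBA_exception_init (&ev);

  GNOME_Speech_Speaker_say (speaker, "hello", &ev);
  if (ev._major != CORBA_NO_EXCEPTION)
    {
      fprintf (stderr, "say failed: %s\n", CORBA_exception_id (&ev));
      CORBA_exception_free (&ev);
    }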



best regards,

Bill




