Re: [g-a-devel] Gnome Speech Architecture Proposal - the IDL



Hi Draghi,

On Thu, 2002-05-16 at 11:54, Draghi Puterity wrote:
> here is a very crude draft of an IDL for my recent proposal of the
> Gnome-Speech architecture.

	I'll give some general comments as I go; hopefully I'll catch most
things.
 
>         boolean      registerMarkerListener (in string userCookie, 
> word flags, ??? callback);        

	Ok - it's not clear to me what you want this callback to do, but if you
want to have the interface send back data, you need to implement a
Bonobo::Listener interface, or (worse) your own custom interface on the
client to be able to receive the messages and return them to the caller.
I would recommend aggregating a Bonobo::EventSource and a
Bonobo::Listener interface, since they use string names for events, and
do a fair bit for you - they are more generic, standard and
'understood'.
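
	To make that concrete, a minimal sketch of the registration taking a
stock listener - 'in long flags' is only a guess at what the 'word'
type above was meant to be:

	boolean registerMarkerListener (in string userCookie,
	                                in long flags,
	                                in Bonobo::Listener callback);

	Though with an aggregated Bonobo::EventSource you could drop this
method entirely and let clients use addListener.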

>         void         say (in string text);
>         void         shutUp();

	One thing worth knowing is that if you fire off a void method with no
exceptions, the app will still block waiting for the reply; you probably
want to add 'oneway' to any void method that raises no exceptions - then
your app will fire and forget.
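
	For example (note that CORBA forbids a raises clause on a oneway
operation, so this only works for the exception-free methods):

	oneway void say (in string text);
	oneway void shutUp ();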

>         boolean      isSpeaking();

	Ok - what is this method for ? If you are expecting people to do:

	while (obj->isSpeaking ());
	obj->say ("Foo");

	You have built a race condition ;-) possibly you want to have:

	enum HowToSay {
		SAY_OVERRIDE,
		SAY_OVERRIDE_CANCEL,
		SAY_IF_NOT_SPEAKING,
		SAY_FOO
	};

	boolean say (in string text, in HowToSay howto)
		raises (WasSpeaking);

	Or somesuch - but perhaps that is too complicated for your needs.

>         ParameterList  getSupportedParameters ();

	I imagine a ParameterList is in fact a ParameterRangeList where:

	typedef sequence<ParameterRange> ParameterRangeList;

	The sequence will, as you suggest, encode the length.
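
	Something like this, say - the field layout is only a guess at what
a ParameterRange might carry:

	struct ParameterRange {
		string name;
		any    minValue;
		any    maxValue;
		any    defaultValue;
	};
	typedef sequence<ParameterRange> ParameterRangeList;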
 
>         any getParameterValue (in string parameterName)
>             raises (ParameterNotSupported);
>                 
>         void setParameterValue (in string parameterName, any value) 
>             raises (ParameterNotSupported, ParameterOutOfRange,
> WrongValueType);

	If people will do a lot of sets, it might be worth binning the
exceptions here, since they're all calculable from the
ParameterRangeList, and making that oneway (a oneway method cannot
raise exceptions anyway), or simply adding a oneway variant to avoid
roundtrips.
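
	i.e. something like this - the name is only illustrative:

	/* fire-and-forget variant: unknown or out-of-range values are
	   silently dropped, since the client can validate against the
	   ParameterRangeList itself */
	oneway void setParameterValueAsync (in string parameterName,
	                                    in any value);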

>         ParameterList    getSupportedParameters ();
>         ParameterRange   getParameterRange (in string ParameterName)
>             raises (ParameterNotSupported);

	Surely a (far more static) RangeList would be more useful as
getSupportedParameters ? That is unless you badly need a 'bulk fetch'
interface as well. I suppose one thing that interests me is who will use
these interfaces. Perhaps an interface is used only for one client, and a
fresh instance of that interface returned to each new client ? If not,
you may get applications fighting over these (or any) properties.

>         Speaker             getSpeaker (in string speakerName) 
>             raises (SpeakerNotSupported);
>         Voice               getVoice (in string voiceName) 
>             raises (VoiceNotAvailable);

	So these would return a new instance of that interface, which would
store that client's settings ? And inter-instance interactions would be
dealt with in-proc in the sound server ?
 
>         Voice               getCurrentVoice ();
>         Voice               setCurrentVoice (in string voiceName)
>              raises (VoiceNotAvailable);    // returns the
> previously selected voice

	Would these two make sense in that model ?
 
>         void                 say (in string text);
>         void                 shutUp();
>         boolean              isSpeaking();

	I'm somewhat confused as to why we duplicate this functionality on the
voice and the other bits.

	As you can see, I'm also somewhat confused as to the role of the
voices, and the properties - are these just to be set by some control
panel type application ? Or are they intended to be programmatically
varied at high frequency [ per word ] ? In short, what are the
properties :-)

> As you see we have three objects: the SpeechManager, the Speaker,
> and the Voice.
>  
> The SpeechManager is the main entry point for the client. The client
> will ask the SM for a list of available speakers (I'm not sure if we
> need the SpeakerCount, or we can get that from the SpeakerList).

	You can get it from the list - a CORBA sequence carries its own length.

>  getSpeaker will return a Speaker object if one is found with that
> name. I don't know if bonobo allows you that, but I have assumed so.

	Yes, you can return CORBA_OBJECT_NIL if there is none by that name.

> The Speaker object is here to allow us to query for the available
> parameter ranges. The Speaker cannot speak ;-))). In order to
> speak, you need to ask the Speaker to create a Voice with
> createVoice. The client provides a name for the Voice (Speakers
> have predefined names!). createVoice will return you a Voice object
> with some default parameter settings. 

	Interesting.

> What can we do with a Voice? We can change the supported parameters
> in the allowed ranges, and ask it to say something, shut up, etc. We
> also need to be able to receive notifications as speech markers (at
> least EOS). For that I provided the registerMarkerListener, but I
> have some doubts that this is the right way. Doesn't bonobo support
> some sort of standard event generation mechanism, like COM's
> connection points? If yes, we should use that instead of my
> home-brewed registerMarkerListener.

	cf. BonoboListener, BonoboEventSource.
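
	Roughly, the stock interfaces look like this (paraphrased from
Bonobo's IDL - check Bonobo_Listener.idl and Bonobo_EventSource.idl
for the exact signatures):

	interface Listener : Bonobo::Unknown {
		void event (in string event_name, in any args);
	};

	interface EventSource : Bonobo::Unknown {
		void addListener    (in Listener l);
		void removeListener (in Listener l)
			raises (UnknownListener);
	};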
 
> So the client keeps track of the voice objects it
> created, and generates speech through the Voice objects. Used like
> this, GNOME Speech is very flexible, and it also solves a
> problem that might arise in the future, if we want to
> allow simultaneous speaking of multiple voices.

	Great.
 
> I would like to introduce also the "current voice" approach, as it
> is more convenient for some clients. At the SpeechManager level I
> have provided a get/setCurrentVoice pair and doubled the speech
> functions from the Voice object (say, shutUp, isSpeaking, etc).
> Typically the client will create the voices it wants once, at the
> beginning, and then just switch between them before saying something.
> We expect SRS in Gnopernicus to use this approach. Given the Voice
> objects, the implementation of the CurrentVoice concept is almost
> trivial.

	The problem is muxing this between clients; it's not possible to tell
where an invocation came from [ which client ]. If you added a client
interface handle to all the 'say' methods it would be, but then you
might as well have the invocation on the voice.

	Thus if client a) selects voice 'Male1' and client b) selects voice
'Female2', who wins ? - but then perhaps it's too difficult to design
this to work well with multiple clients.
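
	i.e. the current-voice methods would have to grow something like
this, ClientHandle being a hypothetical per-client token:

	/* hypothetical: every current-voice method gains a client
	   token - at which point calling the Voice directly is
	   simpler anyway */
	oneway void say (in ClientHandle client, in string text);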

> I have also introduced getVoices at SpeechManager level. The idea
> of making the voices global at this level comes from Peter's wish to
> allow Gnopernicus to interoperate with other self-voicing apps. With
> this, if GS supports multiple clients as a singleton, it is
> possible for a Gnopernicus-aware self-voicing app to speak with a
> Gnopernicus-defined voice. In this respect, it might be useful if we
> also provided notification to the clients when Voices come and
> go.

	Hmm,

	Hope the comments help; as you can see, I'm somewhat unclear as to the
entire purpose of it all :-)

	Regards,

		Michael.

-- 
 mmeeks gnu org  <><, Pseudo Engineer, itinerant idiot



