Re: [g-a-devel]Gnome Speech Architecture Proposal - the IDL
- From: Marc Mulcahy <marc.mulcahy@sun.com>
- To: Michael Meeks <michael@ximian.com>, Draghi Puterity <mp@baum.de>
- Cc: accessibility mailing list <gnome-accessibility-devel@gnome.org>, Thomas Friehoff <tf@baum.de>
- Subject: Re: [g-a-devel]Gnome Speech Architecture Proposal - the IDL
- Date: Mon, 20 May 2002 23:48:15 -0600
Michael, Draghi and all:
I'm not sold yet on the "speaker-based" approach which Draghi proposes.
As Michael's mail points out, it seems to confuse the purpose of each
individual component of the system. Beyond making it easier for
Gnopernicus to implement, what does this approach provide that the
original engine-based API did not? Something tells me that if you want
to create this idea of global voices, each with an associated engine and
a user-defined voice name, then it belongs in Gnopernicus, not in
gnome-speech. I fear that hiding control over the individual engines may
come back to bite us in the future...
Using a speaker-based approach will also make installing individual
engine drivers more difficult. How does the speech manager find the
installed engines and speakers? If it simply wraps bonobo-activation to
do this and talks to a SynthesisDriver interface underneath (which is
what I fear would be necessary), then we may as well eliminate a layer
and just expose SynthesisDriver as in the original proposal.
Marc
At 10:23 AM 5/20/2002 +0100, Michael Meeks wrote:
Hi Draghi,
On Thu, 2002-05-16 at 11:54, Draghi Puterity wrote:
> here is a very crude draft of an IDL for my recent proposal of the
> Gnome-Speech architecture.
I'll give some general comments as I go; hopefully I'll catch most
things.
> boolean registerMarkerListener (in string userCookie,
> word flags, ??? callback);
Ok - it's not clear to me what you want this callback to do, but if you
want the interface to send back data, you need to implement a
Bonobo::Listener interface, or (worse) your own custom interface on the
client, to be able to receive the messages and return them to the caller.
I would recommend aggregating a Bonobo::EventSource and a
Bonobo::Listener interface, since they use string names for events and
do a fair bit for you - they are more generic, standard and
'understood'.
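From memory, the shape of those interfaces is roughly this - a sketch
only, so check Bonobo's own IDL for the authoritative definitions:

    module Bonobo {
        interface Listener : Unknown {
            /* events arrive as a string name plus an 'any' payload */
            oneway void event (in string event_name, in any args);
        };

        interface EventSource : Unknown {
            /* clients subscribe and unsubscribe a Listener */
            void addListener    (in Listener l);
            void removeListener (in Listener l);
        };
    };

So a Voice that aggregated an EventSource could emit, say, an "EOS"
marker event, and every client that attached a Listener would hear it.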
> void say (in string text);
> void shutUp();
One thing worth knowing is that if you fire off a void method with no
exceptions, the app will block waiting for the result. You probably want
to add 'oneway' to any void method that doesn't raise an exception -
then your app will fire and forget.
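Concretely, for the two methods above that would just be:

    oneway void say (in string text);
    oneway void shutUp ();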
> boolean isSpeaking();
Ok - what is this method for? If you are expecting people to do:

    while (obj->isSpeaking ());
    obj->say ("Foo");

you have built a race condition ;-) Possibly you want to have:
    enum HowToSay {
        SAY_OVERRIDE,
        SAY_OVERRIDE_CANCEL,
        SAY_IF_NOT_SPEAKING,
        SAY_FOO
    };

    boolean say (in string text, in HowToSay howto)
        raises (WasSpeaking);
Or somesuch - but perhaps that is too complicated for your needs.
> ParameterList getSupportedParameters ();
I imaging a ParameterList is in fact a ParameterRangeList where:
typedef sequence<ParameterRange> ParameterRangeList.
The sequence will as you suggest encode the length.
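To make that concrete - guessing at the fields, since the draft doesn't
define ParameterRange:

    struct ParameterRange {
        string name;     /* e.g. "rate", "pitch", "volume" */
        double min;      /* smallest accepted value */
        double max;      /* largest accepted value */
        double current;  /* value currently in effect */
    };
    typedef sequence<ParameterRange> ParameterRangeList;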
> any getParameterValue (in string parameterName)
> raises (ParameterNotSupported);
>
> void setParameterValue (in string parameterName, any value)
> raises (ParameterNotSupported, ParameterOutOfRange,
> WrongValueType);
If people will do a lot of sets, it might be worth binning the
exceptions here, since they're all calculable from the
ParameterRangeList, and making the method oneway, or simply adding a
oneway variant to avoid roundtrips.
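Something like this, say - the name is invented purely for illustration:

    /* fire-and-forget setter: the caller validates against the
       ParameterRangeList up front, so bad values can simply be
       ignored instead of raised as exceptions */
    oneway void setParameterValueAsync (in string parameterName,
                                        in any value);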
> ParameterList getSupportedParameters ();
> ParameterRange getParameterRange (in string parameterName)
> raises (ParameterNotSupported);
Surely a (far more static) RangeList would be more useful as
getSupportedParameters? That is, unless you badly need a 'bulk fetch'
interface as well. I suppose one thing that interests me is who will use
these interfaces. Perhaps an interface is used by only one client, and a
fresh instance of that interface is returned to each new client? If not,
you may get applications fighting over these (or any) properties.
> Speaker getSpeaker (in string speakerName)
> raises (SpeakerNotSupported);
> Voice getVoice (in string voiceName)
> raises (VoiceNotAvailable);
So these would return a new instance of that interface, which would
store that client's settings, and inter-instance interactions would be
dealt with in-proc in the sound server?
> Voice getCurrentVoice ();
> Voice setCurrentVoice (in string voiceName)
> raises (VoiceNotAvailable); // returns the
> previously selected voice
Would these two make sense in that model?
> void say (in string text);
> void shutUp();
> boolean isSpeaking();
I'm somewhat confused as to why we duplicate this functionality on the
voice and the other bits.
As you can see, I'm also somewhat confused as to the role of the voices
and the properties - are these just to be set by some control-panel type
application? Or are they intended to be programmatically varied at high
frequency [ per word ]? In short, what are the properties :-)
> As you see we have three objects: the SpeechManager, the Speaker,
> and the Voice.
>
> The SpeechManager is the main entry point for the client. The client
> will ask the SM for a list of available speakers (I'm not sure if we
> need the SpeakerCount, or whether we can get that from the SpeakerList).
You can get it from the list - a CORBA sequence always carries its own
length, so a separate SpeakerCount is redundant.
> getSpeaker will return a Speaker object if one is found with that
> name. I don't know if bonobo allows you that, but I have assumed so.
Yes, you can return CORBA_OBJECT_NIL if there is none by that name.
> The Speaker object is here to allow us to query for the available
> parameter ranges. The Speaker can not speak ;-))). In order to
> speak, you need to ask the Speaker to create a Voice with
> createVoice. The client provides a name for the Voice (Speakers
> have predefined names!). createVoice will return you a Voice object
> with some default parameter settings.
Interesting.
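So in IDL terms that would be something like this - my reading of the
model, not the draft's exact text:

    interface Speaker : Bonobo::Unknown {
        /* query what the engine can do; the Speaker itself is mute */
        ParameterRangeList getSupportedParameters ();

        /* returns a fresh, caller-owned Voice initialised with
           default parameter settings */
        Voice createVoice (in string voiceName);
    };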
> What can we do with a Voice? We can change the supported parameters
> within the allowed ranges, and ask it to say something, shut up, etc.
> We also need to be able to receive notifications for speech markers
> (at least EOS). For that I provided registerMarkerListener, but I
> have some doubts that this is the right way. Doesn't bonobo support
> some sort of standard event generation mechanism, like COM's
> connection points? If yes, we should use that instead of my
> home-brewed registerMarkerListener.
cf. BonoboListener, BonoboEventSource.
> So the client keeps track of the voice objects it created, and
> generates speech through the Voice objects. Using GNOME Speech like
> this is very flexible, and it also solves a problem that might arise
> in the future, if we want to allow simultaneous speaking of multiple
> voices.
Great.
> I would also like to introduce the "current voice" approach, as it
> is more convenient for some clients. At the SpeechManager level I
> have provided a get/setCurrentVoice pair and duplicated the speech
> functions from the Voice object (say, shutUp, isSpeaking, etc.).
> Typically the client will create the voices it wants once, at the
> beginning, and then just switch between them before saying
> something. We expect SRS in Gnopernicus to use this approach. Given
> the Voice objects, the implementation of the CurrentVoice concept is
> almost trivial.
The problem is muxing this between clients; it's not possible to tell
where an invocation came from [ which client ]. If you added a client
interface handle to all the 'say' methods it would be, but then you
might as well have the invocation on the voice.
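That is, you would end up with something like this (ClientHandle is
invented here, purely for illustration):

    /* every call carries an explicit per-client handle - at which
       point a per-client Voice object is the simpler design */
    void say (in ClientHandle client, in string text);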
Thus if client a) selects voice 'Male1' and client b) selects voice
'Female2', who wins? But then perhaps it's too difficult to design this
to work well with multiple clients.
> I have also introduced getVoices at SpeechManager level. The idea
> of making the voices global at this level comes from Peter's wish to
> allow Gnopernicus to interoperate with other self-voicing apps. With
> this, if GS supports multiple clients as a singleton, it is possible
> for a Gnopernicus-aware self-voicing app to speak with a
> Gnopernicus-defined voice. In this respect, it might be useful to
> also provide notification to the clients when Voices come and go.
Hmm,
Hope the comments help; as you can see, I'm somewhat unclear as to the
entire purpose of it all :-)
Regards,
Michael.
--
mmeeks@gnu.org <><, Pseudo Engineer, itinerant idiot