Re: [g-a-devel] Gnome Speech Architecture Proposal - the IDL



Michael, Draghi and all:

I'm not yet sold on the "speaker-based" approach Draghi proposes. As Michael's mail points out, it seems to confuse the purpose of each individual component of the system. Beyond making it easier for Gnopernicus to implement, what does this approach provide that the original engine-based API did not? Something tells me that if you want this idea of global voices, each with an associated engine and a user-defined voice name, it belongs in Gnopernicus, not in gnome-speech. I fear that hiding control over the individual engines may come back to bite us in the future...

Using a speaker-based approach will also make installing individual engine drivers more difficult. How does the speech manager find the installed engines and speakers? If it simply wraps bonobo-activation to do this and talks to a SynthesisDriver interface underneath, which I fear is what would be necessary, then we may as well eliminate layers and just expose SynthesisDriver as in the original proposal.

Marc

At 10:23 AM 5/20/2002 +0100, Michael Meeks wrote:
Hi Draghi,

On Thu, 2002-05-16 at 11:54, Draghi Puterity wrote:
> here is a very crude draft of an IDL for my recent proposal of the
> Gnome-Speech architecture.

        I'll give some general comments as I go; hopefully I'll catch most
things.

>         boolean      registerMarkerListener (in string userCookie,
> word flags, ??? callback);

Ok - it's not clear to me what you want this callback to do, but if you
want to have the interface send back data, you need to implement a
Bonobo::Listener interface, or (worse) your own custom interface on the
client to be able to receive the messages and return them to the caller.
I would recommend aggregating a Bonobo::EventSource and a
Bonobo::Listener interface, since they use string names for events, and
do a fair bit for you - they are more generic, standard and
'understood'.
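
        For reference, the two interfaces look roughly like this - an
abbreviated sketch from memory, so check Bonobo's Events idl for the
authoritative version:

        interface Listener : Bonobo::Unknown {
                oneway void event (in string event_name,
                                   in any    args);
        };

        interface EventSource : Bonobo::Unknown {
                void addListener    (in Listener l);
                void removeListener (in Listener l);
        };

        Your Voice implementation would aggregate a BonoboEventSource and
notify its listeners when a marker is hit - an event name such as
"GNOME/Speech:EOS" being purely illustrative here.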

>         void         say (in string text);
>         void         shutUp();

        One thing worth knowing is that even if you fire off a void method
that raises no exceptions, the app will still block waiting for the
reply. You probably want to add 'oneway' to any void method with no
raises clause - then your app will fire and forget.
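
        e.g. a sketch, for the methods that need no reply (note that a
oneway operation cannot have a raises clause):

        oneway void say    (in string text);
        oneway void shutUp ();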

>         boolean      isSpeaking();

        Ok - what is this method for ? if you are expecting people to do:

        while (obj->isSpeaking ());
        obj->say ("Foo");

        You have built a race condition ;-) - something else can start
speaking between the isSpeaking test and the say. Possibly you want to have:

        exception WasSpeaking {};

        enum HowToSay {
                SAY_OVERRIDE,
                SAY_OVERRIDE_CANCEL,
                SAY_IF_NOT_SPEAKING,
                SAY_FOO
        };

        boolean say (in string text, in HowToSay howto)
                raises (WasSpeaking);

        Or somesuch - but perhaps that is too complicated for your needs.

>         ParameterList  getSupportedParameters ();

        I imagine a ParameterList is in fact a ParameterRangeList, where:

        typedef sequence<ParameterRange> ParameterRangeList;

        The sequence will, as you suggest, encode the length.
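
        And ParameterRange might be something like this - the field names
are purely illustrative, since I don't know what your parameters look
like:

        struct ParameterRange {
                string name;
                double minValue;
                double maxValue;
                double currentValue;
        };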

>         any getParameterValue (in string parameterName)
>             raises (ParameterNotSupported);
>
>         void setParameterValue (in string parameterName, any value)
>             raises (ParameterNotSupported, ParameterOutOfRange,
> WrongValueType);

        If people will do a lot of sets, it might be worth binning the
exceptions here, since they're all calculable from the
ParameterRangeList, and making this oneway - or simply adding a oneway
variant - to avoid roundtrips.
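
        e.g. - the Async suffix is purely hypothetical:

        // exception-free variant; a oneway operation can't raise anyway
        oneway void setParameterValueAsync (in string parameterName,
                                            in any    value);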

>         ParameterList    getSupportedParameters ();
>         ParameterRange   getParameterRange (in string ParameterName)
>             raises (ParameterNotSupported);

        Surely a (far more static) RangeList would be more useful as
getSupportedParameters ? that is unless you badly need a 'bulk fetch'
interface as well. I suppose one thing that interests me is who will use
these interfaces. Perhaps an interface is used only for 1 client, and a
fresh instance of that interface returned to each new client ? if not,
you may get applications fighting over these (or any) properties.

>         Speaker             getSpeaker (in string speakerName)
>             raises (SpeakerNotSupported);
>         Voice               getVoice (in string voiceName)
>             raises (VoiceNotAvailable);

        So these would return a new instance of that interface, which would
store that client's settings ? and inter-instance interactions would be
dealt with in-proc in the sound server ?

>         Voice               getCurrentVoice ();
>         Voice               setCurrentVoice (in string voiceName)
>              raises (VoiceNotAvailable);    // returns the
> previously selected voice

        Would these two make sense in that model ?

>         void                 say (in string text);
>         void                 shutUp();
>         boolean              isSpeaking();

I'm somewhat confused as to why we duplicate this functionality on the
voice and the other bits.

        As you can see, I'm also somewhat confused as to the role of the
voices, and the properties - are these just to be set by some control
panel type application ? or are they intended to be programmatically
varied at high frequency [ per word ] ? in short, what are the
properties :-)

> As you see we have three objects: the SpeechManager, the Speaker,
> and the Voice.
>
> The SpeechManager is the main entry point for the client. The client
> will ask the SM for a list of available speakers (I'm not sure if we
> need the SpeakerCount, or we can get that from the SpeakerList).

        You can get it from the list (sequence).

>  getSpeaker will return a Speaker object if one is found with that
> name. I don't know if bonobo allows you that, but I have assumed so.

        Yes, you can return CORBA_OBJECT_NIL if there is none by that name.

> The Speaker object is here to allow us to query for the available
> parameter ranges. The Speaker cannot speak ;-))). In order to
> speak, you need to ask the Speaker to create a Voice with
> createVoice. The client provides a name for the Voice (Speakers
> have predefined names!). createVoice will return you a Voice object
> with some default parameter settings.

        Interesting.
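
        So, to check I follow, the Speaker might look something like this
(purely illustrative):

        interface Speaker : Bonobo::Unknown {
                ParameterRangeList getSupportedParameters ();
                Voice              createVoice (in string voiceName);
        };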

> What can we do with a Voice? We can change the supported parameters
> within the allowed ranges, and ask it to say something, shut up, etc. We
> also need to be able to receive notifications such as speech markers (at
> least EOS). For that I provided the registerMarkerListener, but I
> have some doubts that this is the right way. Doesn't Bonobo support
> some sort of standard event generation mechanism, like COM's
> connection points? If yes, we should use that instead of my
> home-brewed registerMarkerListener.

        cf. BonoboListener, BonoboEventSource.

> So the client keeps track of the voice objects it
> created, and generates speech through the Voice objects. Using
> GNOME Speech like this is very flexible, and it also solves a
> problem that might arise in the future, if we want to
> allow simultaneous speaking of multiple voices.

        Great.

> I would also like to introduce the "current voice" approach, as it
> is more convenient for some clients. At the SpeechManager level I
> have provided a get/setCurrentVoice pair and doubled the speech
> functions from the Voice object (say, shutUp, isSpeaking, etc).
> Typically the client will create the voices it wants once, at the
> beginning, and then just switch between them before saying something.
> We expect SRS in Gnopernicus to use this approach. Given the Voice
> objects, the implementation of the CurrentVoice concept is almost
> trivial.

        The problem is muxing this between clients; it's not possible to tell
where an invocation came from [ which client ]. If you added a client
interface handle to all the 'say' methods it would be possible, but then
you might as well have the invocation on the voice.
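
        i.e. something like this, 'Client' being a hypothetical per-client
handle interface:

        // 'Client' identifies the caller; purely illustrative
        void say (in Client who, in string text);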

        Thus if client a) selects voice 'Male1' and client b) selects voice
'Female2' who wins ? - but then perhaps it's too difficult to design
this to work well with multiple clients.

> I have also introduced getVoices at SpeechManager level. The idea
> of making the voices global at this level comes from Peter's wish to
> allow Gnopernicus to interoperate with other self-voicing apps. With
> this, if GS supports multiple clients as a singleton, it is
> possible for a Gnopernicus-aware self-voicing app to speak with a
> Gnopernicus-defined voice. In this respect, it might be useful if we
> also provided notification to the clients when Voices come and
> go.

        Hmm,

        Hope the comments help; as you can see, I'm somewhat unclear as to the
entire purpose of it all :-)

        Regards,

                Michael.

--
 mmeeks@gnu.org  <><, Pseudo Engineer, itinerant idiot
