[g-a-devel] Re: New revision of GNOME Speech IDL
- From: Willie Walker <William Walker sun com>
- To: Marc Mulcahy <marc mulcahy sun com>
- Cc: Paul Lamere <Paul Lamere sun com>, gnome-accessibility-devel gnome org, Bill Haneman sun com, Rich Burridge <Rich Burridge sun com>, peter korn sun com, Marney Beard <marney beard sun com>
- Subject: [g-a-devel] Re: New revision of GNOME Speech IDL
- Date: Tue, 29 Oct 2002 13:33:28 -0500
Hi All:
Thanks for the updates, Marc. BTW, JSAPI is currently undergoing changes
through JSR-113. Depending upon the timing of the GSAPI work, it might
be a good thing to hook up with this JSR. Paul is our rep on the JSR-113
expert group.
Will
Marc Mulcahy wrote:
>
> Hi Paul,
>
> Going through my mail, I wasn't sure whether I had adequately
> responded/filled you in on what's happening with GNOME Speech, so either
> sorry this is so late or sorry for the repeat; I'm not sure which.
>
> We're preparing version 0.2 of GNOME Speech, based on the API you reviewed
> on the 9th of October, to ship with Gnopernicus, since it requires some
> sort of speech service. I have one driver written for IBM's Linux ViaVoice
> TTS kit, and am working on a FreeTTS driver for the 0.2 stuff now. But
> that will be pretty much all we do on the 0.2 version.
>
> Rich Burridge has been working on converting the JSAPI class files to GNOME
> IDL as an initial proposal for the JSAPI-based GNOME Speech 1.0 API. It
> seems the best use of future effort/review will be on the 1.0 track, so
> that's where I'll be focusing once this work on the 0.2 stuff for
> Gnopernicus is finished.
>
> Please don't hesitate to send feedback/questions.
>
> I've created the gnome-speech-dev sun com alias, of which you are a member.
>
> Regards,
>
> Marc
>
> At 08:04 AM 10/9/2002 -0400, Paul Lamere wrote:
> >Marc:
> >
> >Thanks for sending along the new IDL for gnome-speech. I've taken a look
> >at it and made some comments, which are attached.
> >
> >
> >Paul
> >
> >
> >
> >*** Comment (0) - Overall
> > It would be nice to have the API behavior documented a bit better
> > (e.g. what does the boolean return from 'stop' mean?)
> >
> >*** Comment (1) - GNOME_Speech_SynthesizerDriver:voice_language
> >
> > Since voice_language is an enum:
> >
> >   enum voice_language {
> >     language_english,
> >     language_german
> >   };
> >
> > Does this mean that a new version of gnome-speech needs to be
> > generated to support a new language? How does GNOME represent a
> > locale? If GNOME doesn't have a standard way of representing a locale,
> > then perhaps ISO country/language codes should be used. This would
> > allow for speech engines to support new languages without any changes
> > to the gnome-speech API.
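> >
> > For illustration only, a locale-based alternative might look like
> > this (hypothetical names, not part of the proposed IDL):
> >
> >   struct VoiceLanguage {
> >     string language;  // ISO 639 language code, e.g. "en", "de"
> >     string country;   // ISO 3166 country code, e.g. "US", or ""
> >   };
> >
> > An engine could then advertise any language it supports without a
> > new gnome-speech release.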
> >
> >*** Comment (2) - GNOME_Speech_SynthesizerDriver:voice_info
> >
> > VoiceInfo should probably be expanded to include:
> > * age - the age of the speaker.
> > * style - the style of the voice. Allows for further
> > distinguishing voices. For instance:
> > "business", "casual" could be styles supported by a
> > particular engine.
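> >
> > A sketch of such an expanded struct, assuming VoiceInfo currently
> > carries at least a name and language (new field names hypothetical):
> >
> >   struct VoiceInfo {
> >     string name;
> >     string language;
> >     short  age;     // new: approximate age of the speaker, in years
> >     string style;   // new: e.g. "business", "casual"
> >   };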
> >
> >
> >*** Comment (3) - GNOME_Speech_SynthesizerDriver
> >
> > Is it possible for a system to have more than one vendor TTS engine
> > installed? If it is possible, it is not clear to me how an application
> > would choose one vendor TTS engine over another with this API.
> >
> > How does an application get an instance of a SynthesisDriver?
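> >
> > One possible shape for engine selection, purely as a sketch
> > (hypothetical interface, not in the current IDL):
> >
> >   interface DriverRegistry {
> >     typedef sequence<SynthesisDriver> DriverList;
> >     DriverList getDrivers ();
> >     SynthesisDriver getDriverByName (in string driver_name);
> >   };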
> >
> >*** Comment (4) - GNOME_Speech_SynthesizerDriver
> >
> > In a typical system, would there be a single running instance of a TTS
> > engine that all speech applications would share, or would each
> > application have its own instance? How can speech applications
> > coordinate their output with each other? For instance, a book reader
> > application may queue up an hour's worth of text to be spoken. If a
> > speech-enabled calendar application needs to announce an appointment
> > would it:
> >
> > a) Interrupt the book reader
> > b) Speak over the book reader
> > c) Wait until the book is finished.
> >
> > How is this coordination managed in the API?
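> >
> > If the API is meant to handle this coordination, one common
> > approach (sketched here with hypothetical names) is a priority
> > attached to each utterance:
> >
> >   enum utterance_priority {
> >     priority_background,  // e.g. the book reader
> >     priority_normal,
> >     priority_interrupt    // e.g. the appointment announcement
> >   };
> >
> >   long say (in string text, in utterance_priority priority);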
> >
> >*** Comment (5) - GNOME_Speech_Speaker
> >
> > - Speaker: Needs a 'pause' method
> > - Speaker: Needs a 'resume' method
> > - Speaker: Needs a 'getVoiceInfo' method.
> > - Perhaps add a 'sayURL' that takes a URL pointing to marked-up
> > text.
> > - Perhaps add a 'sayPlainText' that speaks text with no markup
> >   (a sketch combining these suggestions appears at the end of this
> >   comment).
> >
> > - Does 'say' return immediately after queueing up text? I presume,
> >   since there is a 'wait' call, that it does. If an
> >   application has queued up ten strings to be spoken, does 'stop'
> >   stop the speaking of the first string in the queue or all strings
> >   in the queue? Likewise, does 'wait' wait for the queue to be empty, or
> >   for the current (head of the queue) text to be spoken? Generally
> >   speaking, if there is a TTS output queue, an application will
> >   need some methods to allow fine-grained manipulation of the queue.
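> >
> > Pulling the suggestions above together, the Speaker additions might
> > look like this (hypothetical names; existing operations omitted):
> >
> >   interface Speaker {
> >     // ... existing say/stop/wait/parameter operations ...
> >     boolean   pause ();
> >     boolean   resume ();
> >     VoiceInfo getVoiceInfo ();
> >     long      sayURL (in string url);         // marked-up text
> >     long      sayPlainText (in string text);  // no markup
> >   };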
> >
> >
> >*** Comment (6) Vocabulary management
> > - How does an application add new words/pronunciations to the TTS
> > dictionary?
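> >
> > One engine-neutral way to expose this, purely as a sketch
> > (hypothetical names):
> >
> >   boolean addPronunciation (in string word, in string phonemes);
> >   boolean removePronunciation (in string word);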
> >
> >
> >*** Comment (7) callbacks
> > It is unclear from the API when callbacks are generated. There are
> > a number of interesting speech events that an application may want
> > to receive callbacks for:
> >
> > - utterance started
> > - utterance ended
> > - word started
> > - word ended
> > - phoneme started
> > - marker reached
> > - utterance cancelled
> > - utterance paused
> > - utterance resumed
> >
> > I suggest adding an event type to the SpeechCallback 'notify' that
> > can be used to indicate the type of event (a sketch appears at the
> > end of this comment).
> >
> > Some applications may want to see all of these events, while others
> > won't want to see any. Since some of the events can add some
> > overhead to the speech engine, it may be useful to allow an
> > application to register callbacks for specific event types.
> >
> > The 'notify' method gets a 'text_id' as a long type. I presume
> > this is the long value that is returned by 'say'. I think this is
> > a bit awkward for an application. If an application wants access
> > to the text that is currently being output, it will have to store
> > all the text, associate the text with the id, and remember to clean
> > it all up afterwards. I would prefer to see the SpeechCallback
> > 'notify' method get a struct that includes a reference to the
> > current string being output.
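> >
> > Putting the two suggestions above together, the callback might look
> > like this (hypothetical names):
> >
> >   enum speech_event_type {
> >     event_utterance_started,
> >     event_utterance_ended,
> >     event_word_started,
> >     event_word_ended,
> >     event_phoneme_started,
> >     event_marker_reached,
> >     event_utterance_cancelled,
> >     event_utterance_paused,
> >     event_utterance_resumed
> >   };
> >
> >   struct SpeechEvent {
> >     speech_event_type type;
> >     long   text_id;
> >     string text;    // the string being output
> >     long   offset;  // position within 'text', where relevant
> >   };
> >
> >   interface SpeechCallback {
> >     void notify (in SpeechEvent event);
> >   };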
> >
> >*** Comment (8) Parameters -
> > How does setting parameters interact with text that is queued to
> > be spoken? If an application queues ten strings to be spoken and
> > then calls 'setParameterValue' to change a property such as the
> > speaking volume, when does the change take place? Some options:
> > a) Immediately (in the middle of speaking the current text)
> > b) After the current utterance is finished
> > c) After all the items currently in the queue have been
> > spoken.
> >
> > Note that some engines will not be able to change some parameters
> > in mid-utterance.
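> >
> > If the API is to make this behavior explicit, one option (sketched
> > here, not proposed IDL) is a policy argument on the setter:
> >
> >   enum parameter_change_policy {
> >     change_immediately,
> >     change_after_utterance,
> >     change_after_queue
> >   };
> >
> >   boolean setParameterValue (in string name, in double value,
> >                              in parameter_change_policy when);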
> >
> >*** Comment (9) Parameters -
> > What is the intended use of the 'enumerated' field of the
> > Parameter struct?
> >
> > What does 'getParameterValue' return if a request is made for
> > an unknown parameter?
> >
> > What does 'getParameterValueDescription' do?
> >
> > Does the 'setParameterValue' return value indicate the
> > success/failure of the set?
> >
> > I'm not sure if the 'double' type is appropriate for all
> > parameters.
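> >
> > CORBA exceptions would make the failure cases explicit, e.g.
> > (hypothetical):
> >
> >   exception UnknownParameter {
> >     string name;
> >   };
> >
> >   double getParameterValue (in string name)
> >     raises (UnknownParameter);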
> >
> >
> >*** Comment (10) Global Parameters -
> >
> > If I have a number of speech applications, do I need to set
> > properties for each application, or is there a way that I can
> > set parameters globally and have each application use these
> > global settings? For instance, I can imagine that a user may
> > want to globally increase the speaking rate for all
> > applications. Perhaps a mechanism akin to X resources
> > would be appropriate.
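> >
> > One shape for this, purely as a sketch (hypothetical interface), is
> > a per-session settings object that drivers consult for defaults:
> >
> >   interface GlobalSettings {
> >     double  getDefault (in string parameter_name);
> >     boolean setDefault (in string parameter_name, in double value);
> >   };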
> >
> >*** Comment (11) Audio Redirection -
> > Is there any way for an application to redirect audio?
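> >
> > A minimal hook might be (hypothetical):
> >
> >   boolean setAudioOutput (in string sink);  // e.g. a device or file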
> >
> >*** Comment (12) releaseSpeaker? -
> >
> > When an application is done with a 'Speaker' that was created with
> > SynthesisDriver::createSpeaker, how does the application release it?
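> >
> > If Speaker follows the usual GNOME 2 convention of deriving from
> > Bonobo::Unknown, release would happen through reference counting
> > (unref); otherwise an explicit call could be added (hypothetical):
> >
> >   interface SynthesisDriver {
> >     // ...
> >     void releaseSpeaker (in Speaker s);
> >   };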