[g-a-devel] Re: New revision of GNOME Speech IDL
- From: Willie Walker <William Walker sun com>
- To: Marc Mulcahy <marc mulcahy sun com>
- Cc: Paul Lamere <Paul Lamere sun com>, gnome-accessibility-devel gnome org, Bill Haneman sun com, Rich Burridge <Rich Burridge sun com>, peter korn sun com, Marney Beard <marney beard sun com>
- Subject: [g-a-devel] Re: New revision of GNOME Speech IDL
- Date: Tue, 29 Oct 2002 13:33:28 -0500
Hi All:
Thanks for the updates, Marc. BTW, JSAPI is currently undergoing changes
through JSR-113. Depending upon the timing of the GSAPI work, it might
be a good thing to hook up with this JSR. Paul is our rep on the JSR-113
expert group.
Will
Marc Mulcahy wrote:
>
> Hi Paul,
>
> Going through my mail, I wasn't sure whether I had adequately
> responded/filled you in on what's happening with GNOME Speech, so either
> sorry this is so late or sorry for the repeat; I'm not sure which.
>
> We're preparing version 0.2 of GNOME Speech, based on the API you reviewed
> on the 9th of October, to ship with Gnopernicus, since it requires some
> sort of speech service. I have one driver written for IBM's Linux ViaVoice
> TTS kit, and am working on a FreeTTS driver for the 0.2 stuff now. But
> that will be pretty much all we do on the 0.2 version.
>
> Rich Burridge has been working on converting the JSAPI class files to GNOME
> IDL as an initial proposal for the JSAPI-based GNOME Speech 1.0 API. It
> seems the best use of future effort/review will be on the 1.0 track, so
> that's where I'll be focusing once this work on the 0.2 stuff for
> Gnopernicus is finished.
>
> Please don't hesitate to send feedback/questions.
>
> I've created the gnome-speech-dev sun com alias, of which you are a member.
>
> Regards,
>
> Marc
>
> At 08:04 AM 10/9/2002 -0400, Paul Lamere wrote:
> >Marc:
> >
> >Thanks for sending along the new IDL for gnome-speech. I've taken a look
> >at it and made some comments, which are attached.
> >
> >
> >Paul
> >
> >
> >
> >*** Comment (0) - Overall
> > It would be nice to have the API behavior documented a bit better
> > (e.g. what does the boolean return from 'stop' mean?)
> >
> >*** Comment (1) - GNOME_Speech_SynthesizerDriver:voice_language
> >
> > Since voice_language is an enum:
> >
> >   enum voice_language {
> >     language_english,
> >     language_german
> >   };
> >
> > Does this mean that a new version of gnome-speech needs to be
> > generated to support a new language? How does GNOME represent a
> > locale? If GNOME doesn't have a standard way of representing a locale,
> > then perhaps ISO country/language codes should be used. This would
> > allow for speech engines to support new languages without any changes
> > to the gnome-speech API.
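> >
> > For illustration only, a locale-based alternative might look like
> > this (hypothetical names, not part of the proposed IDL):
> >
> >   struct VoiceLanguage {
> >     string language;  // ISO 639 language code, e.g. "en", "de"
> >     string country;   // ISO 3166 country code, e.g. "US", or ""
> >   };
> >
> > An engine could then advertise any language it supports without a
> > new gnome-speech release.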
> >
> >*** Comment (2) - GNOME_Speech_SynthesizerDriver:voice_info
> >
> > VoiceInfo should probably be expanded to include:
> > * age - the age of the speaker.
> > * style - the style of the voice. Allows for further
> > distinguishing voices. For instance:
> > "business", "casual" could be styles supported by a
> > particular engine.
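> >
> > A sketch of such an expanded struct, assuming VoiceInfo currently
> > carries at least a name and language (new field names hypothetical):
> >
> >   struct VoiceInfo {
> >     string name;
> >     string language;
> >     short  age;     // new: approximate age of the speaker, in years
> >     string style;   // new: e.g. "business", "casual"
> >   };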
> >
> >
> >*** Comment (3) - GNOME_Speech_SynthesizerDriver
> >
> > Is it possible for a system to have more than one vendor TTS engine
> > installed? If it is possible, it is not clear to me how an application
> > would choose one vendor TTS engine over another with this API.
> >
> > How does an application get an instance of a SynthesisDriver?
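> >
> > One possible shape for engine selection, purely as a sketch
> > (hypothetical interface, not in the current IDL):
> >
> >   interface DriverRegistry {
> >     typedef sequence<SynthesisDriver> DriverList;
> >     DriverList getDrivers ();
> >     SynthesisDriver getDriverByName (in string driver_name);
> >   };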
> >
> >*** Comment (4) - GNOME_Speech_SynthesizerDriver
> >
> > In a typical system, would there be a single running instance of a TTS
> > engine that all speech applications would share, or would each
> > application have its own instance? How can speech applications
> > coordinate their output with each other? For instance, a book reader
> > application may queue up an hour's worth of text to be spoken. If a
> > speech-enabled calendar application needs to announce an appointment
> > would it:
> >
> > a) Interrupt the book reader
> > b) Speak over the book reader
> > c) Wait until the book is finished.
> >
> > How is this coordination managed in the API?
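> >
> > If the API is meant to handle this coordination, one common
> > approach (sketched here with hypothetical names) is a priority
> > attached to each utterance:
> >
> >   enum utterance_priority {
> >     priority_background,  // e.g. the book reader
> >     priority_normal,
> >     priority_interrupt    // e.g. the appointment announcement
> >   };
> >
> >   long say (in string text, in utterance_priority priority);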
> >
> >*** Comment (5) - GNOME_Speech_Speaker
> >
> > - Speaker: Needs a 'pause' method
> > - Speaker: Needs a 'resume' method
> > - Speaker: Needs a 'getVoiceInfo' method.
> > - Perhaps add a 'sayURL' that takes a URL pointing to marked-up
> > text.
> > - Perhaps add a 'sayPlainText' that speaks text with no markup
> >   (a sketch combining these suggestions appears at the end of this
> >   comment).
> >
> > - Does 'say' return immediately after queueing up text? I presume,
> >   since there is a 'wait' call, that it does. If an
> >   application has queued up ten strings to be spoken, does 'stop'
> >   stop the speaking of the first string in the queue or all strings
> >   in the queue? Likewise, does 'wait' wait for the queue to be empty, or
> >   for the current (head of the queue) text to be spoken? Generally
> >   speaking, if there is a TTS output queue, an application will
> >   need some methods to allow fine-grained manipulation of the queue.
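> >
> > Pulling the suggestions above together, the Speaker additions might
> > look like this (hypothetical names; existing operations omitted):
> >
> >   interface Speaker {
> >     // ... existing say/stop/wait/parameter operations ...
> >     boolean   pause ();
> >     boolean   resume ();
> >     VoiceInfo getVoiceInfo ();
> >     long      sayURL (in string url);         // marked-up text
> >     long      sayPlainText (in string text);  // no markup
> >   };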
> >
> >
> >*** Comment (6) Vocabulary management
> > - How does an application add new words/pronunciations to the TTS
> > dictionary?
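> >
> > One engine-neutral way to expose this, purely as a sketch
> > (hypothetical names):
> >
> >   boolean addPronunciation (in string word, in string phonemes);
> >   boolean removePronunciation (in string word);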
> >
> >
> >*** Comment (7) callbacks
> > It is unclear from the API when callbacks are generated. There are
> > a number of interesting speech events that an application may want
> > to receive callbacks for:
> >
> > - utterance started
> > - utterance ended
> > - word started
> > - word ended
> > - phoneme started
> > - marker reached
> > - utterance cancelled
> > - utterance paused
> > - utterance resumed
> >
> > I suggest adding an event type to the SpeechCallback 'notify' that
> > can be used to indicate the type of event (a sketch appears at the
> > end of this comment).
> >
> > Some applications may want to see all of these events, while others
> > won't want to see any. Since some of the events can add some
> > overhead to the speech engine, it may be useful to allow an
> > application to register callbacks for specific event types.
> >
> > The 'notify' method gets a 'text_id' as a long type. I presume
> > this is the long value that is returned by 'say'. I think this is
> > a bit awkward for an application. If an application wants access
> > to the text that is currently being output, it will have to store
> > all the text, associate the text with the id, and remember to clean
> > it all up afterwards. I would prefer to see the SpeechCallback
> > 'notify' method get a struct that includes a reference to the
> > current string being output.
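> >
> > Putting the two suggestions above together, the callback might look
> > like this (hypothetical names):
> >
> >   enum speech_event_type {
> >     event_utterance_started,
> >     event_utterance_ended,
> >     event_word_started,
> >     event_word_ended,
> >     event_phoneme_started,
> >     event_marker_reached,
> >     event_utterance_cancelled,
> >     event_utterance_paused,
> >     event_utterance_resumed
> >   };
> >
> >   struct SpeechEvent {
> >     speech_event_type type;
> >     long   text_id;
> >     string text;    // the string being output
> >     long   offset;  // position within 'text', where relevant
> >   };
> >
> >   interface SpeechCallback {
> >     void notify (in SpeechEvent event);
> >   };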
> >
> >*** Comment (8) Parameters -
> > How does setting parameters interact with text that is queued to
> > be spoken? If an application queues ten strings to be spoken and
> > then calls 'setParameterValue' to change a property such as the
> > speaking volume, when does the change take place? Some options:
> > a) Immediately (in the middle of speaking the current text)
> > b) After the current utterance is finished
> > c) After all the items currently in the queue have been
> > spoken.
> >
> > Note that some engines will not be able to change some parameters
> > in mid-utterance.
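> >
> > If the API is to make this behavior explicit, one option (sketched
> > here, not proposed IDL) is a policy argument on the setter:
> >
> >   enum parameter_change_policy {
> >     change_immediately,
> >     change_after_utterance,
> >     change_after_queue
> >   };
> >
> >   boolean setParameterValue (in string name, in double value,
> >                              in parameter_change_policy when);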
> >
> >*** Comment (9) Parameters -
> > What is the intended use of the 'enumerated' field of the
> > Parameter struct?
> >
> > What does 'getParameterValue' return if a request is made for
> > an unknown parameter?
> >
> > What does 'getParameterValueDescription' do?
> >
> > Does the 'setParameterValue' return value indicate the
> > success/failure of the set?
> >
> > I'm not sure if the 'double' type is appropriate for all
> > parameters.
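> >
> > CORBA exceptions would make the failure cases explicit, e.g.
> > (hypothetical):
> >
> >   exception UnknownParameter {
> >     string name;
> >   };
> >
> >   double getParameterValue (in string name)
> >     raises (UnknownParameter);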
> >
> >
> >*** Comment (10) Global Parameters -
> >
> > If I have a number of speech applications, do I need to set
> > properties for each application, or is there a way that I can
> > set parameters globally and have each application use these
> > global settings? For instance, I can imagine that a user may
> > want to globally increase the speaking rate for all
> > applications. Perhaps a mechanism akin to X resources
> > would be appropriate.
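> >
> > One shape for this, purely as a sketch (hypothetical interface), is
> > a per-session settings object that drivers consult for defaults:
> >
> >   interface GlobalSettings {
> >     double  getDefault (in string parameter_name);
> >     boolean setDefault (in string parameter_name, in double value);
> >   };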
> >
> >*** Comment (11) Audio Redirection -
> > Is there any way for an application to redirect audio?
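> >
> > A minimal hook might be (hypothetical):
> >
> >   boolean setAudioOutput (in string sink);  // e.g. a device or file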
> >
> >*** Comment (12) releaseSpeaker? -
> >
> > When an application is done with a 'Speaker' that was created with
> > SynthesisDriver::createSpeaker, how does the application release it?
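> >
> > If Speaker follows the usual GNOME 2 convention of deriving from
> > Bonobo::Unknown, release would happen through reference counting
> > (unref); otherwise an explicit call could be added (hypothetical):
> >
> >   interface SynthesisDriver {
> >     // ...
> >     void releaseSpeaker (in Speaker s);
> >   };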