[g-a-devel]Re: New revision of GNOME Speech IDL
- From: Marc Mulcahy <marc mulcahy sun com>
- To: Paul Lamere <Paul Lamere sun com>
- Cc: gnome-accessibility-devel gnome org, Bill Haneman sun com, Rich Burridge <Rich Burridge sun com>, william walker sun com, peter korn sun com, Marney Beard <marney beard sun com>
- Subject: [g-a-devel]Re: New revision of GNOME Speech IDL
- Date: Tue, 29 Oct 2002 11:29:15 -0700
Hi Paul,
I was going through mail and wasn't sure whether I had adequately
responded or filled you in on what's happening with GNOME Speech, so
either sorry this is so late, or sorry for the repeat; I'm not sure which.
We're preparing version 0.2 of GNOME Speech, based on the API you reviewed
on the 9th of October, to ship with Gnopernicus, since it requires some
sort of speech service. I have one driver written for IBM's Linux Viavoice
TTS kit, and am working on a FreeTTS driver for the 0.2 stuff now. But
that will be pretty much all we do on the 0.2 version.
Rich Burridge has been working on converting the JSAPI class files to GNOME
IDL as an initial proposal for the JSAPI-based GNOME Speech 1.0 API. So it
seems that the best use of future effort and review will be on the 1.0
track, and that's where I'll be focusing once this 0.2 work for Gnopernicus
is finished.
Please don't hesitate to send feedback/questions.
I've created the gnome-speech-dev sun com alias, of which you are a member.
Regards,
Marc
At 08:04 AM 10/9/2002 -0400, Paul Lamere wrote:
Marc:
Thanks for sending along the new IDL for gnome-speech. I've taken a look
at it and made some comments, which are attached.
Paul
*** Comment (0) - Overall
It would be nice to have the API behavior documented a bit better
(e.g. what does the boolean return from 'stop' mean?)
*** Comment (1) - GNOME_Speech_SynthesizerDriver:voice_language
Since voice_language is an enum:
enum voice_language {
language_english,
language_german
};
Does this mean that a new version of gnome-speech needs to be
generated to support a new language? How does Gnome represent a
locale? If Gnome doesn't have a standard way of representing a locale,
then perhaps ISO country/language codes should be used. This would
allow for speech engines to support new languages without any changes
to the gnome-speech API.
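For illustration, the language could be carried as ISO codes in a small
struct; the names below are only a sketch, not anything in the current IDL:
    /* Sketch: identify a voice's language by ISO codes rather than
       a fixed enum, so engines can add languages without API changes. */
    struct VoiceLanguage {
        string iso639_language;   /* e.g. "en", "de"                 */
        string iso3166_country;   /* e.g. "US", "DE"; may be empty   */
    };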
*** Comment (2) - GNOME_Speech_SynthesizerDriver:voice_info
VoiceInfo should probably be expanded to include:
* age - the age of the speaker.
* style - the style of the voice. Allows for further
distinguishing voices. For instance:
"business", "casual" could be styles supported by a
particular engine.
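A sketch of what that could look like (the name/gender/language fields
below just stand in for whatever VoiceInfo already carries; age and style
are the proposed additions):
    struct VoiceInfo {
        string name;
        string gender;
        string language;
        long   age;      /* proposed: age of the speaker              */
        string style;    /* proposed: e.g. "business", "casual"; the
                            set of styles is engine-defined           */
    };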
*** Comment (3) - GNOME_Speech_SynthesizerDriver
Is it possible for a system to have more than one vendor TTS engine
installed? If it is possible, it is not clear to me how an application
would choose one vendor TTS engine over another with this API.
How does an application get an instance of a SynthesisDriver?
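To make the question concrete, here is a purely hypothetical registry
interface showing the kind of operations an application would need in
order to enumerate installed engines and pick one. It assumes the existing
SynthesisDriver interface; nothing else here is in the current IDL:
    /* Hypothetical: enumerate installed drivers and pick one. */
    interface DriverRegistry {
        typedef sequence<SynthesisDriver> DriverList;
        DriverList      getDrivers ();
        SynthesisDriver getDriverByName (in string driverName);
    };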
*** Comment (4) - GNOME_Speech_SynthesizerDriver
In a typical system, would there be a single running instance of a TTS
engine that all speech applications would share, or would each
application have its own instance? How can speech applications
coordinate their output with each other? For instance, a book reader
application may queue up an hour's worth of text to be spoken. If a
speech-enabled calendar application needs to announce an appointment,
would it:
a) Interrupt the book reader
b) Speak over the book reader
c) Wait until the book was finished.
How is this coordination managed in the API?
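One possible (and purely hypothetical) answer is to let the caller attach
a priority to each utterance and have the engine, or a central speech
service, arbitrate; nothing like this is in the current IDL:
    /* Hypothetical priority hint for arbitration between clients. */
    enum UtterancePriority {
        priority_background,    /* e.g. a book reader                    */
        priority_notification,  /* e.g. a calendar alert: interrupt,
                                   then let the background speech resume */
        priority_urgent         /* preempt everything                    */
    };
    /* say() could then take an UtterancePriority argument, or the
       priority could be a property of the Speaker.                */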
*** Comment (5) - GNOME_Speech_Speaker
- Speaker: Needs a 'pause' method
- Speaker: Needs a 'resume' method
- Speaker: Needs a 'getVoiceInfo' method.
- Perhaps add a 'sayURL' that takes a URL pointing to marked-up
text.
- Perhaps add a 'sayPlainText' that speaks text with no markup.
- Does 'say' return immediately after queueing up text? I presume
that since there is a 'wait' call, that indeed it does. If an
application has queued up ten strings to be spoken, does 'stop'
stop the speaking of the first string in the queue or all strings
in the queue? Likewise, does 'wait' wait for the queue to be empty, or
for the current (head of the queue) text to be spoken? Generally
speaking, if there is a TTS output queue, an application will
need some methods to allow fine-grained manipulation of the queue.
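Pulling the points above together, a sketch of an extended Speaker; only
say/stop/wait are taken from the current IDL (and only roughly), everything
else is a suggested addition:
    interface Speaker {
        long    say  (in string text);            /* existing */
        boolean stop ();                          /* existing */
        void    wait ();                          /* existing, roughly */

        boolean   pause ();
        boolean   resume ();
        VoiceInfo getVoiceInfo ();
        long      sayURL (in string url);         /* marked-up text */
        long      sayPlainText (in string text);  /* no markup      */

        /* finer-grained queue control */
        boolean stopCurrent ();           /* skip the head of the queue */
        boolean stopAll ();               /* flush the whole queue      */
        void    waitQueueEmpty ();        /* vs. waiting only for the
                                             current utterance          */
    };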
*** Comment (6) Vocabulary management
- How does an application add new words/pronunciations to the TTS
dictionary?
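For example, something like the following could cover the basic case;
these methods are purely hypothetical, and the pronunciation string format
would presumably be engine-specific:
    /* Hypothetical vocabulary methods, on the Speaker or on the
       driver if pronunciations should be shared between speakers. */
    boolean addPronunciation    (in string word,
                                 in string pronunciation);
    boolean removePronunciation (in string word);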
*** Comment (7) callbacks
It is unclear from the API when callbacks are generated. There are
a number of interesting speech events that an application may want
to receive callbacks for:
- utterance started
- utterance ended
- word started
- word ended
- phoneme started
- marker reached
- utterance cancelled
- utterance paused
- utterance resumed
I suggest adding an event type to the SpeechCallback 'notify' method that
can be used to indicate the type of event.
Some applications may want to see all of these events, while some
won't want to see any. Since some of the events can add some
overhead to the speech engine, it may be useful to allow an
application to register callbacks for specific event types.
The 'notify' method gets a 'text_id' as a long type. I presume
this is the long value that is returned by 'say'. I think this is
a bit awkward for an application. If an application wants access
to the text that is currently being output it will have to store
all the text, associate the text with the id and remember to clean
it all up afterwards. I would prefer to see the SpeechCallback
'notify' method get a struct that includes a reference to the
current string being output.
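Putting that together, a sketch of the event enum and a richer
notification payload; none of these names are in the current IDL, they
only illustrate the suggestion:
    enum SpeechEventType {
        event_utterance_started,
        event_utterance_ended,
        event_word_started,
        event_word_ended,
        event_phoneme_started,
        event_marker_reached,
        event_utterance_cancelled,
        event_utterance_paused,
        event_utterance_resumed
    };

    struct SpeechEvent {
        SpeechEventType type;
        long            text_id;  /* the id returned by say()           */
        string          text;     /* the text being output, so the
                                     application need not keep a copy   */
        long            offset;   /* character offset, for word and
                                     marker events                      */
    };

    interface SpeechCallback {
        void notify (in SpeechEvent event);
    };
    /* Registration could then be per event type, e.g.
       registerCallback (in SpeechCallback cb, in SpeechEventType type),
       so applications only pay for the events they want.               */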
*** Comment (8) Parameters -
How does setting parameters interact with text that is queued to
be spoken? If an application queues ten strings to be spoken and
then calls 'setParameterValue' to change a property such as the
speaking volume, when does the change take place? Some options:
a) Immediately (in the middle of speaking the current text)
b) After the current utterance is finished
c) After all the items currently in the queue have been
spoken.
Note that some engines will not be able to change some parameters
in mid-utterance.
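If engines differ here, the API could at least let the application state
its intent; the enum below is purely hypothetical:
    /* Hypothetical: when a parameter change should take effect. */
    enum ParameterApplyPolicy {
        apply_immediately,      /* may fail on some engines */
        apply_next_utterance,
        apply_after_queue
    };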
*** Comment (9) Parameters -
What is the intended use of the 'enumerated' field of the
Parameter struct?
What does 'getParameterValue' return if a request is made for
an unknown parameter?
What does 'getParameterValueDescription' do?
Does the 'setParameterValue' return value indicate the
success/failure of the set?
I'm not sure if the 'double' type is appropriate for all
parameters.
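On the last point, if non-numeric parameters are expected, a CORBA any
(or a string form) may fit better than double; sketched below with
hypothetical signatures that mirror the existing method names:
    /* Sketch: allow non-numeric parameter values. */
    boolean setParameterValue (in string name, in any value);
    any     getParameterValue (in string name);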
*** Comment (10) Global Parameters -
If I have a number of speech applications, do I need to set
properties for each application, or is there a way that I can
set parameters globally and have each application use these
global settings? For instance, I can imagine that a user may
want to globally increase the speaking rate for all
applications. Perhaps a mechanism akin to the X resources
would be appropriate.
*** Comment (11) Audio Redirection -
Is there any way for an application to redirect audio?
*** Comment (12) releaseSpeaker? -
When an application is done with a 'Speaker' that was created with
SynthesisDriver::createSpeaker, how does the application release it?
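A sketch of one option; the releaseSpeaker method is hypothetical (and
createSpeaker's real arguments are omitted here). Alternatively, the IDL
could simply document that Speaker follows normal Bonobo/CORBA reference
counting and that clients release it that way:
    /* Hypothetical: make release explicit on the driver, pairing
       with createSpeaker.                                        */
    void releaseSpeaker (in Speaker speaker);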