Re: [orca-list] Punctuation, capital letters, exchange of characters and strings, generally error in the design of Orca
- From: Willie Walker <William Walker Sun COM>
- To: orca-list gnome org
- Subject: Re: [orca-list] Punctuation, capital letters, exchange of characters and strings, generally error in the design of Orca
- Date: Thu, 10 Apr 2008 10:27:39 -0400
Thanks Milan.
So...I have a dilemma. The imperfect gnome-speech-based solution in
Orca exists and generally works. The emergence of PulseAudio is also
helping to address one of the major issues (audio device contention).
While not perfect, the current solution emulates missing TTS
features in an expedient and controllable way, giving users what
they want *today*. We can also quickly make adjustments to
gnome-speech to provide support for features enabled by the speech
engine (e.g., verbalized punctuation, capitalization, etc.) and we can
quickly adjust Orca to pass things on to the speech engine rather than
emulate them at the Orca layer. In addition, all of this is
encapsulated in GNOME, making it easy to manage from the release and
packaging standpoints.
What I'm getting from Brailcom is a proposed solution that, when
implemented, seems like it could address a number of problems. It will
eliminate the need for Orca to do emulation of missing features. It
will provide features that are on the Orca requirements list, but which
are not currently implemented (e.g., verbalized capitalization, audio
icons, etc.). It will also act as a system service that many apps can
use, which will run on a large number of platforms, and which does not
require a desktop to be running.
That's great. As a result of this promise, I permitted the speech
dispatcher code into Orca as a means to provide a proving ground. It is
still interesting to me, but it is not without issues: the
implementation is incomplete, it is not an accepted dependency for
GNOME, there has been a dogmatic pursuit of purism, and so on.
What I didn't expect was inflexible opposition from Brailcom to the
practical solutions provided by Orca, such as the notion of the user
specifying pronunciation definitions at a higher level. Until the
unsophisticated user has a convenient mechanism for doing things such as
tweaking pronunciations, Orca is going to provide a means to do this.
Until verbalized punctuation is guaranteed to be supported by the lower
layers, Orca is going to provide a means to emulate this. As the Orca
project lead, this is my decision, and it is based upon user
requirements. I hear Brailcom loud and clear - you don't like this.
Please, let's agree to disagree and let's focus on SpeechDispatcher.
Until it is complete, stable, and we're sure it helps us meet the user
requirements, I cannot make SpeechDispatcher a supported part of Orca.
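For the record, the kind of user-level pronunciation emulation Orca
performs can be sketched as a simple dictionary-driven substitution
applied before text reaches the speech engine (the dictionary entries
below are invented examples, not Orca's actual code):

```python
import re

# Hypothetical sketch of client-side pronunciation emulation:
# substitute user-defined spoken forms for whole-word matches
# before handing the text to the speech engine.
PRONUNCIATIONS = {
    "LOL": "laughing out loud",
    "Dr.": "Doctor",
}

def apply_pronunciations(text, dictionary=PRONUNCIATIONS):
    """Replace whole-word matches with their user-defined spoken forms."""
    for word, spoken in dictionary.items():
        # (?<!\w) / (?!\w) keep us from rewriting substrings
        # inside larger words (e.g. "LOLA" stays untouched).
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        text = re.sub(pattern, spoken, text)
    return text
```

The point of doing this at the client layer is that the user gets the
fix immediately, regardless of which engine is underneath.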
We have at least gotten to the point where we've identified the API that
will be exposed to Orca, which is the speechd Python bindings. With
a few exceptions, it seems like a viable API, though I need to
dig into it a little deeper.
Assuming the API is workable as is, do you have an estimate for the
amount of work (cost and timeframe) needed to complete the
implementation and provide complete support for at least eSpeak,
Festival, Cepstral, DECtalk, and IBMTTS? What is your support model and
release schedule going to be once the implementation is done? What is
your community model going to be (e.g., can others outside Brailcom
contribute patches/enhancements to SpeechDispatcher)?
Will
Milan Zamazal wrote:
"WW" == Willie Walker <William Walker Sun COM> writes:
WW> One of the questions I have right now is the ability for a
WW> client to programmatically configure various things in
WW> SpeechDispatcher, such as pronunciations for words. In looking
WW> at the existing API, I'm not sure I see a way to do this. Nor
WW> am I sure if this is something that a speech dispatcher user
WW> needs to do on an engine-by-engine basis or if there is a
WW> pronunciation dictionary that speech dispatcher provides for all
WW> output modules to use.
SSIP supports SSML, so in theory it is possible to pass pronunciation
etc. using its means. In practice, SSML is probably little
supported, if at all, in most TTS systems, so it wouldn't work. But
that's not a fault of Speech Dispatcher; it's simply a missing
feature elsewhere -- preferably it should be present in the TTS
systems themselves, or at least in some frontend to them. I think
the new Speech Dispatcher TTS driver library should provide means
for parsing SSML, and the drivers should handle it in some way when
the corresponding TTS system can't.
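As an illustration, a client could in principle express a special
pronunciation through SSML's standard <phoneme> element. The helpers
below only sketch the markup itself; they are not actual SSIP or
Speech Dispatcher calls:

```python
from xml.sax.saxutils import escape

# Sketch of passing a special pronunciation via SSML markup.
# Whether the markup actually takes effect depends on the
# SSML support of the underlying TTS system, as noted above.

def ssml_phoneme(word, ipa):
    """Wrap a word in an SSML <phoneme> element with an IPA hint."""
    return ('<phoneme alphabet="ipa" ph="%s">%s</phoneme>'
            % (escape(ipa, {'"': '&quot;'}), escape(word)))

def ssml_speak(body):
    """Wrap marked-up text in a minimal SSML <speak> document."""
    return '<speak version="1.0">%s</speak>' % body
```

This also shows why the distinction below matters: SSML is the right
channel for *special* pronunciations, not a vehicle for papering over
an engine's regular mispronunciations.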
Just one remark to pronunciation as a typical representative of some
problems: It is important to distinguish between special pronunciation
and regular pronunciation. In the first case, e.g. when some word
should be pronounced in a non-regular way for some reason, it's
completely valid to pass pronunciation information from the client to
the engine. But in the latter case, e.g. when some engine mispronounces
some words, the client should no way attempt to "fix" it, this would
only make the situation worse. The proper solution is to fix
pronunciation in the engine. TTS drivers may attempt to work around it
when fixing the engine is not possible, but in this particular case it
should be considered as an extreme approach, to be applied only when
really nothing else works. As for common pronunciation dictionaries I
doubt it can be done on a common level because different synthesizers
use different phoneme sets and their representation.
On the other hand it seems reasonable to handle some other features such
as signalling capitalization, punctuation, sound icons, etc. on a common
basis in the TTS drivers. But beware: this may require
language-dependent text analysis and may interfere with the TTS
processing of some synthesizers. So it shouldn't be applied
universally, and each TTS driver must be free to choose how to
handle such things -- whether to leave them to the synthesizer or
(when the synthesizer is unable to handle the requirements) to use
the driver's own means. When one thinks about it more, it becomes
clear that it would be very useful to have a single common text
analysis frontend for free speech synthesizers, so that the
synthesizers start their own work only after a phonetic
transcription of the input is available. But this is another issue.
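As a toy illustration of the kind of driver-side emulation discussed
above (the generator interface and event names here are invented, not
part of Speech Dispatcher): when the synthesizer cannot signal
capital letters itself, a driver could scan the text and interleave
sound-icon events with speech segments.

```python
# Sketch of driver-side capitalization signalling: emit an
# ('icon', 'capital') event before each uppercase letter and
# ('speak', chunk) events for the text in between.  A real driver
# would translate these events into audio playback and engine calls.
def emulate_capital_icons(text):
    chunk = []
    for ch in text:
        if ch.isupper():
            if chunk:
                yield ('speak', ''.join(chunk))
                chunk = []
            yield ('icon', 'capital')
        chunk.append(ch)
    if chunk:
        yield ('speak', ''.join(chunk))
```

Note that even this trivial example is language-dependent (what
counts as "uppercase" varies by script), which is exactly why each
driver needs the freedom to leave such analysis to the synthesizer
when it can do better.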
[...]
WW> I wasn't sure how to interpret "No", but my interpretation was
WW> that emulation was NOT done, and this seems to match my
WW> interpretation of "Right" above. But, maybe "No" meant
WW> something like "No, speech dispatcher itself doesn't do
WW> emulation, but that can be done at a lower layer in the speech
WW> dispatcher internals." If that's the case, from the client's
WW> point of view, it's still speech dispatcher, and the client can
WW> now depend upon speech dispatcher to do the emulation.
Yes, I think there is some terminology confusion here. The new Speech
Dispatcher contains TTS API and drivers as its part, while the current
implementation is focused basically just on message dispatching. I'd
suggest naming the parts explicitly in discussion (dispatching,
interface, output modules, TTS API, TTS drivers, configuration) to
avoid confusion.
In my opinion it's basically as you write above. Neither clients nor
any of the Speech Dispatcher parts, with the exception of the TTS
drivers, should care about emulation of missing TTS features. They
should perform their own jobs and rely on the TTS systems and their
TTS drivers to ensure proper speech output. The presence of a common
TTS API should guarantee that the emulation work is done only once,
in a single place behind the TTS API, i.e. in the speech
synthesizers (preferably) or in the TTS drivers (when the TTS system
way is not possible). The possible creation of a common TTS
processing frontend to speech synthesizers, mentioned above, comes
into play here, but considering the current state of things it would
be premature to get too distracted by this idea.
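The "emulate once, behind the common TTS API" idea can be sketched
with a hypothetical driver base class; all class, method, and flag
names below are invented for illustration and do not come from
Speech Dispatcher:

```python
# Sketch: a common base class performs feature emulation only when
# the concrete driver declares that its synthesizer lacks the
# feature, so the emulation code lives in exactly one place.
class TTSDriver:
    ENGINE_SPELLS_PUNCTUATION = False  # overridden per synthesizer

    def speak(self, text, punctuation='none'):
        if punctuation == 'all' and not self.ENGINE_SPELLS_PUNCTUATION:
            text = self._spell_punctuation(text)
        return self.engine_speak(text)

    @staticmethod
    def _spell_punctuation(text):
        # Shared, engine-independent emulation of spoken punctuation.
        names = {'.': ' period ', ',': ' comma ', '?': ' question mark '}
        return ''.join(names.get(ch, ch) for ch in text)

    def engine_speak(self, text):
        raise NotImplementedError

class EchoDriver(TTSDriver):
    def engine_speak(self, text):
        return text  # a real driver would call its synthesizer here
```

A driver whose engine spells punctuation natively would simply set
the flag to True and pass the request through untouched.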
WW> Let me try to rephrase this question: from Orca's point of view,
WW> if text is handed off to speech dispatcher via speechd, will we
^^^^^^^
SSIP?
WW> be guaranteed that the appropriate emulation will be provided
WW> for features that are not supported by a speech engine? For
WW> example, if an audio cue is desired for capital letters, will
WW> the Orca user be guaranteed that something in Speech Dispatcher
WW> will play an audio icon for capitalization if the engine doesn't
WW> support this directly? Or, if verbalized punctuation is not
WW> supported by the engine, will the Orca user be guaranteed that
WW> something in Speech Dispatcher will emulate the support if the
WW> engine does not support this directly?
My simple answer is Yes (the detailed answer is above).
I'm not sure I'd agree with everyone here on particular details, but I
hope the basic ideas and explanations outlined above might be acceptable
to all members of the Speech Dispatcher team, as well as to Orca and
other client development teams.
Thanks for your questions, which help to clarify things!
Regards,
Milan Zamazal
_______________________________________________
Orca-list mailing list
Orca-list gnome org
http://mail.gnome.org/mailman/listinfo/orca-list
Visit http://live.gnome.org/Orca for more information on Orca