Re: [orca-list] Punctuation, capital letters, exchange of characters and strings, general errors in the design of Orca



"WW" == Willie Walker <William Walker Sun COM> writes:

    WW> One of the questions I have right now is the ability for a
    WW> client to programmatically configure various things in
    WW> SpeechDispatcher, such as pronunciations for words.  In looking
    WW> at the existing API, I'm not sure I see a way to do this.  Nor
    WW> am I sure if this is something that a speech dispatcher user
    WW> needs to do on an engine-by-engine basis or if there is a
    WW> pronunciation dictionary that speech dispatcher provides for all
    WW> output modules to use.

SSIP supports SSML, so in theory it is possible to pass pronunciation
and similar information that way.  In practice SSML is probably only
poorly supported, if at all, in most TTS systems, so it wouldn't work.
But that's not a fault of Speech Dispatcher; it's simply a feature
missing elsewhere -- preferably it should be present in the TTS systems
themselves, or at least in some frontend to them.  I think the new
Speech Dispatcher TTS driver library should provide means for parsing
SSML, and the drivers should handle it in some way when the
corresponding TTS system can't.
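
As an illustration, here is a minimal sketch of passing SSML through
SSIP using the Python speechd bindings; the exact names (SSIPClient,
set_data_mode, DataMode.SSML) may differ between versions, so take it
only as an outline:

    import speechd

    # Open an SSIP connection and switch the data mode to SSML, so the
    # text passed to speak() is treated as markup, not plain text.
    client = speechd.SSIPClient('ssml-example')
    client.set_data_mode(speechd.DataMode.SSML)
    client.speak('<speak>The abbreviation '
                 '<sub alias="Speech Synthesis Markup Language">SSML</sub> '
                 'is expanded by the markup itself.</speak>')
    client.close()

Whether the markup then has any audible effect depends on the output
module and the synthesizer behind it, which is exactly the problem
described above.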

Just one remark on pronunciation, as a typical representative of these
problems: it is important to distinguish between special pronunciation
and regular pronunciation.  In the first case, e.g. when a word should
be pronounced in a non-standard way for some reason, it's completely
valid to pass pronunciation information from the client to the engine.
But in the latter case, e.g. when an engine mispronounces certain
words, the client should in no way attempt to "fix" it; that would
only make the situation worse.  The proper solution is to fix the
pronunciation in the engine.  TTS drivers may attempt to work around it
when fixing the engine is not possible, but that should be considered
an extreme measure, to be applied only when really nothing else works.
As for common pronunciation dictionaries, I doubt they can be handled
at a common level, because different synthesizers use different
phoneme sets and representations.
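
To illustrate the first, legitimate case: a client that needs a single
occurrence of a word spoken in a non-standard way can mark up just that
occurrence with standard SSML elements such as <sub> or <say-as>,
without touching any engine dictionary -- assuming, of course, that the
synthesizer honours the markup at all:

    <speak>
      Some people pronounce <sub alias="sequel">SQL</sub> as a word,
      while others spell it out:
      <say-as interpret-as="characters">SQL</say-as>.
    </speak>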

On the other hand, it seems reasonable to handle some other features,
such as signalling capitalization, punctuation, sound icons, etc., on a
common basis in the TTS drivers.  But beware: this may require
language-dependent text analysis and may interfere with the TTS
processing of some synthesizers.  So it shouldn't be applied
universally, and each TTS driver must have a free choice in how to
handle such things -- whether to leave it to the synthesizer or, when
the synthesizer is unable to meet the requirements, to use the driver's
own means.  Thinking about it further, it becomes clear that it would
be very useful to have a single common text analysis frontend for free
speech synthesizers, so that the individual synthesizers start their
own work only after the phonetic transcription of the input is
available.  But that is another issue.
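
To make the "free choice" point concrete, driver-side logic might look
roughly like the following sketch.  It is purely hypothetical -- none
of these names come from any existing API -- and only shows the
decision between delegating capital-letter signalling to the
synthesizer and emulating it in the driver:

    # Purely hypothetical driver-side sketch; none of these names are
    # taken from an existing Speech Dispatcher or TTS API.
    def speak_with_capital_cue(driver, text, cap_mode='icon'):
        if driver.synthesizer_supports('cap_let_recogn'):
            # Preferred: the synthesizer signals capitals natively, so
            # the driver only passes the setting through.
            driver.synthesizer_set('cap_let_recogn', cap_mode)
            driver.synthesizer_speak(text)
        else:
            # Fallback: emulate in the driver.  Even this naive word
            # split is language dependent, which is why such emulation
            # cannot be applied blindly to every synthesizer.
            for word in text.split():
                if cap_mode == 'icon' and word[:1].isupper():
                    driver.play_sound_icon('capital')
                driver.synthesizer_speak(word)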

[...]

    WW> I wasn't sure how to interpret "No", but my interpretation was
    WW> that emulation was NOT done, and this seems to match my
    WW> interpretation of "Right" above.  But, maybe "No" meant
    WW> something like "No, speech dispatcher itself doesn't do
    WW> emulation, but that can be done at a lower layer in the speech
    WW> dispatcher internals."  If that's the case, from the client's
    WW> point of view, it's still speech dispatcher, and the client can
    WW> now depend upon speech dispatcher to do the emulation.

Yes, I think there is some terminology confusion here.  The new Speech
Dispatcher contains the TTS API and drivers as its parts, while the
current implementation is focused basically just on message
dispatching.  I'd suggest naming the parts explicitly in the discussion
(dispatching, interface, output modules, TTS API, TTS drivers,
configuration) to avoid confusion.

In my opinion it's basically as you write above.  Neither clients nor
any of the Speech Dispatcher parts, with the exception of the TTS
drivers, should care about emulation of missing TTS features.  They
should perform their own jobs and rely on the TTS systems and their TTS
drivers to ensure proper speech output.  The presence of a common TTS
API should guarantee that the emulation work is done only once, in a
single place behind the TTS API, i.e. in the speech synthesizers
(preferably) or in the TTS drivers (when the TTS system route is not
possible).  The possible creation of a common TTS processing frontend
to speech synthesizers, mentioned above, comes into play here, but
considering the current state of things it would be premature to get
too distracted by this idea.

    WW> Let me try to rephrase this question: from Orca's point of view,
    WW> if text is handed off to speech dispatcher via speechd, will we
                                                       ^^^^^^^
                                                       SSIP?
    WW> be guaranteed that the appropriate emulation will be provided
    WW> for features that are not supported by a speech engine?  For
    WW> example, if an audio cue is desired for capital letters, will
    WW> the Orca user be guaranteed that something in Speech Dispatcher
    WW> will play an audio icon for capitalization if the engine doesn't
    WW> support this directly?  Or, if verbalized punctuation is not
    WW> supported by the engine, will the Orca user be guaranteed that
    WW> something in Speech Dispatcher will emulate the support if the
    WW> engine does not support this directly?

My simple answer is Yes (the detailed answer is above).
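
From the client's side this is then pleasantly simple: an Orca-like
client only declares what it wants over SSIP and speaks, and whatever
emulation is needed happens somewhere below.  A small sketch with the
Python speechd bindings (method names such as set_cap_let_recogn may
differ between versions):

    import speechd

    client = speechd.SSIPClient('orca-like-client')
    # Ask for verbalized punctuation and an audio cue for capital
    # letters; whether the engine or a lower layer provides them is
    # not the client's concern.
    client.set_punctuation(speechd.PunctuationMode.ALL)
    client.set_cap_let_recogn('icon')
    client.speak('Hello, World!')
    client.close()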

I'm not sure I'd agree with everyone here on particular details, but I
hope the basic ideas and explanations outlined above might be acceptable
to all members of the Speech Dispatcher team, as well as to Orca and
other client development teams.

Thanks for your questions, which help to clarify things!

Regards,

Milan Zamazal


