Re: [orca-list] Should Orca have a separate "reading" voice? (was: audio demo of Neospeech and Svox-Pico with Orca/speakup)
- From: Michael Whapples <mwhapples aim com>
- To: Bill Cox <waywardgeek gmail com>
- Cc: orca-list gnome org
- Date: Sun, 16 May 2010 16:41:52 +0100
Hello,
I installed the Debian package, and the only command I can find for using
this synth is pico2wave, which looks like it may only be able to write
to a file (I don't know whether you could get it to write to stdout and
pipe that into a player). Also, pico2wave doesn't seem to have any way to
alter the rate, etc., other than possibly through embedded control
sequences in the text (I am unsure whether those would work, and I am
unsure what they actually are).
A basic example of the pico2wave command using the alice example from
the flite package would be:
pico2wave -w ~/svox_output.wav "$(zcat /usr/share/doc/flite/examples/alice.gz)"
The output will be in ~/svox_output.wav. The zcat command substitution is
there because pico2wave takes the text to speak as a command-line
argument, so we need to extract the content of the file and put it on the
command line.
See pico2wave --help for all options.
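To hear the result straight away rather than opening the file by hand, something like the following might do. It is only a rough sketch: speak_file is a made-up helper name, it assumes aplay from alsa-utils is installed, and it goes through a temporary WAV file because I have not confirmed whether pico2wave can write to stdout.

```shell
#!/bin/sh
# Sketch: speak a text file with pico2wave via a temporary WAV file.
# speak_file is a hypothetical helper, not part of the svox package.
# Assumes pico2wave and aplay (alsa-utils) are on the PATH.
speak_file() {
    text_file=$1
    # Refuse early if the input cannot be read.
    [ -r "$text_file" ] || { echo "cannot read $text_file" >&2; return 1; }
    # Report a missing synthesizer instead of failing obscurely.
    command -v pico2wave >/dev/null 2>&1 || { echo "pico2wave not found" >&2; return 127; }
    # Use a private temporary directory so the output can have a .wav name.
    wav_dir=$(mktemp -d) || return 1
    wav_file="$wav_dir/speech.wav"
    # pico2wave takes the text as an argument, so pass the file content in.
    pico2wave -w "$wav_file" "$(cat "$text_file")" && aplay -q "$wav_file"
    status=$?
    rm -rf "$wav_dir"
    return $status
}
```

Calling speak_file notes.txt would then synthesize and play notes.txt in one step. Very long files may exceed the command-line length limit, since the whole text is still passed as a single argument.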
For someone who knows C, the header file looks fairly well commented, and
I think that by using the API you could gain more control than pico2wave
offers. An OpenTTS module would be great if anyone has time for it.
Michael Whapples
Bill Cox wrote:
I see that there's a new svox package at Debian. I've compiled and
installed it, but I don't know how to run it. Can you give me any
pointers?
Thanks,
Bill
On Sun, May 16, 2010 at 2:29 AM, Jason White <jason jasonjgw net> wrote:
Joanmarie Diggs <joanmarie diggs gmail com> wrote:
Willem, thanks for doing this! Ignoring the issues you found, the
Neospeech voice does sound awfully nice.
They're probably using concatenative synthesis, i.e., techniques that combine
pre-recorded speech segments such as diphones, triphones, etc., then "smooth"
the result. This approach requires large databases, which is why such
synthesizers tend to consume substantial memory, disk space, or both.
In contrast, SVOX Pico uses a relatively new technology in speech synthesis,
based on hidden Markov models, where the voice is entirely synthetic rather
than pre-recorded, but the synthesizer parameters are obtained from a
statistical model trained on real speech data. Machine learning techniques are
used in the textual analysis phase as well. As a result, it uses very little
memory and is hence suitable for embedded devices such as mobile
phones, as well as installation media and almost any other context in which
one would want a small, efficient synthesizer with good-quality output. There are
limitations, of course, and it has its share of problems, but technologically
it's the result of very serious research by specialists in signal processing
and computational linguistics, and a significant contribution to free and
open-source software.
It would be nice if someone would fix the segfault under x86-64 that manifests
itself in the internal memory allocator, though.
Disclaimer: I am absolutely not qualified to discuss speech synthesis in
depth, an undergraduate course in phonetics notwithstanding.
This raises a question in my mind: If we're potentially going to have
access to speech synthesizers which are more human-sounding but perhaps
less performant, should Orca have a separate reading voice or SayAll
voice or some such thing? In other words, when you're typing, navigating
in menus and dialogs, etc., Orca would use one voice. When you're
reading text (and/or doing a SayAll), Orca would use another voice.
There may be synthesizers that require this. SVOX Pico shouldn't be one of
them, since it is designed to remain highly responsive in embedded
environments where CPU and memory resources are highly constrained.
Someone ought to write an OpenTTS module for it.
I don't know what the performance characteristics of concatenative
synthesizers are; they certainly take up more memory than ESpeak, SVOX Pico
and other small synthesizers. Bell Labs' TTS contains hundreds of megabytes of
speech data just for English, for example, which is one of the reasons for its
high-quality speech: there are many pre-recorded segments for the selection
algorithm to choose from.
Aside from performance, however, users may want different synthesizers under
different circumstances, so I'm not suggesting this is a bad idea at all.
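The split Joanmarie describes, one voice for interactive work and another for reading, could be sketched roughly as follows. This is purely an illustration in shell: the context names, the voice labels and the choose_voice function are all made up here, and none of it reflects how Orca actually selects voices.

```shell
#!/bin/sh
# Sketch: pick a voice profile from the kind of activity.
# choose_voice, the context names and the labels "responsive" and
# "natural" are hypothetical; they are not Orca settings.
choose_voice() {
    case $1 in
        # Typing and navigating menus or dialogs need low latency.
        typing|navigation) echo "responsive" ;;
        # Reading text or doing a SayAll can afford a slower, nicer voice.
        reading|sayall) echo "natural" ;;
        # Anything unrecognized falls back to the responsive voice.
        *) echo "responsive" ;;
    esac
}
```

A caller would ask for choose_voice sayall before starting a long read and choose_voice typing when echoing keystrokes, then hand the label to whichever synthesizer is configured for it.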
_______________________________________________
orca-list mailing list
orca-list gnome org
http://mail.gnome.org/mailman/listinfo/orca-list
Visit http://live.gnome.org/Orca for more information on Orca.
The manual is at http://library.gnome.org/users/gnome-access-guide/nightly/ats-2.html
The FAQ is at http://live.gnome.org/Orca/FrequentlyAskedQuestions
Netiquette Guidelines are at http://live.gnome.org/Orca/FrequentlyAskedQuestions/NetiquetteGuidelines
Log bugs and feature requests at http://bugzilla.gnome.org
Find out how to help at http://live.gnome.org/Orca/HowCanIHelp