[orca-list] Comparing TTS at high speed.



I've played a bit with the svox pico voice.  It gives a segfault when
trying to close its memory package, probably because I'm on a 64-bit
machine, but it actually creates valid .wav files before crashing.

This voice sounds exactly like the voice in Google Navigator on
Android phones.  This package comes with only a single tool:
pico2wave, but it understands a few embedded text commands, for speed
and pitch.  I found that when you speed it up, it does not scale the
pauses, which results in crazy long delays between sentences.  I had
to modify the source code to make the pauses right, but after that, I
was able to compare pico to voxin and espeak at high speed.
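
To illustrate what "making the pauses right" means, here is a rough
sketch of the arithmetic only.  It is not the actual change made to the
pico source.  The function name is hypothetical, and it assumes speed is
expressed as a percentage where 100 is the normal rate and the default
sentence pause is about 500 ms.

```python
def scaled_pause_ms(base_pause_ms: float, speed_percent: float) -> float:
    """Shrink a pause in proportion to the speech rate.

    Assumption: speed_percent is a percentage, with 100 meaning the
    normal rate, so 200 means speech twice as fast as normal.
    """
    if speed_percent <= 0:
        raise ValueError("speed_percent must be positive")
    # Twice the speed should give half the pause, and so on.
    return base_pause_ms * 100.0 / speed_percent


if __name__ == "__main__":
    for speed in (100, 200, 400):
        print(speed, scaled_pause_ms(500, speed))
```

Without scaling like this, the pause stays at its full length while the
speech around it gets shorter, which is why the gaps feel so long at
high speed.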

Voxin (Eloquence) is the clear winner.  I'd say Espeak is just a bit
better than pico at high speed, but I'm not sure, because I'm more used
to the espeak voice.  I recorded all three running at a pretty high
speed for espeak and pico, but not very fast for Voxin.  All are about
1 minute, 54 seconds in length, so it's a fair comparison.  You can
listen at:

http://billrocks.org/espeak.wav
http://billrocks.org/pico.wav
http://billrocks.org/voxin.wav
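
If you want to check that the recordings really are the same length, a
small sketch along these lines should work on downloaded copies of the
files.  It uses only Python's standard wave module.  The helper names
and the local file names are assumptions for illustration, and it
assumes the files are plain uncompressed PCM .wav files.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of an uncompressed PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def format_mm_ss(seconds: float) -> str:
    """Format a duration as minutes:seconds, rounded to the nearest second."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical local copies of the three recordings.
    for name in ("espeak.wav", "pico.wav", "voxin.wav"):
        print(name, format_mm_ss(wav_duration_seconds(name)))
```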

Now I don't want to bash pico.  It's a new engine, and it's some kind
of strange hybrid between stitching natural voice segments and formant
synthesisers like espeak and voxin.  Also, it's open-source, which is
awesome!  Clearly, it's had little to no tuning for high speed rates.
There isn't even a command to override its insane 1/2 second pauses
between sentences.  I don't know why this voice has so much static.
It sounds like it was recorded with a bad mic.  It's really horrible.
However, I get a feeling that they are onto something, and if they
could just fix a couple things, the pico engine could do high speed
like voxin.  It already sounds much better at high speed than
"natural" voices based on stitching, which relies on frequency
shifting.

Now, just for fun, here's the same thing played with voxin at my "top"
reading speed, which I reserve for terribly long-winded authors,
because I begin to lose comprehension at this speed.

http://billrocks.org/voxin_fast.wav

Sina, is it possible for you to upload a wave file for this example?
I've attached the text file.  I'd love to hear it at your top speed.

Thanks,
Bill

Attachment: test1.txt
Description: Text document


