Re: Festival vs. F-lite disk space requirements


If one looks at the Venn diagram of Festival/Flite/FreeTTS, the common logic is quite similar, but Festival's circle is much, much larger than the other two. The common logic basically consists of preprocessing the input text, selecting units to play, combining them, and then playing them. Unfortunately, the data files are typically the largest hunk of each engine, and each engine uses its own format (Festival == Binary/ASCII EST files, Flite == C code, FreeTTS == Binary/ASCII of its own ilk). In retrospect, if we had understood the EST file format and the Festival signal processing code better, FreeTTS probably could have used Festival data directly instead of duplicating the RELP-based approach in Flite.
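To make the shared four-stage flow concrete, here is a toy sketch. It is not code from Festival, Flite, or FreeTTS; every function name, the tiny lexicon, and the fake "samples" are invented for illustration, and real engines do far more at each stage.

```python
# Toy illustration of the common pipeline: preprocess -> select units
# -> combine -> play. All names and data here are hypothetical.

def preprocess(text):
    """Split text into lowercase words, dropping surrounding punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def select_units(words, lexicon):
    """Look up phones for each word, then pair neighbours into diphone names."""
    phones = ["pau"]                      # leading silence
    for word in words:
        phones.extend(lexicon[word])      # KeyError if the word is unknown
    phones.append("pau")                  # trailing silence
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]

def combine(units, unit_db):
    """Concatenate the stored samples for each selected unit."""
    samples = []
    for unit in units:
        samples.extend(unit_db[unit])
    return samples

def play(samples):
    """Stand-in for audio output: just report how much would be played."""
    print(f"would play {len(samples)} samples")
    return len(samples)

def speak(text, lexicon, unit_db):
    """Run the four stages in order."""
    return play(combine(select_units(preprocess(text), lexicon), unit_db))
```

The point of the sketch is only that the logic is small; `lexicon` and `unit_db` stand in for the pronunciation and voice data, which is where the disk space goes.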

Festival has a core set of logic (Scheme interpreter, Scheme files, signal processing code, etc.) to deal with voice data, and one can download voice data separately for it. I think the bulk of the data consists mostly of pronunciation lexicons and usually some processed form of the actual voice recordings (i.e., the "group" file). To get a small set for perhaps en_US only, one could take a look at using the kal_diphone voice data and attempt to discover only the bare minimum needed to make it work. You'd probably also want to keep the MBROLA support in there, as it tends to be small and can interface with MBROLA (a separate download) to get you better sounding voices. In any case, I *think* you're looking at about 22Meg total for a minimal kal_diphone-based en_US festival install, though I think you might be able to get 4Meg or so smaller if it's possible to prune some what-I-think-might-be-redundant lexicon data. These numbers are based upon my install of festival on my FC4 machine, and may actually be extra large because I've been goofing with the ARCTIC and HTS voice support.

As an aside, there's some odd interaction between the festival server and gnome-speech on Ubuntu, which causes the CPU to peg at 100%. I took a quick poke at this at one time, and it looks like there's something bogus happening on the socket/pipe communication between the two. My current commitments and pressures haven't afforded me the time to really dig into and solve the problem, though. :-(
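Going back to the Festival size estimate: if you want to check numbers like these against your own install, a small script along the following lines can show where the space goes. This is only a sketch; the function names are made up here, and the install path in the usage note is an assumption that will vary by distribution.

```python
import os

def tree_size(path):
    """Total bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def dir_sizes(root):
    """Map each immediate subdirectory of root to its total size in bytes.

    Files sitting directly in root are summed under the key ".".
    """
    sizes = {}
    loose = 0
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            loose += entry.stat().st_size
    if loose:
        sizes["."] = loose
    return sizes

def report(root):
    """Print subdirectories largest first, followed by the overall total."""
    sizes = dir_sizes(root)
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1e6:8.2f} MB  {name}")
    print(f"{sum(sizes.values()) / 1e6:8.2f} MB  total")
```

Calling something like `report("/usr/share/festival")` (assuming that is where the package put its files) should show whether the lexicons or the voice data dominate, which is a reasonable starting point for pruning.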

Flite (a C-based engine) is based on data imported from Festival. Last I knew, its voice data files are compiled in as source code and you get what you get. I'm not sure there is opportunity for pruning, though you could get rid of the unit selection voice that's good for only speaking the time. Flite still tends to be what it is: a small, fast, runtime engine. :-) It's been a long time since I looked at the code, so I don't remember sizing information, but I think it is the smallest of the bunch. There's also no direct gnome-speech support for it, other than indirectly through the recently added speech-dispatcher driver for gnome-speech (thanks Hynek!). Given resources, one probably could write a gnome-speech driver for flite and bypass this indirection.

FreeTTS (a Java-based engine) is based on logic from Festival and Flite, though it really is mostly a Flite clone in Java. Like Festival, it consists of core logic that can operate on voice data. To get a small set, you could ship only what's needed for the kevin voices, but you'd probably also want to keep the MBROLA support because it gives you benefits similar to what you get with Festival. I think the total would be about 6.5Meg or so, but then you will also need the Java virtual machine.

Hope this helps, and please let me know if you have any more questions,


I think Flite uses the same file format. (Will, please correct me if
I'm wrong.) It also requires a Java JRE; are you planning to include
Java in the live CD?

BTW, in the past, Java was required for accessibility,
but that's not true of the latest version.


On Fri, 2006-02-24 at 12:25, Henrik Nilsen Omma wrote:

We are working on packaging screen reader support for the Ubuntu Live
CD, but have gotten ourselves a little confused regarding file sizes ...

Being a Live CD we are quite limited on disk space. We were thinking
that we should use the smaller F-lite, rather than the full Festival,
assuming it had smaller speech files. However, because gnome-speech
doesn't have direct support for F-lite we also needed to include speech
dispatcher (and gnome-speech from CVS), so it begins to grow.

Can anyone shed some light on the relative space requirements of
Festival vs. F-lite? Does Festival include all its supported languages
by default or are they packaged separately (as packaged in Debian)? We
would be happy to settle for English-only support this time around.

Thank you. Any advice will be greatly appreciated.

- Henrik

gnome-accessibility-list mailing list
gnome-accessibility-list gnome org
