Re: [g-a-devel] Happy patch bonanza

>>>>> "BH" == Bill Haneman <Bill Haneman Sun COM> writes:

    BH> So it seems a more general/robust method is needed for
    BH> determining the correct encoding for the output channel.  For
    BH> some voices it's apparently UTF-8, whereas for most european
    BH> voices it's "latin 1".  Presumably some languages may need
    BH> latin2, etc. instead...

Yes.  IMO a reasonable approach is to use the coding declared by the
voice and to use ISO-8859-1 if the voice doesn't declare its coding.
This is what festival-freebsoft-utils does.
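
On the frontend side, the same rule could be sketched roughly as follows.
This is only an illustration in Python, not code from any existing driver:
the names `choose_coding' and `encode_for_voice', and the dictionary
standing in for a voice description, are made up for the example.

```python
# Hypothetical frontend-side sketch; all names here are illustrative only.
DEFAULT_CODING = "ISO-8859-1"  # fallback when the voice declares no coding


def choose_coding(voice_description):
    """Return the coding the voice declares, or the default if it has none."""
    return voice_description.get("coding") or DEFAULT_CODING


def encode_for_voice(text, voice_description):
    """Encode text into the byte sequence the voice expects."""
    coding = choose_coding(voice_description)
    # Characters the coding cannot represent become '?' instead of raising.
    return text.encode(coding, errors="replace")
```

With this sketch, a voice without a `coding' entry gets Latin-1 bytes,
while a voice declaring ISO-8859-2 or UTF-8 gets bytes in that coding.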

Preferably all voices should declare their coding.  There's no standard
way to do that in Festival; festival-freebsoft-utils simply introduces
another item in the voice declaration, called `coding', for that purpose.
It's trivial to add and it's IMHO better than introducing new
configuration options in all the Festival frontends.

The festival-freebsoft-utils current-voice-coding function is trivial:

  (define (current-voice-coding)
    ;; Use the coding declared by the voice, else fall back to ISO-8859-1
    (or (cadr (assoc 'coding (cadr (voice.description current-voice))))
        'ISO-8859-1))

If all you need from festival-freebsoft-utils is this function then
there's no need to require the whole festival-freebsoft-utils package to
be able to figure out the voice coding.

    BH> Actually, festival _is_ UTF-8 capable, at least for some voices.

It is not.  The UTF-8 voices handle UTF-8 input as a sequence of
8-bit characters.  Of course this is far from convenient, and one
can't use many standard Festival functions on such input.  So UTF-8
is used in Festival only for languages which can't represent their
character set in an 8-bit coding.
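
To illustrate what "a sequence of 8-bit characters" means in practice,
here is a small Python sketch.  It does not model Festival's internals;
the helper name `seen_as_8bit' is made up, and it only shows how one
non-ASCII character turns into several units for byte-oriented code.

```python
def seen_as_8bit(text):
    """Show UTF-8 encoded text the way byte-oriented code would see it.

    Each byte of the UTF-8 encoding is treated as one 8-bit character.
    """
    return text.encode("utf-8").decode("iso-8859-1")


one_char = "\u010d"                 # LATIN SMALL LETTER C WITH CARON
as_units = seen_as_8bit(one_char)   # two 8-bit units, not one character
```

Any function that works character by character (length, case
conversion, character classes) then operates on those units rather than
on the original character.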

Of course, the best way would be to make Festival work with Unicode
characters.  But I think this is a non-trivial task and apparently
nobody is working on it.  So I'd suggest using the `coding' voice
property workaround described above for now.

    BH> I still think ISO-8859-1 might be a better 'default' for the
    BH> festival driver than UTF-8, since as far as I know none of the
    BH> european voices expect UTF-8 input.  



Milan Zamazal
