Orca TTS languages and IETF BCP 47 / ISO 639-x



Hi,

While looking at Orca's user-settings.conf (in Fedora 17 with GNOME 3.4) I
found that the values for a voice's "locale" and "name" do not
consistently follow a standard. For example, for the different varieties
of English in the list of Orca voices, the "locale" is always "en", and
one needs to look at the "name" to distinguish varieties like
"english-us", "en-scottish", "lancashire" and "english_rp". The values for
locale are always ISO 639-1 or (less frequently) ISO 639-3 tags, with one
exception: "(Belgium)" for Belgian French. (Locale is also a misnomer for
language tags.) The values for "name" don't follow any standard that I am
aware of; I think they simply match what is displayed in the dropdown list
of Orca's preferences.

What I had expected were language tags that conform to IETF BCP 47: Tags for
Identifying Languages: <https://tools.ietf.org/html/bcp47>. This would
make it easier for an external program, like Cloud4all's framework, to
launch Orca with a specific TTS language. For example, Cloud4all could
then rely on standard language tags instead of knowing Orca's non-standard
list.

I have created a spreadsheet that compares both Orca's and NVDA's language
identifiers with what I would expect according to BCP 47; you can find
this OpenOffice spreadsheet at
<http://wiki.gpii.net/index.php/File:ScreenReader_LanguageTags_2013-01-21.ods>.
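
To illustrate the kind of lookup an external program currently has to
maintain, here is a rough Python sketch. The Orca "locale" and "name"
values are the ones quoted above; the BCP 47 tags assigned to them, the
table name and the function name are my own illustrative assumptions and
not part of Orca or Cloud4all. The fallback is a simplified truncation
scheme, loosely modelled on the lookup described in RFC 4647.

```python
# Hypothetical lookup table: BCP 47 tag -> (Orca "locale", Orca "name").
# The Orca values are taken from user-settings.conf as quoted above; the
# BCP 47 tags on the left are illustrative guesses, not an official mapping.
ORCA_VOICES = {
    "en-us": ("en", "english-us"),
    "en-scotland": ("en", "en-scottish"),
    "en-gb": ("en", "english_rp"),
}


def find_orca_voice(tag):
    """Return (locale, name) for a BCP 47 tag, or None if nothing matches.

    Trailing subtags are dropped one at a time until a known tag is found,
    e.g. "en-GB-x-test" -> "en-GB-x" -> "en-GB".
    """
    # Language tags are case-insensitive; also tolerate POSIX-style "en_US".
    subtags = tag.replace("_", "-").lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in ORCA_VOICES:
            return ORCA_VOICES[candidate]
        subtags.pop()
    return None


if __name__ == "__main__":
    for requested in ("en-US", "en-GB-x-test", "fr-BE"):
        print(requested, "->", find_orca_voice(requested))
```

If Orca accepted BCP 47 tags directly, a table like this would not need
to be kept in sync with Orca's voice list by every external program.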

I don't know if there are any plans to start using standard language tags at
some point in the future (I didn't find anything related to this in
Bugzilla), but if you think it's worthwhile, you can have a look at the
spreadsheet.

Best regards,

Christophe

-- 
Christophe Strobbe
Akademischer Mitarbeiter
Adaptive User Interfaces Research Group
Hochschule der Medien
Nobelstraße 10
70569 Stuttgart
Tel. +49 711 8923 2749


