VoiceOutputStream proposal for the GNOME Project

(First of all, I beg your pardon for the 'bugs' in my written English.)

  When I was younger there was a little program on my PC that translated
everything you typed at the keyboard into computer voice. The hardware was a
PC XT at 8 MHz with the PC speaker (the SoundBlaster did not exist in those
days)...

  Why don't we have this service integrated in our "highly" multimedia
operating systems? Is it so difficult, or just not interesting? I don't
think so.

  I propose a filter to convert character strings to computer voice.

  Applications? Many: software for kids, for blind people, and for lazy
ones who want their email read aloud to them.

  The API could be as simple as this:

	openVoiceSocket(language, computerVoiceSex) -> voiceSocket

	write(voiceSocket, character_string)

	closeVoiceSocket(voiceSocket)

(language: English, Spanish, French, German, Portuguese, Chinese...
...the more the better)
(computerVoiceSex means that the computer produces a male or female voice)

    For the design, I think it would be a good idea to build it as a
pipeline of filters that can be replaced and that work concurrently. This
pipeline would have two main modules or layers:

	-> String2Phonemes translator (language-dependent layer)
	-> Phonemes2Sound filter (language-independent)

(Phonemes ['fonemas' in Spanish] are the elemental sounds we produce when
we talk.)

Character    ____________________              __________________
String ----> |String2Phonemes t.| -----------> |Phonemes2Sound f.| ---> Sound
             --------------------   Phoneme    ------------------       samples
                                    sequence

  This system could be integrated into the GNOME sound system, or be a
separate object or set of objects with their CORBA IDLs so that anyone
can use them.

  The more difficult layer is the language-dependent part. In the
Phonemes2Sound part we only have to guarantee that we cover the full
international set of possible phonemes, in male and female versions.

  The String2Phonemes translator will have to deal with the intonation of
questions, answers... I don't have the knowledge to implement it, and I
don't even know if it is possible, but I believe it is. I only propose a
high-level software architecture to make internationalization and use
possible, easy and simple. We could build talking applications with a very
simple API; we could even make a 'talking' kernel...

  I look forward to your criticism and corrections... I want to know
whether it 'sounds' interesting and feasible, whether it is a good idea
or not.

  If we succeed in this subproject, perhaps we can start to think about
the VoiceInputStream, to read character strings from human voice (like
some programs now try to do), but with international support from the
beginning and an API usable by any application, or even the kernel of the
system. The pipeline structure would be the same but reversed: we need a
Sound2Phonemes translator (very difficult) and a Phonemes2String
translator (simpler, but about as difficult as the String2Phonemes
translator). In that case we will possibly need some feedback into the
Sound2Phonemes translator so it can 'learn' to be better.


Jose.

  

