Re: Request for Assistance Finding FLOSS Speech Recognition Software



Hi,

On Wednesday, 27 June 2012, 06:54:44, meg ford wrote:
> > I'm not sure, though, whether it's good to start with (relatively) so many
> > words right away (I think it's around 35 words or so). A Dasher setup would
> > lead to good results much quicker (only around 7 words).
> 
> Can you explain this? Are you suggesting that he use Dasher for word
> prediction?
Well, there is a mode where the arrow keys give the cursor a tendency. So 
pressing "Right" would start a (slow) movement to the right, "Up" to the top, 
etc. Multiple presses would increase (same direction) or decrease (opposite 
direction) the velocity.
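
To make the idea concrete, here is a minimal Python sketch of that behaviour.
This is not Dasher's actual code: the names STEP, DIRECTIONS and apply_presses
are made up for illustration, and the velocity unit is arbitrary.

```python
# Hypothetical sketch of the "cursor tendency" idea; not Dasher's real code.
STEP = 1  # assumed velocity change per key press (arbitrary unit)

# Screen coordinates are assumed: x grows to the right, y grows downwards.
DIRECTIONS = {
    "Right": (STEP, 0),
    "Left": (-STEP, 0),
    "Up": (0, -STEP),
    "Down": (0, STEP),
}


def apply_presses(presses, velocity=(0, 0)):
    """Return the cursor velocity after a sequence of key presses."""
    vx, vy = velocity
    for key in presses:
        dx, dy = DIRECTIONS[key]
        # Same direction adds up, opposite direction cancels out.
        vx += dx
        vy += dy
    return (vx, vy)
```

With this sketch, two "Right" presses double the speed to the right, and a
following "Left" press slows the cursor down again.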

These key presses (basically just Left / Right / Up / Down) and some control 
words like "Write text" to bring up Dasher and something like "Write this" to 
paste the "written" text can be used to build quite a powerful system (because 
of Dasher's word prediction) with *very* few commands.

Of course a full, voice-controlled keyboard is possible as well, but it takes 
more training samples to control as effectively because the vocabulary needs 
to be much larger (at least 26 words for the characters, plus words like 
"Shift", "dot", etc.).

There are ways in Simon to gather training samples through normal usage so 
improving the model once it's in use is easier and faster (because it's 
normally more fun to use the system than to read texts from cue cards).

> > I think both languages should work equally well. There is no bias towards
> > any language because you'll build your own acoustic model. It doesn't
> > matter to Simon if this is English, Spanish, French or Chinese. You can
> > even mix languages if you want (take care with the phoneme sets in this
> > case, though).
> > 
> > I'd suggest you pick the language you are more comfortable with and -
> > again - can pronounce more consistently (remember, you're going to call
> > out commands spontaneously during "normal" usage).
> 
> This was the main problem with other programs. Joe has around three
> distinct ways of pronouncing the same word, depending on his muscle tone.
That shouldn't be a problem.
If just the inflection is different, then this should be covered by the model 
itself (GMMs, i.e. Gaussian mixture models, which are normally used to model 
different speaking styles between multiple speakers of the same language, 
would model those differences).
If the whole pronunciation is different (i.e. different phonemes) then we can 
transcribe one word symbol with multiple pronunciations.
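
As a rough illustration, a pronunciation dictionary with several variants per 
word could look like the Python sketch below. The phoneme symbols and the 
build_lexicon helper are hypothetical and do not reflect Simon's real file 
format.

```python
from collections import defaultdict


def build_lexicon(entries):
    """Group (word, phoneme string) pairs into word -> list of variants."""
    lexicon = defaultdict(list)
    for word, phonemes in entries:
        variant = tuple(phonemes.split())
        # Keep each distinct pronunciation only once per word.
        if variant not in lexicon[word]:
            lexicon[word].append(variant)
    return dict(lexicon)


# Illustrative transcriptions only; the phoneme set is an assumption.
entries = [
    ("RIGHT", "r ay t"),
    ("RIGHT", "r ay"),
    ("RIGHT", "w ay t"),
    ("UP", "ah p"),
]

lexicon = build_lexicon(entries)
```

Here the single word symbol "RIGHT" carries three pronunciations, so the 
recognizer can accept any of them as the same command.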

Those differences are reasons why more training data will be required compared 
to a user with perfect articulation (where usually 5 training samples per word 
would suffice) but they are by no means unsolvable problems.

> > That's why we also offer consulting through the Simon Listens e.V.,
> > where we basically build you a custom speech model for use with Simon
> > according to your needs for a modest fee to cover our expenses
> > (non-profit organization).
> 
> This is potentially something he is interested in, yes. He may have to
> apply for funding for this, however. [...] If you have standard information
> available about this, I would be grateful if you would email it to me
> directly.
(I'll continue this off list)

Best regards,
Peter

