Re: RFC: keyboard/mouse usage for accessibility?

On 3/18/2019 2:46 PM, Samuel Thibault wrote:


>> Are there any capabilities in at-spi that allow a speech recognition
>> environment to query the application and find out enough context to be
>> able to generate appropriate grammars? For example, with a multipurpose
>> editor, I may want to have different grammars for different types of
>> data. I need to know which tab has focus and something about that tab
>> (TBD) so that I can generate the right grammar to operate on data
>> within that tab.
> At-spi provides information about which widget has focus, and then with
> at-spi you can inspect the list of actions.
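
For reference, this is roughly the kind of query I have in mind. A
minimal, untested sketch with the pyatspi bindings (assuming an AT-SPI2
desktop) that watches focus changes and dumps the focused widget's role
and whatever actions it advertises:

    import pyatspi

    def on_focus(event):
        # Only care about widgets gaining focus (detail1 == 1).
        if not event.detail1:
            return
        acc = event.source
        print("focused:", acc.name, acc.getRoleName())
        try:
            actions = acc.queryAction()
        except NotImplementedError:
            return          # this widget exposes no actions
        for i in range(actions.nActions):
            print("  action:", actions.getName(i))

    pyatspi.Registry.registerEventListener(on_focus,
                                           "object:state-changed:focused")
    pyatspi.Registry.start()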

A speech-driven environment only partially cares about focus. You care
when you're dictating text into a specific target, because dictating
text at random is risky, especially if stray words land on shortcut key
commands. However, speech command context is not limited to the GUI
context. For example, say focus is on a dialog box in a word processor
doing something with a table, but you remember you need to send an email
message, so you say "take a message".

This command has nothing to do with the current focus and would not be
revealed as part of the widget actions. It's a global command because
the environment knows it has an email program and the top-level commands
for email are visible globally. Once you are in an email client like
Thunderbird, your focus could be in the list of mailboxes, the list of
messages, or within the message itself. It doesn't matter, because
saying the command "next message" always moves to the next unread
message no matter where you have focus within the email client.
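
To make that concrete, here is a hypothetical sketch of how a speech
layer might resolve a global command before falling back to whatever has
focus. Every name in it (ctx, launch, app_action, and so on) is made up
for illustration; nothing here is a real API:

    GLOBAL_COMMANDS = {
        # Always available, no matter what has focus.
        "take a message": lambda ctx: ctx.launch("thunderbird", compose=True),
    }

    APP_COMMANDS = {
        # Available whenever the named application has focus, regardless
        # of which pane inside it has focus.
        "thunderbird": {
            "next message": lambda ctx: ctx.app_action("next-unread-message"),
        },
    }

    def dispatch(utterance, ctx):
        if utterance in GLOBAL_COMMANDS:
            return GLOBAL_COMMANDS[utterance](ctx)
        app = ctx.focused_application()           # e.g. "thunderbird"
        handler = APP_COMMANDS.get(app, {}).get(utterance)
        if handler is not None:
            return handler(ctx)
        return ctx.dictate(utterance)             # fall back to dictation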

I guess one way to think of it is: if you look at the screen and see
something you want to do, you should be able to say it and not have to
worry about where the focus is, because your eyes see the focus and your
brain determines the context and understands the commands for the
operation. And screw the mouse; I'm not going to touch that damn thing
because it hurts.

One way to understand how the user interface changes is to sit on your
hands and tell somebody else what to click on. You need to instruct that
somebody to be really stupid and do only exactly what you say. That will
give you a feel for the state of speech interfaces today. Then, if you
and that somebody agree on the names of things, do the same exercise and
you will find how much easier it is to work.



>> Each cell of the database has a type and a name. For text fields,
>> saying "Change <name>" should put me in the cell of that name. But if
>> the cell is a multi-selection list, the grammar should be all of the
>> options for the multi-selection list. If the cell is a number, I want
>> to limit the grammar to just numbers.
> AFAICT, the type of content is not exposed in at-spi, but that could be
> added.
Exactly. Every content field in a GUI should be exposed so the speech
environment can read that field and edit it programmatically.

>> There are a bunch of other types, such as email addresses, phone
>> numbers, URLs, people, and other limited, non-English language
>> elements that could be spoken, each of which needs its own grammar.
> The exact allowed grammar could be passed through at-spi.

The grammar wouldn't be passed through at-spi. The type information and
any other context information would be passed through to the speech
interface layer, which would in turn build the grammar that the user can
speak.
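
Something like this hypothetical sketch is what I mean; the field,
table, and recognizer objects are invented for illustration, and the
point is only which side builds the grammar:

    def grammar_for(field):
        if field.type == "number":
            return {"kind": "digits"}                    # numbers only
        if field.type == "multi-select":
            return {"kind": "choices", "options": field.options}
        if field.type == "email":
            return {"kind": "email-address"}
        if field.type == "phone":
            return {"kind": "phone-number"}
        return {"kind": "dictation"}                     # free text

    def change(cell_name, table, recognizer):
        # "Change <name>": focus the named cell, then load its grammar.
        field = table.cell_named(cell_name)
        field.focus()
        recognizer.load_grammar(grammar_for(field))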

Side note: people do not understand the speech interface until they try
to use a computer without hands. And even then, some people can't think
beyond the keyboard and mouse. If I get frustrated when explaining some
of these concepts, I apologize in advance because I've spent over 20
years with this adult acquired disability and all the societal shit that
comes along with it.  My frustration comes from seeing people
reinventing the same failed techniques for the past 20+ years. 
Specifically, solutions that try to leverage a keyboard interface, or
that try to handle complex operations through vocalizations rather than
speech. The latter is more common when people try to figure out how to
program using speech recognition.

One thing I've learned in building speech interfaces is that the grammar
is incredibly personal. It is based on how we speak, how we use
language, and what idioms we grew up with. Because speech interfaces
give you no clues about what you can say, it's very hard to train people
to a standard language. To me the answer is giving the end user the
ability to modify the grammar to fit how they speak.
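
A sketch of what I mean, with an invented file location and format; the
per-user file maps the phrases a person actually says onto canonical
commands:

    import json, os

    # Hypothetical location and format.
    PHRASE_FILE = os.path.expanduser("~/.config/speech/phrases.json")

    def load_user_phrases(path=PHRASE_FILE):
        # e.g. {"take a message": "compose mail", "scratch that": "undo"}
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def canonicalize(utterance, phrases):
        return phrases.get(utterance, utterance)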

I'm twitching now because I can see the start of a conversation heading
down the path of yet again reinventing, or advocating for, things that
have failed repeatedly. I apologize for my twitching and for
anticipating what you haven't said yet.


>> One of the problems, though, with the Notion database is that there
>> are no row names except by convention. Therefore, whenever you use a
>> name to select a cell, something needs to keep track of the row you
>> are on, and no command should take you off of that row.
> I'm not sure I understand.

Picture a spreadsheet. It has numbered columns across the top, one
through infinity, but down the side, where you would normally see
labels, there is nothing. So when you operate on a row, you can only
operate on horizontally adjacent cells and cannot refer to anything
above or below.
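
The bookkeeping I'm describing amounts to something like this
hypothetical sketch, where the speech layer keeps a cursor on the
current row and column names only ever resolve within that row:

    class RowCursor:
        """Tracks the row the user is on, since rows have no names."""

        def __init__(self, table):
            self.table = table
            self.row = 0

        def next_row(self):
            self.row += 1        # rows change only by explicit command

        def previous_row(self):
            self.row = max(0, self.row - 1)

        def cell(self, column_name):
            # "Change <name>" always resolves within the current row.
            return self.table.cell(self.row, column_name)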


>> The last issue is a method of bypassing JavaScript-backed editors. I
>> cannot dictate into Google Docs and have difficulty dictating into
>> LinkedIn. In the browser context, only naked text areas seem to work
>> well with NaturallySpeaking.
> That is the kind of example where plugging in only through at-spi can
> fail, when the application is not actually exposing an at-spi
> interface, and thus plugging in at the OS level can be a useful option.

But we still have the same problem of the speech environment needing to
reach into the application to understand the contents of buffers, which
commands are accessible in the application context, and so on. It may
not be a problem we can solve in a way that's acceptable to others, but
basically an API that roots around in the bowels of the application is
where we need to go. Think of it as a software endoscopy or colonoscopy.

One possibility for dealing with these kinds of crappy interfaces is to
have the ability to tell the application to export what's visible in a
text area, plus a little bit more, into a plain-text form which could
then be edited or added to using speech in a speech-enabled editor.
Maybe something like converting the rich text to Markdown+ so it's all
speech-editable.
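
As a rough sketch of that round trip, using the markdownify and markdown
Python packages for the conversion and hand-waving how the text actually
gets in and out of the application (get_rich_text/put_rich_text are
stand-ins for whatever hook the application would have to provide):

    from markdownify import markdownify as to_markdown   # HTML -> Markdown
    import markdown                                       # Markdown -> HTML

    def export_for_speech(app):
        html = app.get_rich_text()        # visible text area + a bit more
        return to_markdown(html)          # plain text a speech editor can handle

    def reimport(app, edited):
        app.put_rich_text(markdown.markdown(edited))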

