Re: RFC: keyboard/mouse usage for accessibility?



Eric Johansson, on Mon, 18 Mar 2019 15:50:23 -0400, wrote:
On 3/18/2019 2:46 PM, Samuel Thibault wrote:
Are there any capabilities in at-spi that allow a speech recognition
environment to query the application and find out enough context to be
able to generate appropriate grammars? For example, using a
multipurpose editor, I may want to have different grammars for different
types of data. I need to know which tab has focus and something about
that tab (TBD) so that I can generate the right grammar to operate on
data within that tab.
At-spi provides information of which widget has focus, and then with
at-spi you can inspect the list of actions.

A speech-driven environment only partially cares about focus.

With at-spi, from the widget which has focus, you can go around the
whole tab.
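
Roughly, with the pyatspi bindings that could look like the sketch
below (the "object:state-changed:focused" event is the standard one;
the callback name and the walk up to the page tab are just for
illustration):

import pyatspi

def on_focus(event):
    if not event.detail1:       # only care about gaining focus
        return
    acc = event.source
    print("focused:", acc.name, acc.getRoleName())

    # Walk up to the enclosing page tab to know which tab is active.
    parent = acc.parent
    while parent and parent.getRole() != pyatspi.ROLE_PAGE_TAB:
        parent = parent.parent
    if parent:
        print("inside tab:", parent.name)

    # List the actions the focused widget itself exposes.
    try:
        actions = acc.queryAction()
        for i in range(actions.nActions):
            print("action:", actions.getName(i))
    except NotImplementedError:
        pass  # no Action interface on this widget

pyatspi.Registry.registerEventListener(on_focus, "object:state-changed:focused")
pyatspi.Registry.start()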

But you remember you need to send an email message, so you say "take a
message".

That could be an action attached to a global shortcut, which could be
inspected as well.
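
For instance (just a sketch, and the application name is an arbitrary
example), one can crawl an application's widget tree and collect the
actions and the key bindings they advertise:

import pyatspi

def collect_actions(acc, found):
    try:
        actions = acc.queryAction()
        for i in range(actions.nActions):
            found.append((acc.name, actions.getName(i), actions.getKeyBinding(i)))
    except NotImplementedError:
        pass
    for child in acc:
        if child is not None:
            collect_actions(child, found)

desktop = pyatspi.Registry.getDesktop(0)
for app in desktop:
    if app and app.name == "evolution":   # arbitrary example application
        found = []
        collect_actions(app, found)
        for widget, action, keys in found:
            print(widget, action, keys)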

If you look at the screen and you see something you want to do, you
should be able to just say it.

That could also be inspected through at-spi.
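
A sketch of that direction, assuming the frame the user is looking at
has already been found somehow (the say() helper is invented for the
example): match the spoken phrase against the names of visible widgets
and trigger their first action:

import pyatspi

def say(phrase, frame):
    # frame: the accessible of the window the user is looking at.
    matches = pyatspi.findAllDescendants(
        frame,
        lambda a: a.name and a.name.lower() == phrase.lower()
                  and a.getState().contains(pyatspi.STATE_SHOWING))
    for acc in matches:
        try:
            acc.queryAction().doAction(0)   # usually "click", "press", ...
            return True
        except NotImplementedError:
            continue
    return False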

There are a bunch of other types, such as email addresses, phone
numbers, URLs, people, and other limited-vocabulary or non-English
language elements that could be spoken, each of which needs its own
grammar.
The exact allowed grammar could be passed through at-spi.

The grammar wouldn't be passed through at-spi.

? It has to be exposed somehow by the application. I don't think we
want to enumerate in at-spi all kinds of input to be set in a field
(number, e-mail, telephone number with all kinds of formats, etc.) while
the application can just pass the grammar it already knows about. Or
maybe we just don't mean the same thing by "grammar". I'm taking it in a
computer-science way: for a number, the grammar would be [0-9]*.
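
To make the contrast concrete, the enumeration approach would end up
being a table like the sketch below, maintained in the speech
environment rather than in the application, and the mapping is
necessarily made up (nothing in at-spi defines it):

import re
import pyatspi

# Made-up role-to-grammar table; exactly the kind of guessing the
# application itself would not need to do.
ROLE_GRAMMARS = {
    pyatspi.ROLE_SPIN_BUTTON: r"[0-9]*",   # plain number
    pyatspi.ROLE_PASSWORD_TEXT: r".*",
    pyatspi.ROLE_TEXT: r".*",
}

def accepts(acc, dictated):
    pattern = ROLE_GRAMMARS.get(acc.getRole(), r".*")
    return re.fullmatch(pattern, dictated) is not None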

One thing I've learned in building speech interfaces is that the
grammar is incredibly personal.

We then need to relate this with actual inputs in the applications.

To me the answer is giving the end user the ability to modify the
grammar to fit how they speak.

How would they express it?
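
For instance, would it be something like a per-user configuration file
mapping how the user likes to speak a kind of field to the text that
actually gets typed? (The format and keys below are invented purely for
illustration.)

import json

user_grammar = json.loads("""
{
  "date":  { "say": "<month> <day> <year>",     "type": "{year:04d}-{month:02d}-{day:02d}" },
  "phone": { "say": "<area> <exchange> <line>", "type": "({area}) {exchange}-{line}" }
}
""")

def render(kind, **parts):
    # e.g. render("date", year=2019, month=3, day=18) -> "2019-03-18"
    return user_grammar[kind]["type"].format(**parts)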

One of the problems, though, with the notion of a database is that there are no
row names except by convention. Therefore whenever you use a name for a
cell, somebody needs to keep track of the row you are on, and no
command should take you off of that row.
I'm not sure I understand.

Picture a spreadsheet. The spreadsheet has numbers across the top, one
through infinity, but on the side, instead of the usual A-Z, there's
nothing. So when you operate on that row, you can only operate on
horizontally adjacent cells and cannot refer to anything above or below.

It doesn't work to use expressions such as "row below" or "row #2"?
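
If the widget implements the at-spi Table interface, resolving "row
below" could look like this sketch (it naively assumes the cell's
direct parent is the table):

import pyatspi

def cell_below(cell):
    table = cell.parent.queryTable()   # naive: assume the parent is the table
    index = cell.getIndexInParent()
    row = table.getRowAtIndex(index)
    col = table.getColumnAtIndex(index)
    if row + 1 < table.nRows:
        return table.getAccessibleAt(row + 1, col)
    return None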

The last issue is a method of bypassing JavaScript-backed editors. I
cannot dictate into Google Docs and have difficulty dictating into
LinkedIn. In the browser context, only naked text areas seem to work
well with NaturallySpeaking.
That is the kind of example where plugging only through at-spi can fail,
when the application is not actually exposing an at-spi interface, and
thus plugging at the OS level can be a useful option.

But we still have the same problem of the speech environment needing to
reach into the application to understand the contents of buffers, what
commands are accessible in the application context, etc.

Yes, but both ways can cooperate.

Think of it as a software endoscopy or colonoscopy.

The problem is that there will always be software that doesn't allow it.
So you also want a generic solution.
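
Concretely, the two could cooperate somewhat like this sketch: insert
the dictated text through the at-spi EditableText interface when the
focused widget exposes it, and otherwise fall back to synthesizing key
events, which works even when the application exposes nothing useful
over at-spi:

import gi
gi.require_version("Gdk", "3.0")
from gi.repository import Gdk
import pyatspi

def dictate(acc, text):
    try:
        editable = acc.queryEditableText()
        caret = acc.queryText().caretOffset
        editable.insertText(caret, text, len(text))
        return "at-spi"
    except NotImplementedError:
        # Generic fallback: synthesize keystrokes, one keysym per character.
        for ch in text:
            keyval = Gdk.unicode_to_keyval(ord(ch))
            pyatspi.Registry.generateKeyboardEvent(keyval, None, pyatspi.KEY_SYM)
        return "synthesized keys"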

Samuel

