OSK access of applications UI

31 views
Skip to first unread message

Steve Lee

unread,
Dec 3, 2006, 4:58:28 AM12/3/06
to osk...@googlegroups.com
I'd like to brainstorm a little on how OSKs can access target applications UI.

The current 'pukka' way is via the a11y APIs (AT-SPI, MSAA,
UIAutomation, OS X Accessibility API). However we want an abstraction
that works cross platform and which is probably a small subset of
available functionality (UI grab and activate widgit default action).

An ability to script will be useful in any number of unpredictable or
complex scenarios. LSR looks attractive here. I'd like the simple
common stuff to be easy to setup without script though (ideally
automatic, perhaps declarative, possibly interactive).

For apps that don't support a11y API the abstraction could also allow
for direct UI access (on win32) or firing of synthetic key/mouse
events at the app (is that workable on X?). How else can such apps be
supported?

Another possibility might be to create a new protocol based on D-bus
(say) which provides simple IPC (inter process communications). That
would require buy-in to another standard from application writers
which seems unlikely given the current level of a11y API support.
Unless transparent support could be added at desktop/OS level ala GAIL
(and that's no go for custom widgets). I think Bill Haneman mentioned
D-Bus in the context of AT-SPI at Boston. What was the thinking there
Bill? How attractive is D-Bus?

Finally some application expose their own Object Model which offers
more control than is available via the UI (e.g PowerTalk gets at
PowerPoint's alt text for displayed objects). This could be useful in
a (probably small) number of cases so could be made possible by script
extensions (in the example case using Python Win32Com extensions).
However I'm aware this is rather moving someplace other than basic
OSK.

Thoughts and ideas please. What are the experiences from GOK and others?

--
Steve Lee
www.oatsoft.org
www.schoolforge.org.uk
www.fullmeasure.co.uk

billh

unread,
Dec 4, 2006, 6:24:49 AM12/4/06
to OSK-ng

Steve Lee wrote:
> I'd like to brainstorm a little on how OSKs can access target applications UI.
>
> The current 'pukka' way is via the a11y APIs (AT-SPI, MSAA,
> UIAutomation, OS X Accessibility API). However we want an abstraction
> that works cross platform and which is probably a small subset of
> available functionality (UI grab and activate widgit default action).

I actually think we should not attempt the "cross platform abstraction
layer" approach. The reality is that despite their obvious
similarities, these various APIs *behave* differently, so even if we
paper over their obvious differences with a wrapper API, I suspect we
will not be able to achieve a satisfactory degree of "workalike"
behavior. The "abstraction layer" approach tends to lead to a
too-many-layers architecture, which can be ugly and hard to maintain as
well.

My own thought is that a pluggable architecture, which abstracts out
the concepts of "UI mining" and "UI Component activation", would work
better - for each target platform the UI Grab/activate logic would be
written separately, specific to the host's accessibility API
implementation. This means that the logic/heuristics could properly
account for the behavioral differences between the various
accessibility APIs, before presenting a common abstraction to the
overall OSK framework. My point is that by putting the abstraction
layer at the "action keyboard" or "UI keyboard" level instead of the
"a11y API" level, we can avoid a lot of problems.

It does mean less code reuse, in theory, but I think it also means less
code :-) since we don't have to try and merge these disparate APIs
within our OSK framework. And of course as a pluggable framework, this
module could be omitted entirely if a "traditional"
physical-keyboard-emulating OSK would suffice for a given user
scenario.

> An ability to script will be useful in any number of unpredictable or
> complex scenarios. LSR looks attractive here. I'd like the simple
> common stuff to be easy to setup without script though (ideally
> automatic, perhaps declarative, possibly interactive).

I am wary of inventing our own declarative language here. Would it not
be enough to settle on a data format for the keyboards?

> For apps that don't support a11y API the abstraction could also allow
> for direct UI access (on win32) or firing of synthetic key/mouse
> events at the app (is that workable on X?). How else can such apps be
> supported?

I wonder to what degree we should attempt to support applications that
don't provide full-featured keyboard navigation. Mouse-only apps are
clearly very broken a11y-wise (for instance they are unusable by blind
people), and I would support pushing the problem back to those apps -
certainly in the free software world.

> Another possibility might be to create a new protocol based on D-bus
> (say) which provides simple IPC (inter process communications). That
> would require buy-in to another standard from application writers
> which seems unlikely given the current level of a11y API support.
> Unless transparent support could be added at desktop/OS level ala GAIL
> (and that's no go for custom widgets). I think Bill Haneman mentioned
> D-Bus in the context of AT-SPI at Boston. What was the thinking there
> Bill? How attractive is D-Bus?

D-Bus has some significant shortcomings in comparison to CORBA.
However there does seem to be interest in bridging those gaps, and in
eventually migrating AT-SPI's IPC protocol to D-Bus. There is no
timetable for the change as of now.

Again, I would strongly advise against this group inventing new
protocols/ABIs - we should leverage what we have and live with the
limitations, while perhaps lending our support and assistance to the
desktop-wide initiatives for migration to D-Bus.

> Finally some application expose their own Object Model which offers
> more control than is available via the UI (e.g PowerTalk gets at
> PowerPoint's alt text for displayed objects). This could be useful in
> a (probably small) number of cases so could be made possible by script
> extensions (in the example case using Python Win32Com extensions).
> However I'm aware this is rather moving someplace other than basic
> OSK.

I expect this would be of more interest/use on the Win32 platform than
on Unix/Linux.

I think we are in some danger of designing more than we are resourced
to build - thus my proposal to define a few simple modular interfaces
(presentation, UI Grab/activation logic, keyboard format, device I/O
abstraction layer) and building one example of each. The
cross-platform aspect can be addressed via platform-specific modules
for any and all of those layers, without requiring us to build an
all-singing, all-dancing implementation from the start.

Bill

Steve Lee

unread,
Dec 4, 2006, 9:46:53 AM12/4/06
to osk...@googlegroups.com
Thanks Bill, great, actually we are on much the same course;
interesting to see if anyone else has differing views. I like a
component based architecture in this area which I believe covers
pluggable (it doesn't need to use COM, corba, XPCOM or whatnot, just
an abstract base class and perhaps not even runtime pluggable). I
agree a runtime abstraction layer would be more work and hard to
maintain especially in an Open Source development environment (any
change potentially needs regression testing on all platforms). Having
separate components for each platform should be less work and I was
actually thinking of the 'abstraction' as encapsulated in the common
interface that they expose; if that makes sense.

I also agree with step approach and definitely see the 'Switch Access
to Firefox on Windows grant as a part of that.

Steve

billh

unread,
Dec 4, 2006, 10:29:06 AM12/4/06
to OSK-ng
Cool, thanks Steve for clarifying what you had in mind when talking
about abstraction layers.

Runtime-pluggability might not be critical, it depends I suppose on
whether we want/need to support access to "incompatible" toolkits
within a single desktop or not.

For 'scriptability' it might be enough, since we're talking about
python, to just load per-app python modules (i.e. firefox.py, gaim.py,
etc.) the way orca does now (and LSR does something similar, doesn't
it?). So the "scripts" are actually very powerful, though possibly
more difficult for end-users to modify than scripts written in a simple
declarative syntax.

Bill

Peter Parente

unread,
Dec 4, 2006, 9:32:29 PM12/4/06
to OSK-ng
> For 'scriptability' it might be enough, since we're talking about
> python, to just load per-app python modules (i.e. firefox.py, gaim.py,
> etc.) the way orca does now (and LSR does something similar, doesn't
> it?).

LSR maintains a collection of registered scripts which can be applied
in one of three ways: to all applications, to application with a
specific name, and to an application manually at run time. For any
given application, N scripts may be loaded and handling events
following a chain of responsibility pattern by default, but with
support for cross script communication, event consumption, etc.

>So the "scripts" are actually very powerful, though possibly
> more difficult for end-users to modify than scripts written in a simple
> declarative syntax.

I'm not sure if this helps, but nothing says scripts (essentially
Python modules) have to define new OO or procedural constructs. A
module might just consist of a series of assignment statements or calls
to pre-defined functions to set up some basic, default behaviors.

Pete

Steve Lee

unread,
Dec 5, 2006, 2:36:17 AM12/5/06
to osk...@googlegroups.com
On 12/5/06, Peter Parente <par...@gmail.com> wrote:
> I'm not sure if this helps, but nothing says scripts (essentially
> Python modules) have to define new OO or procedural constructs. A
> module might just consist of a series of assignment statements or calls
> to pre-defined functions to set up some basic, default behaviors.

And that could be made data-driven using python literals and data
types (e.g tuples) which makes us declarative in my book. XML isn't
the only way ;-)

We could do with some user personas but I'm thinking of clinical
supporters or facilitators who are basically non technical but want to
set something up for a user thats isn't available from the UI options.
I think that the next simplest is declarative, e.g .ini files (for a
primitive solution).

I note that Inno setup uses ini files to excellent effect; simple
things are easy and you can get complex behaviors without using the
scripting it also offers. An error-limiting GUI editor helps users
too. http://www.jrsoftware.org/isinfo.php

I think this 2 level approach is worth pursuing given the user base. I
want to empower less technical users to customise. Having to maintain
2 systems may appear to be a lot of work but once the basic parsing
engine is written (or borrowed) it will just call script functions or
primitives shared with the scripting environment.

Scripts obviously give ultimate power but we don't want to deter due
to perceived complexity or a steep learning curve coupled with a high
probability of having 'bugs'. Thus if we go the script route we need a
language that appears simple and I think Python wins there, though I
can't see it from a newbie perspective. I believe Python is
successfully used in teaching programming. In the web space many are
happy to dabble with a bit of Javascript but my view is that it may
not be as easy to pick up or use for simple results. is It interesting
to note that work is progress to add Python scripting to Mozilla.

For techies scripting works equally for simple and complex
requirements and a declarative option will be ignored by many. However
some techies do appreciate declarative too (e.g XLST).

Script also really needs development tools. A minimum of a debugger
avoids all those message prompts used to gain some visibility. Python
helps with its interaction and introspection but there is a learning
curve again. And there are debuggers we could use.

Steve Lee

davidb

unread,
Dec 17, 2006, 11:51:09 PM12/17/06
to OSK-ng
Steve Lee wrote:
> I'd like to brainstorm a little on how OSKs can access target applications UI.
>
> The current 'pukka' way is via the a11y APIs (AT-SPI, MSAA,
> UIAutomation, OS X Accessibility API). However we want an abstraction
> that works cross platform and which is probably a small subset of
> available functionality (UI grab and activate widgit default action).
>

[Enter IAccessible2 stage left... ]

Seriously though. I'm still wondering what this new API is going to
mean for us. Thoughts?

Steve Lee

unread,
Dec 18, 2006, 2:34:01 AM12/18/06
to osk...@googlegroups.com
Yes, enter with with pyrotechnics, brilliant news.

OK here's some thoughts.

Obviously when ALL programs support it will be in a great situation as
we'll have advanced access for discovery and control as GOK can via
AT-SPI. We also get the cross platform solution, though Mac is lagging
and rather an unknown at this point. We want to take advantage of it.

But right now we playing a waiting game for when we have even good
levels of support, We're hoping that IBM, FSG and others do a good job
of evangelising and that developers decide to support it.

If we rely on it then we obviously can't easily access programs that
don't support it. Or support will be patchy. Users are going to get
frustrated if only some applications are well supported, even if we
can say 'well that doesn't support the APIs, tsk tsk'. Thus some sort
of fall-back access is needed. I'm not sure if OSK-ng could be a
persuasive example for developers to provide IA2 support in their
programs. It Would be great if it was.

I think the lack of automatic support for standard windows controls is
quite a large problem (cif MSAA and GAIL). That would at least give
some coverage of many programs (most menus at least) and act as an
incentive for developers to add support for their custom widgets.

At present MSAA is limited but should be enough for the basics as we
have IAccessible::DoDefaultAction. Not sure yet about discovery. How
much functionality does GOK use on AT-SPI to do its stuff (e.g UI
Grab)?

Is there a list of programs that support IA2 right now? I guess OOo
and FF 3 will soon. For applications that do not support it we could
emulation the advanced AT-SPI style options that have high utility
using hacks such as direct UI access and synthentic events. That could
eventually be thrown away. The alternative is we stick to using the
least common denominator of functionality which may be overly
restrictive.

David Bolter

unread,
Dec 18, 2006, 10:25:07 AM12/18/06
to osk...@googlegroups.com
Thanks Steve.

Gosh it has been a long time since I saw a DoDefaultAction(); Nostalgia
setting in (or is that naus-talgia)?

Apart from the basic gnome accessibility stuff like performing actions
(some controls have multiple actions, in which case gok can represent
each action as a key on a dynamic keyboard) and getting information
(e.g. name) and of course walking the tree in various ways, GOK uses
some of the advanced api such as the text editing interface, and the
selection interface, but working with other advanced API (e.g. table) is
still in the TODO list. Note GOK was being written while some (most/all)
of these APIs where being implemented.

Of course we should always be striving for user centered design (if not
participatory design), which should help answer these kinds of questions
for us as we go. In other words, if we use API it should be because it
helps us meet a user need.

cheers,
David

Steve Lee

unread,
Dec 18, 2006, 1:13:29 PM12/18/06
to osk...@googlegroups.com
On 12/18/06, David Bolter <david....@utoronto.ca> wrote:
> Of course we should always be striving for user centered design (if not
> participatory design), which should help answer these kinds of questions
> for us as we go. In other words, if we use API it should be because it
> helps us meet a user need.

Quite agree. I'm staring to develop a persona to help guide the
Mozilla grant (code name Jambu) in the absence of participation at
this point .

Steve

Bill Haneman

unread,
Dec 19, 2006, 8:32:31 AM12/19/06
to osk...@googlegroups.com
Steve Lee wrote:
>
> Yes, enter with with pyrotechnics, brilliant news.
>
> OK here's some thoughts.
>
> Obviously when ALL programs support it will be in a great situation as
> we'll have advanced access for discovery and control as GOK can via
> AT-SPI.
I think that once we start on our OSK-ng we can start to use
iAccessible2 right away for those apps where it's present - from what
has been said I think this will include the Moz/Firefox suite and OOo in
the beginning. Those are pretty key user apps, so even if the rest of
apps don't have IA2 yet, the advanced API capabilities could really
improve the user experience over a suite that covers a lot of the user's
daily activities.

I guess most desktop users use a combination of the above
web/email/office-suite plus maybe one or two 'vertical' apps. Those
vertical/niche apps are the ones that seem most likely to give
accessibility problems, which is why putting the support in as many GUI
toolkits as possible is the long-term goal.

Since IA2 has so many similarities to AT-SPI (and the Mac accessibility
infrastructure), I think making the logic/heuristics part of the OSK
pluggable could give us similar functionality on these platforms, given
a bit of work on the various "plug-ins". Even porting logic from one to
another might not be impractical.

I do think that IA2 makes the creation of a cross-platform OSK-ng more
attractive.

regards,

Bill

Reply all
Reply to author
Forward
0 new messages