Making a program that can "see" what's on the screen

Peter Olcott

Oct 14, 2005, 12:57:05 PM

Seeing all the text, identifying the font details, and recognizing every icon:
What are the ways that this can be done? I am sure that there are at least
some ways to provide for users without eyesight. What are all the kinds
of ways that a program can "see" what is on the screen, including "seeing"
inside all windows?

William DePalo [MVP VC++]

Oct 14, 2005, 1:24:19 PM

"Peter Olcott" <olc...@att.net> wrote in message
news:vrR3f.5227$bt2.2516@okepread05...

<disclaimer>
This is far from my field of expertise ...
</disclaimer>

... but I think what you are trying to do comes under the heading of "Active
Accessibility" in Windows

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/msaa/msaaccrf_87ja.asp

Regards,
Will

Peter Olcott

Oct 14, 2005, 2:45:20 PM

"William DePalo [MVP VC++]" <willd....@mvps.org> wrote in message news:%230IJfQO...@tk2msftngp13.phx.gbl...

Is there anything other than this available?

William DePalo [MVP VC++]

Oct 14, 2005, 3:30:03 PM

"Peter Olcott" <olc...@att.net> wrote in message

news:%0T3f.5229$bt2.795@okepread05...

As I said, it is far from my field of expertise so I'm not sure. That said,
I _think_ that that is all that Windows provides out of the box for those
with physical challenges.

If you wanted an image of the desktop - in the style of PC Anywhere or
WinVNC - I would have suggested a "mirror" display driver but I didn't think
that's the kind of support that you were after.

Regards,
Will

Hector Santos

Oct 15, 2005, 7:03:59 AM

We develop electronic mail software, and our original Mail Reader package,
developed during the 80s, followed a common design practice called SFI, or
"Speech Friendly Interfacing", a feature offered for blind users.

Back then, support came with special cards, like "VocalEyes", etc. Your
application would check a special interrupt during startup, which allowed you
to enable SFI for the user automatically. I forget the interrupt, but it was
standard among the various speech card vendors.

It was tied to the CUI or Common User Interface standards in practice.
Windows has a CUI standard and for a speech system, you need to make sure
you follow CUI to the best of your ability in order to accommodate speech.

Now today, William pointed out the Windows subsystem for speech
accessibility. We have not worked with it, but when I first saw it a few
years back, I immediately recognized its design as based on similar 1980s
SFI/CUI concepts. I'm sure Microsoft learned from the early renditions.

In short, it's all about the "hooks" into the event/message queue system,
for example the CmdUI object.

A quick example is how your menu system works in Windows. Each menu item
can be routed through an OnUpdateXXXXXX control message handler. In this
event handler, you can access the text for the menu item and then pass it to
a speech card or system.

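Example (not compile-checked; SpeakText is a placeholder for whatever call
hands the text to the speech card or system):

void CMyView::OnUpdateMenuItems(CCmdUI* pCmdUI)
{
    // m_pMenu is NULL when the item being updated is not a menu entry
    // (a toolbar button, for example), so only menu items are spoken.
    if (pCmdUI->m_pMenu) {
        CString sText;
        // Fetch the caption of the menu item at this position.
        pCmdUI->m_pMenu->GetMenuString(pCmdUI->m_nIndex, sText, MF_BYPOSITION);
        SpeakText(sText);
    }
}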

Here is the trick. FORGET THE MOUSE DURING DEVELOPMENT. Design your SFI
application so that the keyboard navigation follows CUI keyboard standards.
When the user tabs around, uses the UP/DOWN arrows, etc., your program's
OnUpdateXXXXX handlers capture the text.

Hope this provides the basic idea.

PS: We don't work with this any more, but when I last left it, I remember the
direction was to support voice recognition, at its simplest, to run
commands.

--
Hector Santos, Santronics Software, Inc.
http://www.santronics.com

"Peter Olcott" <olc...@att.net> wrote in message news:vrR3f.5227

Tim Roberts

Oct 15, 2005, 4:58:37 PM

What you're describing -- deconstructing the screen by starting with a pile
of pixels -- is NOT the way it is done. You have to get to the controls
BEHIND those pixels.

Given a point on the screen, you can find the window that contains that
point. From there, you can query the attributes of that window. If it has
a menu, you can fetch that menu, and fetch the individual menu items for
reading. If it is a text control, you can call GetWindowText to fetch its
contents and read those. If it is a dialog, you can fetch the controls in
order, and with GetWindowText you can give a pretty good read of the
dialog.
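
The lookup described above can be sketched as a toy model. This is only an illustration under stated assumptions: `Rect`, `Window`, `windowFromPoint` and `textAt` are hypothetical names invented for this sketch, and the code does not call the Win32 API. On Windows the corresponding real calls would be along the lines of WindowFromPoint followed by GetWindowText.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a window: a screen rectangle, some text, and child
// windows. These types are illustrative only, not Win32 structures.
struct Rect {
    int left, top, right, bottom;  // right and bottom are exclusive
    bool contains(int x, int y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
};

struct Window {
    Rect bounds;
    std::string text;
    std::vector<Window> children;  // later children are drawn on top
};

// Return the deepest window whose rectangle contains (x, y), or nullptr
// when the point lies outside the root window.
const Window* windowFromPoint(const Window& root, int x, int y) {
    if (!root.bounds.contains(x, y)) return nullptr;
    // Search children topmost-first so overlapping siblings resolve
    // to the one the user actually sees.
    for (auto it = root.children.rbegin(); it != root.children.rend(); ++it) {
        if (const Window* hit = windowFromPoint(*it, x, y)) return hit;
    }
    return &root;
}

// Read the text of whatever window sits under the point. Returns an
// empty string when no window contains the point.
std::string textAt(const Window& root, int x, int y) {
    const Window* w = windowFromPoint(root, x, y);
    return w ? w->text : std::string();
}
```

With a dialog that has "Yes" and "No" child windows, `textAt` returns the button caption for a point inside a button and the dialog's own text for a point between them. Anything that is merely painted onto the parent, rather than being a window of its own, is invisible to this kind of lookup.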

It's all about finding the data structures that make up the window, NOT
about OCRing the pixels themselves. That's practically impossible.
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Peter Olcott

Oct 15, 2005, 11:14:34 PM

"Tim Roberts" <ti...@probo.com> wrote in message news:kbr2l1tjrrjv81f44...@4ax.com...

I am not sure of the underlying details. I can only describe my requirements
from what I need to achieve. I am not sure what the best approach is. I need
to be able to somehow "see" all of the text and all of the icons that every
application places onto the screen, no matter how they do it. This must work
for every single piece of text anywhere on the screen, and every single icon.
It doesn't matter whether it's directly on the desktop, within an MS Windows
system utility, or displayed by an application. I also need the exact pixel
locations of this text, and icons, and I need to know the Font Characteristics,
Name, Point Size, Bold, Italic, Underline, Strikethrough. If for example an
application has a save file Icon (the little floppy disk) I need to know that it
does, and exactly where on the screen it is located.

Thanks

TC

Oct 16, 2005, 12:25:32 AM

Tim Roberts wrote:

> Given a point on the screen, you can find the window that contains that
> point.

If there is one! But in an MS Access app, for example, none of the
command buttons are ever windows, and the textboxes are only windows
when they have the focus (not when they don't).

In this example, if you know the screen coordinates, you can use the
Access object model to find the control in question & thereby access
it. But what if the app in question does not have an accessible object
model?

So there is no general-purpose solution to what he wants, surely.

TC

Tim Roberts

Oct 16, 2005, 11:15:30 PM

"Peter Olcott" <olc...@att.net> wrote:
>
>I am not sure of the underlying details. I can only describe my requirements
>from what I need to achieve. I am not sure what the best approach is. I need
>to be able to somehow "see" all of the text and all of the icons that every
>application places onto the screen, no matter how they do it. This must work
>for every single piece of text anywhere on the screen, and every single icon.
>It doesn't matter whether it's directly on the desktop, within an MS Windows
>system utility, or displayed by an application. I also need the exact pixel
>locations of this text, and icons, and I need to know the Font Characteristics,
>Name, Point Size, Bold, Italic, Underline, Strikethrough. If for example an
>application has a save file Icon (the little floppy disk) I need to know that it
>does, and exactly where on the screen it is located.

Sorry. What you have described is simply impossible.

I suppose you could hook all of the display driver entry points (assuming
you could figure out how), and then force all of the windows to repaint
themselves. Even then, there is no way to distinguish an icon. It's just
another bitmap.

That's the ONLY way to get the kind of information you want -- watch it
being drawn. Once it becomes pixels, it is too late.
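
To make the "watch it being drawn" point concrete, here is a toy model of an interception layer. It is a sketch under stated assumptions, not a description of how any real display driver hook works: `DrawRecorder`, `Font`, `TextRun` and `BlitRecord` are hypothetical names, and no real driver entry points are involved. It only shows what information each kind of drawing call carries.

```cpp
#include <string>
#include <vector>

// Font attributes that travel with a text-drawing call.
struct Font {
    std::string face;
    int pointSize;
    bool bold;
    bool italic;
};

// What an interceptor learns from one text-drawing call.
struct TextRun {
    int x, y;
    std::string text;
    Font font;
};

// What an interceptor learns from one bitmap call: a rectangle, nothing more.
struct BlitRecord {
    int x, y, width, height;
};

// Hypothetical interception layer: every drawing call passes through it
// on its way to the screen.
class DrawRecorder {
public:
    // A text call still carries the string, its position, and its font.
    void drawText(int x, int y, const std::string& text, const Font& font) {
        runs_.push_back({x, y, text, font});
    }

    // A bitmap call carries only pixels. Whether they depict an icon, a
    // photo, or text rendered offscreen cannot be told from the call.
    void blit(int x, int y, int width, int height,
              const std::vector<unsigned char>& /*pixels*/) {
        blits_.push_back({x, y, width, height});
    }

    const std::vector<TextRun>& textRuns() const { return runs_; }
    const std::vector<BlitRecord>& blits() const { return blits_; }

private:
    std::vector<TextRun> runs_;
    std::vector<BlitRecord> blits_;
};
```

After a `drawText` call the recorder knows the string, location and font; after a `blit` it knows only that some rectangle of pixels was copied.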

TC

Oct 16, 2005, 11:41:32 PM

Sort of like trying to put Humpty Dumpty back together again :-)

TC

Mark Howard

Oct 28, 2005, 3:12:04 AM

It's not impossible since someone has already done that. Take a look at
http://www.fanix.com/asuread/

--mark

TC

Oct 29, 2005, 5:46:17 AM

How do you know they have done it? There are substantial technical
problems involved in reading text from /any/ program that can display
text on the screen. A quick look at their site does not show any
guarantee that they have solved those problems & can therefore handle
/any/ programs - the OP's specific requirement. Maybe it only works
with /some/ programs. Maybe it only works with /one/ program (the
browser). Do you know any of this for sure, or are you just guessing?
If the latter, I have a wonderful bridge in Sydney Australia that I can
sell to you at a very keen price!

TC

Tim Roberts

Oct 29, 2005, 1:53:28 PM

"Mark Howard" <MarkH...@discussions.microsoft.com> wrote:
>
>It's not impossible since someone has already done that. Take a look at
>http://www.fanix.com/asuread/

That solves a different problem.

I can't tell whether this works by installing a window hook and watching
all the text that is sent to the screen, or if it tries to locate the child
window under the cursor and reverse-engineer it to find the text, both of
which I believe I described earlier.

Either option will work for most applications. However, if an application
draws its text into an offscreen bitmap and then blts it to the screen, I
guarantee you that As-U-Read will not be able to read it.

I found nothing on their web site that mentions icons.

The problem as Peter has spec-ed it requires OCR processing that we do not
yet possess.

TC

Oct 29, 2005, 10:04:28 PM

To give you a specific example of one case where it will not work:

In an MS Access application, none of the textboxes are actually proper
windows, except the single one that has the focus. All of the other
ones are just painted to /look like/ windows. They are /not/ actually
windows, they do not have a window handle (hWnd), and their content cannot
be accessed using any normal Win32 API calls.

So if you position the cursor over one of those windows, there is no
way to get the text from it, except by:

(1) Using the MS Access object model;

(2) Using the Windows "Active Accessibility" feature, or

(3) Doing optical character recognition (OCR) of that portion of the
screen.

No "general purpose" product will succeed with (1). The application in
question might not /have/ an object model.

Similarly, the application might not support Active Accessibility, so
(2) is also out.

(3) is possible, in theory, but (a) there's no indication that the
product you mention is doing (3), and (b) there is no guarantee that (3)
would work well - or at all - in a PC screen context with, for example,
an underlying wallpaper graphic.

Conclusion: it ain't gonna happen!

HTH,
TC

Mark Howard

Oct 30, 2005, 12:39:01 AM

I don't know whether their method is your (1), (2) or (3), or maybe "(4) - None
of the above", but As-U-Read works in MS Access -- I've just tried it.

--mark

Mark Howard

Oct 30, 2005, 12:47:04 AM

Their website says "Compatible with all web browsers, e-mail composers, news
readers, forums, chat rooms, instant messenger applications etc." so I
suppose it will work with normal applications. However, whether it works with
icons or not I dunno (their software's purpose is to capture the text under the
mouse cursor). So my point was that it's not that impossible.

However, I don't think OCR can solve the problem with icons, because even
human eyes cannot tell the difference between a normal 32x32 bitmap and a
32x32 icon just from the screenshot.

--mark

"Tim Roberts" wrote:

>
> That solves a different problem.

> ...

TC

Oct 30, 2005, 3:05:44 AM

You are saying that it can read the text from an MS Access textbox that
*DOES NOT* currently have the focus?

Sorry, I have to say I find that hard to credit. If you confirm it, I
will find some way to d/l their (large) example demo & try it myself.

TC

Oct 30, 2005, 3:23:50 AM

Oops, I will not be able to try it myself because all my boxes are win
9x ones, at present, and I see it only works with NT+. But please post
back here, & confirm that it works with textboxes that /do not/ have
the focus.

TC

Hector Santos

Oct 31, 2005, 5:22:27 AM

"Mark Howard" <MarkH...@discussions.microsoft.com> wrote in message
news:2944AAF3-E452-4372...@microsoft.com...

> Their website says "Compatible with all web browsers, e-mail composers,
> news readers, forums, chat rooms, instant messenger applications etc." so I
> suppose it will work with normal applications. However, whether it works
> with icons or not I dunno (their software's purpose is to capture the text
> under the mouse cursor). So my point was that it's not that impossible.
>
> However, I don't think OCR can solve the problem with icons, because even
> human eyes cannot tell the difference between a normal 32x32 bitmap and a
> 32x32 icon just from the screenshot.

But the computer does not care about this - it is all scan lines.

I wasn't going to get into this, but of course, it is all possible - FOR
standard CUI or known images.

Hints:

- Black vs. White (Grey scale)
- Keep a dictionary hash table of known bit matrix tables.

With a highly optimized video scanner, you can detect a video bitmap in the
same way OCR works for fonts.
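
The two hints above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anyone's actual product code: `binarize`, `hashBits` and `GlyphDictionary` are hypothetical names, all images are assumed to have the same fixed dimensions, the threshold of 128 is arbitrary, and the hash is an FNV-1a style hash that feeds each bit in as a 0 or 1.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Reduce 8-bit grayscale pixels to black/white so that small shading
// differences do not change the hash. The default threshold is arbitrary.
std::vector<bool> binarize(const std::vector<std::uint8_t>& gray,
                           std::uint8_t threshold = 128) {
    std::vector<bool> bits;
    bits.reserve(gray.size());
    for (std::uint8_t g : gray) bits.push_back(g >= threshold);
    return bits;
}

// FNV-1a style hash over the bit matrix, one bit per step.
std::uint64_t hashBits(const std::vector<bool>& bits) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (bool b : bits) {
        h ^= b ? 1u : 0u;
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return h;
}

// Dictionary of known images keyed by the hash of their bit matrix.
// A production version would also compare the bits on a hash hit to
// rule out collisions; this sketch skips that step.
class GlyphDictionary {
public:
    void add(const std::string& name, const std::vector<std::uint8_t>& gray) {
        known_[hashBits(binarize(gray))] = name;
    }

    // Returns the stored name, or an empty string for an unknown image.
    std::string lookup(const std::vector<std::uint8_t>& gray) const {
        auto it = known_.find(hashBits(binarize(gray)));
        return it == known_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::string> known_;
};
```

Binarizing first is what lets a slightly shaded copy of a known image hash to the same key. Anything that moves a pixel across the threshold, such as anti-aliasing or a different color scheme, will still defeat an exact-hash lookup like this one.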

So of course, it's possible. The question is, is it worth it?

For the visually impaired, SFI (Speech Friendly Interface) methods have existed
since the late 80s and early 90s, and we participated in providing support for
them in our mail reader products. One of the long-time vendors for visually
impaired users is GW Micro, maker of "Vocal-Eyes": http://www.gwmicro.com/ So
if you want to hear it from real experts in this area, contact GW Micro.

SFI was based on "tracking" or, more precisely, knowing exactly where the
cursor was located at all times relative to the CUI (common user interface)
and what special keyboard actions were performed.

A system like Vocal-Eyes offered basic screen reading, but it also provided
an API for direct application support where SCREEN READING was not required.

So for example, most commercial-grade software with SFI support will make
sure its CUI is "generic" in design so that menus, input, keyboard, all go
through some input and/or event pump engine. This information is already
known. Since the engine is single source, you don't need to read the
screen; you read a TABLE of your menu or resource text information.

With the advent of Windows and message event pump engines, this became
easier, or rather more of a standard. This is why a Windows application no
longer really needs to add SFI itself. It can use the subsystem. It would
not be as finely tuned as a more dedicated system, as in our SFI
implementation, which was smart about not being overly VERBOSE and used
optional phonetic tables to improve the sound output, but Windows itself can
provide the basic SFI.

Of course, the "images" were ignored. But I don't see that as a problem
when trying to detect the "CUI" images. A binary image has a unique matrix
of scan line information. This can easily be hashed into a dictionary.

Again, is it worth it?

Maybe not for your basic SFI stuff, because the CUI for images includes the
recommendation of using tips, such as in a toolbox. Tips can be pumped through
a single source engine, so as the user tabs through the images, he can
hear the tips.

Maybe Peter has other ideas in mind. It's not impossible to do, but if he was
just looking to support SFI ideas, the ideas for this have long been
established to support the visually impaired.

TC

Oct 31, 2005, 7:18:41 PM

But why did you have to "participated in providing support for [it] in
[y]our mail reader products", if it can all be done externally, with no
support from the programs concerned?

The OP has not asked how to modify existing programs to support his
needs. He has asked how to write a new program which he drops-in to the
PC, which will then automagically recognize everything that is on the
screen - including icons.

I say that will not be possible without a substantial amount of
specialized programming.

Cheers,
TC

Hector Santos

Nov 1, 2005, 5:53:27 AM

"TC" <aatcbb...@yahoo.com> wrote in message
news:1130804321.4...@g47g2000cwa.googlegroups.com...

> But why did you have to "participated in providing support for [it] in
> [y]our mail reader products", if it can all be done externally, with no
> support from the programs concerned?

I believe I mentioned that our support started in the 80s, in the DOS days,
before there was an OS subsystem offering anything close to it.

I also indicated that direct SFI support improves readability by letting you
customize the verbosity. Otherwise you are left with what the subsystem speaks
to you, which is most often more verbose, especially once the new user becomes
an experienced user.

> The OP has not asked how to modify existing programs to support his
> needs. He has asked how to write a new program which he drops-in to the
> PC, which will then automagically recognize everything that is on the
> screen - including icons.

But he indicated it was for the visually impaired, so I think he was new to
this well-established area where there are already established ideas. He
asked a natural question, not really knowing what existed and what methods
were used.

> I say that will not be possible without a substantial amount of
> specialized programming.

But it is possible nonetheless. That's all.

These are old, established OCR ideas. There is nothing new in the idea of
trying to recognize the 'unknown.' OCR was problematic when mixed fonts were
used. But it got better, extremely better. We used to manufacture an
electronic file cabinet called "OptiFile" for doctors and lawyers that scanned
in documents and used isolated OCR ideas to index the documents. It was
terrible at first because you had to handle each document individually.
Sometimes it required hiring cheap labor to clean up the scanned text.

For the blind, I believe, it is more straightforward because you are
focusing more on what's already known and it's extremely isolated.

For icons, well, to me, that's a new one for the blind, and we don't know
why he would be interested in SPEAKING what a bitmap looks like. But
blind or not, it's possible, and you would start with a well-defined
dictionary table of bitmap matrix hash values. This would be extremely
simple to catalog, and with the speed of computers today, you can detect any
kind of known bitmap on the screen extremely fast using simple pattern
recognition techniques.
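
As a rough sketch of the simplest form of that pattern recognition, the code below scans a screen buffer for exact copies of one known bitmap and reports where each copy sits. It is an illustration under stated assumptions: `Bitmap`, `matchesAt` and `findIcon` are hypothetical names, pixels are assumed to be already reduced to 0 or 1, and the search is a brute-force exact comparison rather than the hashed dictionary lookup described in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A monochrome bitmap stored row by row; pixels.size() must equal
// width * height.
struct Bitmap {
    int width;
    int height;
    std::vector<unsigned char> pixels;  // 0 = black, 1 = white
    unsigned char at(int x, int y) const { return pixels[y * width + x]; }
};

// True when `icon` matches `screen` exactly with its top-left corner
// placed at (ox, oy). The caller guarantees the icon fits at that offset.
bool matchesAt(const Bitmap& screen, const Bitmap& icon, int ox, int oy) {
    for (int y = 0; y < icon.height; ++y)
        for (int x = 0; x < icon.width; ++x)
            if (screen.at(ox + x, oy + y) != icon.at(x, y)) return false;
    return true;
}

// Return the top-left (x, y) of every exact occurrence of `icon` on
// `screen`, scanning left to right, top to bottom.
std::vector<std::pair<int, int>> findIcon(const Bitmap& screen,
                                          const Bitmap& icon) {
    assert(screen.pixels.size() ==
           static_cast<std::size_t>(screen.width) * screen.height);
    assert(icon.pixels.size() ==
           static_cast<std::size_t>(icon.width) * icon.height);
    std::vector<std::pair<int, int>> hits;
    for (int oy = 0; oy + icon.height <= screen.height; ++oy)
        for (int ox = 0; ox + icon.width <= screen.width; ++ox)
            if (matchesAt(screen, icon, ox, oy)) hits.emplace_back(ox, oy);
    return hits;
}
```

This only finds bitmaps that are already in the catalog and drawn pixel for pixel as cataloged. A scaled, recolored, or partly covered icon would need a more tolerant comparison.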

TC

Nov 1, 2005, 10:13:38 PM

Ok, thanks for that info. I'll read & digest it further :-)

I emailed the makers of the product in question. They replied to the
effect that it does work with MS Access, but they won't say how.

Cheers,
TC