Baby Steps for Visual Processing


Noah Bliss

Feb 20, 2017, 8:58:22 PM
to opencog
New idea. I must first admit it strays a bit from our long-term goal of analog embodiment, but I think one of our "stepping stones" has been sitting right in front of us for a while: the Linux desktop. What if we created an API that allowed our system to interact with its host? We could easily fetch window titles and positions, and through Wayland's new RDP protocol we could potentially "slice" otherwise analog data into a form the system can process.

This could also have very tangible practical value. If the system knows the cursor position, window names and positions, and other miscellaneous metadata, it could soon become a genuinely useful personal assistant capable of most Linux operations, and we could use that as a stepping stone toward more adventurous analog visual and spatial processing systems. By working with the Linux desktop we have the marked advantage that every resource is already accounted for and in digital form. Rather than having to define our targets, we would merely have to tie them in.
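To make the idea concrete, here is a minimal sketch of what "asking the host" could look like, assuming a plain X11 session with the xdotool utility installed (the wrapper function names are just my own placeholders, not an existing API):

# Sketch only: read desktop state through xdotool (assumes X11 + xdotool).
import subprocess

def xdo(*args):
    """Run an xdotool command and return its stdout as text."""
    result = subprocess.run(["xdotool", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

def active_window_title():
    # "getactivewindow" pushes the focused window onto xdotool's stack;
    # "getwindowname" then prints that window's title.
    return xdo("getactivewindow", "getwindowname").strip()

def cursor_position():
    # --shell prints KEY=VALUE pairs such as X=512, Y=384, SCREEN=0.
    pairs = dict(line.split("=", 1) for line in
                 xdo("getmouselocation", "--shell").splitlines())
    return int(pairs["X"]), int(pairs["Y"])

if __name__ == "__main__":
    print("Focused window:", active_window_title())
    print("Cursor at:", cursor_position())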

Projects that I think would be useful toward this end:

- Wayland, specifically the RDP module that can now optionally be baked right in.
- RDP/X11, or another window manager that exposes an API.

We could also forgo the compositor completely and tie right into a window manager that is easy to script, such as i3.

- X tools: examples include xdotool, xclipboard, etc. These would allow our system to generate its own input and accomplish its own tasks (see the sketch after this list).

- Synergy. While this is software primarily intended as a cross-platform software KVM, we may be able to borrow its code for mouse tracking and keystroke logging, which the AI could then track and learn from.
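For the output side, here is a similarly small sketch of how the system could generate its own input through xdotool (again assuming X11; the subcommands are real xdotool ones, the wrapper functions are hypothetical):

# Sketch only: synthesize keyboard and mouse input via xdotool.
import subprocess

def xdo(*args):
    subprocess.run(["xdotool", *args], check=True)

def type_text(text):
    # Emit the string as synthetic keystrokes, 50 ms apart.
    xdo("type", "--delay", "50", text)

def press(keys):
    # e.g. press("ctrl+l") to focus a browser's address bar.
    xdo("key", keys)

def click_at(x, y, button=1):
    # Move the pointer and click the given button (1 = left).
    xdo("mousemove", str(x), str(y), "click", str(button))

if __name__ == "__main__":
    press("ctrl+l")
    type_text("https://opencog.org")
    press("Return")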

These are just off-the-wall ideas. I am still learning Python and intend to make myself useful in more practical ways as soon as possible, but if someone with the skills wants a side project, feel free to steal this.

Regards, good friends!

Noah B. 

Keyvan Mir Mohammad Sadeghi

Feb 21, 2017, 7:56:49 AM
to opencog
https://groups.google.com/forum/#!topic/opencog/eLAc8q2wISQ

---------- Forwarded message ----------
From: Keyvan Mir Mohammad Sadeghi <key...@opencog.org>
Date: Sun, Mar 23, 2014 at 11:30 PM
Subject: Re: [opencog-dev] What AGI projects can learn from Deep Mind, Vicarious, etc.
To: Benjamin Goertzel <b...@goertzel.org>


I.e., doing something easily visually appreciable that previously only humans could do...

I've been thinking about that for a while... Dialogue systems seem so hot right now; trying to think one step ahead, an AGI that *understands* GUIs strikes me as the next big thing. The task environment has all the ideal characteristics listed in the classic literature: accessible, single-agent, deterministic, episodic, recoverable. And as a demo, it would impress anyone to see an AI that can drive Windows, Android, etc. and execute given commands. Learning is easy to integrate; the inputs and desired outputs are obvious.
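To sketch what I mean by obvious inputs and outputs: the GUI task fits a completely standard agent-environment loop. Nothing below is an existing API, just an illustration of the shape:

# Illustration only: a hypothetical GUI task framed as an episodic,
# single-agent environment with obvious observations and actions.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Observation:
    screenshot: bytes           # raw pixels of the current screen
    focused_window: str         # title of the focused window
    cursor: Tuple[int, int]     # cursor position in pixels

@dataclass
class Action:
    kind: str                   # "click", "type" or "key"
    argument: object            # coordinates for clicks, text for typing

class GuiEnvironment:
    """reset() starts a task such as 'open a terminal'; step() applies one
    input event and returns the new observation, a reward, and a done flag
    once the task is completed or abandoned."""

    def reset(self) -> Observation:
        raise NotImplementedError

    def step(self, action: Action) -> Tuple[Observation, float, bool]:
        raise NotImplementedError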

What do you think?


-- 
Keyvan Mir Mohammad Sadeghi
MSc AI

"One has to pay dearly for immortality; one has to die several times while one is still alive." -- Friedrich Nietzsche

Keyvan Mir Mohammad Sadeghi

Feb 21, 2017, 8:06:20 AM
to opencog