Hi folks, my subject stems from having recently done a deep dive into the pi_vision implementation. The original face detection and tracking had bit-rotted, so I revamped it. In doing so I added a hook for eventually augmenting the "new_face" message with some face recognition. I was informed that rather than splicing a face recognition algorithm in at the pi_vision level, the "vision" would be to have the image elements reach the AtomSpace, and thus allow recognition to occur at a more basic level.
Therefore, pursuant to the above, I'm asking for a high-level description of how AGI vision could be accomplished. Perhaps we can also address the question of why face detection and tracking are "ok" but face recognition is not? Maybe all processing should be done at a lower level?
--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com.
Thanks Linas, I do have the vision part working now (sending ROS messages for FACES, LOST_FACE and NEW_FACE). However, I don't have the Eva/Sophia head working yet; I'm working towards that, and any help is welcome.
Not sure if this is cogent since my application is autonomous robots in actual hardware, but maybe useful…
I used OpenCV with a carrier board ("StereoPi") for the Raspberry Pi Compute Module that breaks out both camera ports on the Pi. I automated face recognition with code leveraging OpenCV that I found from Adrian Rosebrock (pyimagesearch.com), which employed Haar Cascades to determine whether a face was present. Once a face is detected, it sends the center-of-face data to another Pi (the robots have three Pis in them – "cores" – a vision acquisition "core", a language "core" and a vision processing "core"). The vision processing core (depending on the state the robot is in) takes this face positioning data, chews on it and sends the corresponding servo signals to the motor core that controls the head and eyes, and the robot follows you with its gaze and head movements. So in theory, face *detection* and tracking are always functionally available, but may be overridden/ignored by other behavioral commands/statuses.
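The face-center-to-servo step described above can be sketched as a simple proportional controller. To be clear, this is just an illustration of the idea, not Dave's actual code – the frame size, servo range and gain below are assumed values:

```python
# Sketch of the "center-of-face data -> servo signal" step. The frame
# size, servo travel and gain are illustrative assumptions, not values
# from the actual robots.

FRAME_W, FRAME_H = 640, 480      # assumed camera resolution
PAN_MIN, PAN_MAX = 0.0, 180.0    # assumed pan-servo travel in degrees
GAIN = 0.05                      # proportional gain (degrees per pixel of error)

def pan_update(current_deg, face_cx):
    """Nudge the pan servo toward the horizontal center of the face."""
    error = face_cx - FRAME_W / 2               # pixels off-center
    target = current_deg + GAIN * error         # proportional correction
    return max(PAN_MIN, min(PAN_MAX, target))   # clamp to servo limits

print(pan_update(90.0, 320))  # face dead-center -> 90.0 (no movement)
print(pan_update(90.0, 520))  # face to the right -> 100.0 (pan right)
```

A real loop would run this per frame and also drive a tilt servo from the vertical offset in the same way.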
The language processing side of things is always listening (I use python speech recognition with PocketSphinx as the recognizer which works surprisingly well) and now has several hundred routines it can engage depending on what it hears, and some conflict resolution and buffering code in case responses to one phrase would interfere with ongoing responses playing out).
The system is set up so that if I use a phrase like "my name is" or "I'd like to introduce you to" (and several similar phrases that are recognized by a fuzzy-logic kind of similarity finder I wrote), *AND* it can tell a face is present, it can filter out the name given, if any. Then a few things happen – first, the language processor confirms the name by speaking "Hello <name> – did I get that right?" and listens for a variety of words that are either affirming or denying.
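Dave's similarity finder is custom and not shown here, but the idea can be sketched with nothing more than the standard library – a minimal stand-in using difflib, with an assumed 0.8 match threshold:

```python
import difflib

# Minimal stdlib stand-in for the "fuzzy-logic kind of similarity
# finder" mentioned above. The phrase list and threshold are
# illustrative; the real implementation is Dave's own code.

INTRO_PHRASES = ["my name is", "i'd like to introduce you to"]

def matches_intro(heard, threshold=0.8):
    """Return True if the heard text is close to any known intro phrase."""
    heard = heard.lower()
    for phrase in INTRO_PHRASES:
        ratio = difflib.SequenceMatcher(None, heard, phrase).ratio()
        if ratio >= threshold:
            return True
    return False

print(matches_intro("my name is"))       # True  (exact match)
print(matches_intro("what time is it"))  # False (no intro phrase)
```

Matching against a small fixed phrase list like this is forgiving of speech-recognizer misfires ("mike name is" still scores close to "my name is") without needing any ML.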
On affirmation, the system immediately begins taking snapshots every 10 frames and stores them in a folder (the new-faces dataset) named with the person's name plus the date and time as a numeric string (Dave-202202251623, for example). Once either the person exits the view for more than 100 frames (which would have been 10 snapshots) or the system gains 100 actual face snapshots, it hands those images off to another of the scripts from Adrian Rosebrock (encode_faces.py) that encodes the faces and turns the whole bunch into a pickle, which is then appended to the bigger pickle that all the other known faces are in… The name and data are also written to the database of "people known", where additional data is written over time as interactions with that person accrue.
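The bookkeeping side of that pipeline (the folder naming and the pickle-append) can be sketched as below. The encoding step itself lives in Rosebrock's encode_faces.py and isn't reproduced; the file path and dict layout here are assumptions for illustration:

```python
import datetime
import os
import pickle

# Sketch of the dataset naming and pickle-append bookkeeping described
# above. The db_path and the {"encodings", "names"} layout are assumed
# for illustration; the actual face encoding is done by encode_faces.py.

def dataset_name(person, when=None):
    """Folder name: person's name plus date-time as a numeric string."""
    when = when or datetime.datetime.now()
    return f"{person}-{when.strftime('%Y%m%d%H%M')}"

def append_encodings(new_encodings, new_names, db_path="encodings.pickle"):
    """Append freshly encoded faces to the big pickle of known faces."""
    data = {"encodings": [], "names": []}
    if os.path.exists(db_path):
        with open(db_path, "rb") as f:
            data = pickle.load(f)
    data["encodings"].extend(new_encodings)
    data["names"].extend(new_names)
    with open(db_path, "wb") as f:
        pickle.dump(data, f)

print(dataset_name("Dave", datetime.datetime(2022, 2, 25, 16, 23)))
# Dave-202202251623
```

Read-modify-write on one growing pickle is simple and works at this scale, though it's also exactly why the "aging encodings" problem mentioned later needs a background refresh routine rather than just more appends.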
So I'm not sure if this answers your question about integrating it into the speech subsystem – I basically have the audio input and processing, audio output and visual input and processing all running in parallel on separate physical SBCs, which all talk to each other via ZeroMQ (or PyZMQ specifically).
It works very well, is reasonably fast (especially given it only runs on Pi 4/8 GB SBCs), and gives people interacting with it the unmistakable feeling that the robot sees them, responds to their movements and speech, etc., and remembers them.
The drawback I haven't done anything about in the past year or so – but which has a relatively easy fix – is that the pickle data for a given person ages (my grandkids are no longer reliably recognized, since they were 3 and 5 when I first implemented that build and are 6 and 8 now) – so I need to add a routine that occasionally and silently updates the images in the recognition pickle in the background to keep up with changes… but I've not had the time I wanted to do these things…
If any of this gives you anything useful to pick from, I can get you code, original source and my custom stuff. It's all Python, so I'm guessing you should be good with that.
Dave
I feel your pain lol. Fortunately, I'm the only engineer:
Moore's law.
So, ahh, one person who should have known better ordered the best, highest-resolution webcams they could find. 1280x1024 or something. You could only plug two of them into a USB hub before the USB hub was overwhelmed. And the CPU attached to that could barely keep up with the frame rate. Despite this obvious hardware-fail, there was tremendous resistance to down-scaling to a far more practical 640x480. Add to that a power, heat and cooling budget. Ugh.
Managing engineers is like herding cats. Or pushing rope. Something like that.
I de-resed my camera down to something even less than 640x480 and worked with Jeff Bass, the creator of imageZMQ (which builds on PyZMQ), to come up with a fully custom frame buffering/overflow, use-last-in-if-delayed scheme. Works like a champ – no hangs, no delays – and it broadcasts via imageZMQ to all my Pis so they can all see what the eyeballs see and operate on that. I'll never understand why ROS didn't adopt ZMQ and imageZMQ – it's incredibly versatile, fast and efficient.
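The actual buffering scheme Dave and Jeff Bass built is custom and not shown here, but the core "use-last-in-if-delayed" idea can be sketched in a few lines of stdlib Python: a one-slot buffer where the producer never blocks and a slow consumer always gets the newest frame, with stale frames silently dropped:

```python
import threading

# Toy sketch of the "use-last-in-if-delayed" idea described above:
# a one-slot, overwrite-on-put buffer. Not the actual imageZMQ code.

class LatestFrame:
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        """Producer side: overwrite whatever is waiting -- never blocks."""
        with self._lock:
            self._frame = frame

    def get(self):
        """Consumer side: take the newest frame (or None if nothing new)."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

buf = LatestFrame()
for i in range(5):        # the camera races ahead of the consumer...
    buf.put(f"frame-{i}")
print(buf.get())          # frame-4 (only the newest frame survives)
print(buf.get())          # None    (nothing new since the last read)
```

This is why the pipeline never hangs: a consumer that falls behind skips straight to the current view of the world instead of chewing through a backlog of stale frames.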
I have a frame rate of around 5 frames/second, which is more than enough for what I want it to do, since we're not driving cars on highways or anything.
Regarding folks turning up their noses at local-only recognition, that was a decision I made because I wanted my robots to be independent of an internet connection. They have to be fully functional off-line. But – as you state – background noise can be an issue. My "folks" don't have to perform in a tech-convention environment. They just have to work roaming around on my property, where there's virtually zero background noise, unless it's really windy lol
As for the face recognition with Haar cascades, I admit to cheating a little. Usually, when I introduce someone new and get the initial photo dataset, I make a point of surreptitiously triggering the face-learning scripts in different environments, not just my office. I have the robot's "known people" interact with them up on my porch during daylight, evenings with artificial lighting, etc., and it adds to their database, and the recognitions become more robust… when I do that.
But again, I haven't really done much in about a year, partially because I am seeing tech SOAR past what I can do on my own. It's a little disheartening. I see robots capable of *INCREDIBLE* facial expression (Ameca, https://www.cnn.com/videos/business/2021/12/08/humanoid-robot-ameca-lon-orig-tp.cnn), linguistic awesomeness (I have GPT-2 running on a Pi, but I'll NEVER get GPT-3 running on a Pi lol), and unimaginable dexterity (Boston Dynamics Atlas). When I started in this, I was competitively doing OK. The world rocketed past me lol. Now my robots are, well, toys that I play with sometimes. But – that said – the tech in some parts of them is still pretty damn good… I've never seen ANYTHING better at identifying questions vs. statements than the code I have running in my guys.
Anyway – wish I could help. I understand the issues, I have overcome several as best as I can – but alas, funding and real-world demands make it so I can't follow up on what I want to do, and could easily if I had the same budgets as some of these other outfits lol But good chance there's a dozen other folks out there who could make the same claim. Good thing I have gainful employment elsewhere 😊
If you ever think I can help – let me know…
I also have some errors coming from the Sophia.blend head on startup. In particular it complains of a missing NOSE bone, and a number of cyclic dependencies between the bones. I haven't got any insight into using Blender yet, so I'm totally on the outside of that one.
Hi Mark,
I am in full agreement with you on genuine AGI. An architecture needs to be created that has no "if-thens" as I call them – it has to start from scratch and build its world model in a way that allows it to record "successful" interactions with the world (such as receiving food, aka power recharges), and it needs to start from the "servos just randomly twitching" phase.
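The "start from randomly twitching servos, reinforce what earns food" idea above can be illustrated with a deliberately tiny toy – random actions whose weights grow when they happen to earn a reward. Nothing here comes from any actual OpenCog or robot codebase; the action names and reward scheme are made up purely for illustration:

```python
import random

# Toy illustration of the "servos just randomly twitching" phase:
# random actions, with "successful" ones (those earning a reward,
# e.g. reaching the charger) reinforced so they're chosen more often.

random.seed(42)
ACTIONS = ["twitch_left", "twitch_right", "reach_forward"]
weights = {a: 1.0 for a in ACTIONS}   # no innate preferences

def pick_action():
    """Sample an action in proportion to its learned weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for action, w in weights.items():
        r -= w
        if r <= 0:
            return action
    return action  # float-rounding fallback

def reinforce(action, reward):
    """Record a successful interaction by boosting that action's weight."""
    weights[action] += reward

# Pretend "reach_forward" is the twitch that happens to reach the charger:
for _ in range(100):
    a = pick_action()
    if a == "reach_forward":
        reinforce(a, 0.5)

print(max(weights, key=weights.get))  # the most-reinforced action
```

A real "infant AGI" would of course need a far richer state and world model – the point of the sketch is only that the preference emerges from recorded successes, not from any hand-coded "if-then".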
That said, my expertise has always been in tying many different functions together in python. I've managed to tie together the visual, auditory and movement functions on my robots successfully, but definitely NOT in the "neonatal" AGI framework I would have liked. I was after immediate results and short-term successes, which I achieved, but I regret not having put forth the effort to produce an "infant AGI".
That said, I have created the frameworks for such an AGI to express itself within. The visual acquisition system is robust, with a few hardwired functions pertaining to recognizing faces and objects; the auditory processing functions are robust and in some ways still excel, despite my lagging in SO many other areas, beyond much else out there (with the exception perhaps of GPT-3 – but my model is about 1% of the size of GPT-3 and works well at very specific parts of identifying what is being presented to it). The auditory output is quite robust, although devoid of nonverbal utterances; for those I have created code to allow a sort of analog, movement-based expression of unverbalized "statements". And I have a rudimentary facial/head/body movement function running, but nothing anywhere near close to the amazing stuff I'm seeing with Ameca (https://www.cnn.com/videos/business/2021/12/08/humanoid-robot-ameca-lon-orig-tp.cnn).
I consider having this framework in place a perfect "cradle" in which to embed the type of AGI we are both interested in, but I am still myself in "AGI Infancy". My goal was to create robots that I could interact with in a useful and significant manner (for laughs, one of my goals is to have robots that can stack my annual firewood deliveries lol)
I have other systems functioning in test-bed environments that are non-human in structure, but with very sophisticated visual systems that are – apologies – intended to drive intelligent wagons with grippers that will autonomously pick up pinecones on my property every spring. We literally get hundreds of thousands from the pine groves that line our property lol.
So no, Mark, I don't believe you're being impractical. What I believe is being impractical is the trend towards creating AGI as an "instant adult". That is what I attempted to do and as a result I have systems that are very impressive in a very narrow range of functions. So long as they are here, on my property, in the environments I have trained them on, on the tasks I have trained them on, they're pretty cool. But they'd fail horribly outside of this environment. Your vision is what will create robust AGI.
I just hope we survive as a species long enough to see these goals realized…
Let me know if any of this is useful to you,
So, with regard to the foundational thinking of how the AtomSpace can truly unite the sensory data with the motors and the background knowledge – what theory is governing that? I see that the AtomSpace allows linkages to be made, and it allows a common data representation/language. But what is the "glue" that causes these commonly held but algorithmically distinct islands of AtomSpace to coalesce?