Beyond the Marshmallow test

Patrick Hammer

Mar 6, 2021, 1:26:35 PM
to open-nars
Hi everyone!

A recent 2020 paper, "How intelligent is a cephalopod? Lessons from comparative cognition", discusses the intelligence of cephalopods: https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/brv.12651
It's a great paper that I recommend to anyone with an interest in Animal Cognition, though one aspect bugs me and caught my eye immediately, in Table 1: "Self-control - The ability to delay gratification by resisting the temptation of an immediate reward in preference for a better but delayed reward". As I found, this view is quite widespread in psychology and hence not limited to this paper.

Clearly, for us in the AGI field, self-control is much more than this narrow ability to delay gratification. In particular, as I will now show, the presence of this ability is fundamentally insufficient to determine whether a system has self-related mechanisms similar to the ones we described in "Self in NARS, an AGI System" - https://www.frontiersin.org/articles/10.3389/frobt.2018.00020/full
as many AI systems can delay gratification successfully even without these mechanisms!

To show this, we will run two such systems on the Stanford Marshmallow test, a common way to measure the ability to delay gratification:
The subject (which is assumed to like marshmallows, and to like two of them more than one) gets a marshmallow. If it resists eating it, it will get a second one after some time and can eat both. Either from experiencing a few example runs, or by knowledge transfer (speech/text) of this fact, the subject is then expected to refrain from eating the first marshmallow so that it gets two marshmallows to eat instead.

First I tried a Q-Learner with Eligibility Traces, a purely reactive, Behaviorism-based approach that is very mainstream. Due to the way it handles temporal credit assignment and its ability to balance near- and long-term reward, it had no issue with this task, as expected. To replicate (on any Linux/Mac/UNIX system, or on Android via the Termux shell):
cd OpenNARS-for-Applications
git checkout QLearner
./build.sh
./NAR shell < ./examples/nal/marshmallow.nal | python3 colorize.py
Output: QLearner.png, see attachment
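
For readers who want to see the mechanism in isolation, below is a minimal, self-contained Python sketch of a tabular Q-learner with eligibility traces on a toy two-state marshmallow task. It is not the code from the QLearner branch; the state/action encoding, rewards and parameters are all made up for illustration. The point is that the trace propagates the delayed two-marshmallow reward back to the "wait" decision, so delaying gratification requires nothing beyond standard temporal credit assignment:

import random

# States: 0 = marshmallow present, 1 = waited, 2 = done (terminal).
# Actions: 0 = eat now, 1 = wait. All names and values are hypothetical.
N_STATES, N_ACTIONS = 3, 2
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.9, 0.9, 0.1

def step(state, action):
    # Toy environment: eating now yields reward 1, waiting then eating yields 2.
    if state == 0:
        return (2, 1.0) if action == 0 else (1, 0.0)
    return (2, 2.0)  # the second marshmallow has arrived, eat both

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for episode in range(500):
    e = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # eligibility traces
    s = 0
    while s != 2:
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON \
            else max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        best_next = max(Q[s2]) if s2 != 2 else 0.0
        delta = r + GAMMA * best_next - Q[s][a]  # TD error
        e[s][a] = 1.0  # replacing trace (simplified; no Watkins-style cutoff)
        for si in range(N_STATES):
            for ai in range(N_ACTIONS):
                Q[si][ai] += ALPHA * delta * e[si][ai]
                e[si][ai] *= GAMMA * LAMBDA  # decay all traces
        s = s2

print(Q[0])  # Q(wait) ends up above Q(eat now), so the greedy policy waits

With GAMMA = 0.9, the waiting path is worth about 0.9 * 2 = 1.8 versus 1.0 for eating immediately, so the greedy policy waits, which is exactly the "pass" criterion of the test.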

Then I tried `OpenNARS for Applications` (ONA), a simplified NARS which in this experiment is fully restricted to temporal and procedural reasoning & learning (NAL-7 and NAL-8, as in the book "Non-Axiomatic Logic", https://www.worldscientific.com/worldscibooks/10.1142/8665 ). In this setup it lacks all of the SELF-related mechanisms described in "Self in NARS, an AGI System", which we see as central for the higher-level cognitive functioning we ultimately want to understand to a replicable degree. Again, it had no issue with this task, as expected. To re-run the experiment:
cd OpenNARS-for-Applications
./build.sh
./NAR shell < ./examples/nal/marshmallow.nal | python3 colorize.py
Output: ONA.png, see attachment

Please see the attached info.txt for how to interpret the example file's content!

This post is for discussion, and it also serves as a courtesy for researchers in other fields (especially Cognitive Science) who are interested in exploring the limits of what the Marshmallow experiment can show, and who want to explore a broader notion of self-control. This will potentially lead them to find better experiments that can demonstrate higher-level cognitive functioning (especially self-related mechanisms such as introspective reasoning ability) in cephalopods and many other species, beyond the simple ability to delay gratification, which is easy even for current AI.

Best regards,
Patrick

QLearner.png
info.txt
ONA.png

Robert Johansson

Mar 8, 2021, 11:16:20 AM
to open...@googlegroups.com
Hi Patrick!

Thanks for a very nice post - and a very nice demonstration! :)

I totally agree with you that self-control is much more than delayed gratification.

Actually, I think this is a nice example of something along the lines of executive function. From the perspective of NARS as a unified model it's a nice example of how something like executive function follows from design, rather than being implemented as a separate function. 

One thing your example got me thinking about is what kind of self-control/executive-function behavior is possible for non-human animals, and what is possible only for humans. Classic examples of the latter are the Stroop task and the Wisconsin card sorting test. Both would be very interesting to try out with ONA! The latter requires the subject to learn a sorting strategy and then, without warning in the middle of the task, change the strategy.

From the perspective of RFT (Relational Frame Theory), all of these are examples of rule-governed behavior. I think this theoretical framework is a very fruitful approach for talking about "verbal rules" in the same spirit as NARS. I have a book chapter on executive function and RFT that I'm happy to share with anyone interested.

Thanks and best wishes to all of you :)

Robert


robe...@googlemail.com

Mar 8, 2021, 12:24:15 PM
to open...@googlegroups.com
Misusing frequency as a reward for utility maximization (which is what this test measures) is a nice hack, but not the way to go, for the following reasons:

* stating an ordering is impossible (ex: a>b, b>c, etc.), which EDT and its variants need
* the frequency of input events should always be 1.0, for the sake of a simpler justification/interpretation (otherwise the resulting frequency of the predictive implication is a mix of how often it was the case and the frequency of the input event, which isn't justifiable and which I can't interpret as something "pure")
* it makes the system hard to build upon, due to a certain lack of "modularity"
* a proto-AGI (NAR(S)) should in principle accept any decision-making theory on top of the "base" decision theory from NARS theory, such as EDT, TDT, FDT, etc.

robe...@googlemail.com

Mar 8, 2021, 12:41:15 PM
to open...@googlegroups.com
An order should be stated with a relation:

// GTU : greater than utility relation
// transitivity, ex: A > B    B > C  ==>  A > C
<(<($a*#b) --> GTU>&&<(#b*$c) --> GTU>) ==> <($a*$c) --> GTU>>. {1.0 0.998}

<(B*A) --> GTU>. {1.0 0.998}
<(B*A) --> GTU>?

decision has to depend on this order like in EDT:

// execute op if choice happened and if we prefer choice
<((#Choice && <(#Choice*#2) --> GTU>), <({SELF}*#Choice)-->^pick>) =/> G>.
G! :|:

So this isn't choosing B over A because exp() is greater due to a frequency hack; it's choosing it because it has more utility according to the order relation.

Patrick Hammer

Mar 8, 2021, 2:34:16 PM
to open...@googlegroups.com
Hi Robert J.!

"I totally agree with you that self-control is much more than delayed gratification."

That's great to hear from someone in your field! Unacknowledged oversimplification, confirmation bias and overpromising can all ruin otherwise valuable research, and not only in studies of animal intelligence.
In AI, for instance, I have heard the phrase "Deep Learning works like the brain does" many times, in DL lectures but also stated by the most famous DL researchers; in this case it's not only an oversimplification but also scientifically dishonest, I think you would agree.
I guess the takeaway is that sometimes we want to interpret more into our models and results than they actually show... :)

"Actually, I think this is a nice example of something along the lines of executive function. From the perspective of NARS as a unified model it's a nice example of how something like executive function follows from design, rather than being implemented as a separate function."

I agree, the decision-making model should fully capture it. If an extra module is required for each different problem, one quickly slips into the narrow-AI trap, leading to systems which are hard to combine and which can each fulfill only one particular purpose.

"One thing that got me thinking from your example is what kind of self-control/executive function behavior that is possible for non-human animals, and what is only possible for humans. Classic example of something of the latter is the Stroop task, or the Wisconsin card sorting test. Both would be very interesting to try out with ONA! The latter involves the subject to learn a sorting strategy and then without warning in the middle of the task, change the strategy."

Interesting, I will look into these experiments soon! I also played with Wason's cards recently; this was fun and surprisingly consistent with the outcomes observed with human subjects.

"From the perspective of RFT, all of this are examples of rule-governed behavior. This theoretical framework I think is a very fruitful approach to talk about "verbal rules" in the same spirit as NARS. I have a book chapter on executive function and RFT that I'm happy to share with anyone interested."

Cool, this would be great to read for sure!

------------------------------------------

Hi Robert W.!

You raised many points, thank you; here is my input on 2 of them for now:

- The frequency of an event represents desirability for goal events, and for belief events it doesn't need to be 1 either. It represents w+/w for the event statement as usual, consistent with the NAL book. For belief events, consider the case of two inconsistent sensor readings happening close in time: the revised w+/w will not be 1 but somewhere between 0 and 1, depending on the confidences of the readings (see the sketch after these two points). Or do you mean to ask why I used the same notation in the QLearner branch, which works based on utility maximization? That's because I didn't have time to change the example file parser.

- Regarding the GTU relation: that's valid, but order can be expressed with sequences just fine; that's what they are designed for. And unlike the GTU relation, they are automatically introduced by the system.
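
To make the revision point concrete, here is a small Python sketch of the NAL revision step for the two-inconsistent-readings case. The formulas (f = w+/w, c = w/(w+k), and revision as addition of evidence) are from the NAL book; using k = 1 is an assumption here:

K = 1.0  # evidential horizon; assumed to be 1 for this sketch

def to_evidence(f, c):
    # Invert f = w+/w and c = w/(w+K) to recover the evidence counts.
    w = K * c / (1.0 - c)
    return f * w, w  # (w+, w)

def revise(t1, t2):
    # Revision pools the evidence of two truth values about the same statement.
    wp1, w1 = to_evidence(*t1)
    wp2, w2 = to_evidence(*t2)
    wp, w = wp1 + wp2, w1 + w2
    return wp / w, w / (w + K)  # back to (frequency, confidence)

# Two inconsistent sensor readings of the same event, close in time:
print(revise((1.0, 0.9), (0.0, 0.9)))  # -> (0.5, ~0.947), frequency strictly between 0 and 1

If one reading is more confident than the other, the revised frequency shifts toward it, since revision weights each input by its evidence; that is exactly the behavior wanted for unreliable sensors.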

Best regards,
Patrick



robe...@googlemail.com

Mar 8, 2021, 3:36:28 PM
to open...@googlegroups.com
>- Regarding the GTU relation: that's valid, but order can be expressed with sequences just fine; that's what they are designed for. And unlike the GTU relation, they are automatically introduced by the system.

I of course mean an order of preference, like in the relation which is motivated by EDT (but without probability and so on). Ex: I prefer having $20 over having $10, and I prefer having $30 over having $20, so I prefer $30 over $10.
This is different from temporal order.

>The frequency of an event represents desirability for goal events, and for belief events it doesn't need to be 1 either. It represents w+/w for the event statement as usual, consistent with the NAL book.

I would argue that it needs to be 1.0 or 0.0 for input events, and w+/w for the learned predictive implication (at least in my NAL for sensorimotor).
I don't see how an input frequency between 0.0 and 1.0 makes any sense. Did the event happen in some superposed state between having happened and not having happened? I can't attach any meaning to it for input events.
I had to modify Pei's NAL in my implementation/"theory" for exactly these reasons.

>For belief events, consider the case of two inconsistent sensor readings happening close in time: the revised w+/w will not be 1 but somewhere between 0 and 1, depending on the confidences of the readings.
That's inviting all sorts of weird problems.

Mixing an input event frequency between 0.0 and 1.0 with w+/w for predictive implications (where w+ and w should be integers) isn't wise at all.

Patrick Hammer

Mar 8, 2021, 4:16:21 PM
to open...@googlegroups.com
Hi Robert W.!

"I of course mean an order of preference, like in the relation which is motivated by EDT (but without probability and so on). Ex: I prefer to have 20$ over having 10$. I prefer to have 30$ over having 20$, so I prefer 30$ over 10$.
This is different than temporal order."

I agree, but this is captured by the implicit order between a goal and another goal that is desired more; that's what desire value is for. We don't just desire or not desire, but desire to different degrees!
But of course letting it reason about explicit order relationships is fine too.

"I don't see how an input frequency between 0.0 and 1.0 does make any sense."

Different desire values are exactly how it makes sense for goals, I would argue. And for beliefs, the inconsistent sensor readings are one case, but even for a single sensor value, consider different brightness levels! Sure, some could be discretized into different terms which each cover a range; this would be like color dimensions: physically, colors don't exist, light just lies within a 1D frequency spectrum, but we have different receptors for different value ranges.
Another case is when predicted evidence is considered in addition to current evidence, though this part is of course tricky to control; let's better leave prediction out of the discussion here, as it would complicate the much simpler issue at heart.
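
To make the brightness case concrete, here is a hypothetical sketch of how a continuous sensor reading could enter the system as the frequency of an input event. The term name, normalization and confidence are made up; only the {frequency confidence} notation matches what appears elsewhere in this thread:

def brightness_event(raw, raw_max=255.0, confidence=0.9):
    # Map a raw sensor reading onto a frequency in [0, 1].
    f = max(0.0, min(1.0, raw / raw_max))
    return "<{light} --> [bright]>. :|: {%.2f %.2f}" % (f, confidence)

print(brightness_event(64))   # dim reading    -> frequency 0.25
print(brightness_event(230))  # bright reading -> frequency 0.90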

"Mixing an input event freq between 0.0 and 1.0 with w+/w for pred impl (where w+ and w should be integers) isn't wise at all."

It's not an issue for ONA at all. I agree that restricting input frequency can make a NARS implementation simpler, but not without losing ability.

I'll wait to see what others have to say here, maybe someone else has better examples for frequencies between 0 and 1! :)

Best regards,
Patrick


robe...@googlemail.com

Mar 8, 2021, 5:13:26 PM
to open...@googlegroups.com
>I agree that restricting input frequency can make a NARS implementation simpler, but not without losing ability.

There is no "lost ability", because this freq confusion isn't that useful to me.

>I agree, but this is captured by the implicit order between a goal and another goal that is desired more; that's what desire value is for.

Desirability serves a "low level" purpose (low level with regard to decision making under AIKR). This doesn't, however, help much with explicit preference (I prefer being in Dresden over being in X, and I prefer having more money, thus I prefer being in Dresden with as much money as possible).
Expressing this with frequency is hopeless, because an ordering can't be expressed.
It would also mean that a failed anticipation would mess with the "ordering" expressed only through desirability (exp()); no one wants that.

Patrick Hammer

Mar 9, 2021, 7:04:01 AM
to open-nars
Hi Robert!

"There is no "lost ability", because this freq confusion isn't that useful to me."

The ability that would be lost is the support for sensors with continuous input value ranges, and of course the ability to revise unreliable sensor readings.
And for derived events, you would agree that frequencies between 0 and 1 need to be supported, since the beliefs they are derived from often lie between 0 and 1 (like an implication which doesn't predict with 100% success, which is the norm rather than the exception). So why not also allow it for input? I guess it's because your anticipation mechanism for events assumes "happened"/"not happened" is the only outcome, and it would need to be extended to "happened to the degree the evidence indicates", am I right? You also have to see the broader picture here: yes, so far our implementations are mostly restricted, or at least strongly biased, to forming implications between input events (except for v3.1.0), but this shouldn't stay that way in the long run, and for derived events the anticipation mechanism will need to cope with frequencies between 0 and 1, as ONA's and OpenNARS's already can.
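
As a worked example of why derived events force this anyway, here is the NAL deduction truth function (from the NAL book) applied to a prediction through an imperfect implication; the numbers are made up:

def deduction(t1, t2):
    # NAL deduction: f = f1*f2, c = f1*f2*c1*c2 (truth function from the NAL book).
    (f1, c1), (f2, c2) = t1, t2
    return f1 * f2, f1 * f2 * c1 * c2

implication = (0.8, 0.9)  # an implication that predicted successfully 80% of the time
event = (1.0, 0.9)        # the antecedent event, observed with full frequency
print(deduction(implication, event))  # predicted event: (0.8, 0.648), frequency below 1

So even with input events restricted to frequency 1, the anticipation mechanism meets frequencies between 0 and 1 as soon as it consumes derived events.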

"Desirability serves a "low level" purpose (low level as regarding to decision making under AIKR). This doesn't however help much for explicit preference (I prefer being in Dresden over being in X, I prefer having more money, thus I prefer being in Dresden with as much money as possible). Expressing this with freq is hopeless because an ordering can't be expressed."

There could be mental operators for making this relationship explicit if needed, like we had in OpenNARS, but this is rarely required.
The implicit order given by two goals with different desire values is sufficient to make NARS prefer certain decisions over others, and this is usually what really matters.
The desire value of goals is fully taken into account here, and once again it cannot be reduced to a binary value.
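
A minimal sketch of this implicit ordering, using the NAL expectation function exp(f, c) = c*(f - 1/2) + 1/2 from the NAL book; the desire values of the two competing options are invented for illustration:

def expectation(f, c):
    # NAL expectation: used to rank competing desires under uncertainty.
    return c * (f - 0.5) + 0.5

eat_now = expectation(0.6, 0.9)  # hypothetical desire value of eating one marshmallow
wait = expectation(0.9, 0.9)     # hypothetical desire value of eating two
print("wait" if wait > eat_now else "eat now")  # -> "wait": the implicit order decides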

Best regards,
Patrick

robe...@googlemail.com

Mar 9, 2021, 11:05:52 AM
to open...@googlegroups.com
Confidence should indicate how reliable a sensor reading is.

