Automated positive reinforcement


Lincoln Quirk

Oct 11, 2013, 2:42:19 PM
to akra...@googlegroups.com
Hello,

I've been thinking of a project to automatically reinforce positive behaviors.

The basic idea is to have a candy dispenser next to my computer, plugged into the network. When I do something positive -- check a bug off in the bug tracker, complete a to-do item, get to inbox zero, etc. -- the candy dispenser automatically ejects a candy, making a pleasing noise and giving a small reward.

Anyway, anyone tried anything like this? Any results, positive or negative?



Further thoughts:

- Daniel Reeves inspired this with his CFAR testimonial (http://rationality.org/testimonials/) and so I emailed him about it, and he suggested two important ideas: 1) that we don't necessarily need to build the hardware, if we can just convince our computer/smartphone to make a pleasant noise; and 2) that we could probably increase the effectiveness of the system by using intermittent rewards. (He also suggested I join & email this list which is why I'm here now.)

- Another instance of something like this is by Kathryn McElroy: http://cargocollective.com/kathrynmcelroy/Edible-Email-Notifier - I have asked her if she had any results but haven't heard back.

- The latency of this system is probably important - it seems like it would substantially decrease the effectiveness of the reinforcement, if my reward was delivered more than 5-10 seconds after the event. Which means the technology stack becomes a bit trickier to implement.

Daniel Reeves

Oct 11, 2013, 2:57:06 PM
to akratics
Clarification: I do think one part of the hardware is necessary,
namely, the actual jellybeans. :) I have the hypothesis (untested)
that it may suffice to use the honor system. You send or archive the
email or move the trello card or whatever and a pleasant pavlovian
bell chimes and you may take one jellybean. If you counted the total
jellybeans you started with you could set up a separate commitment
device to make sure it matched in the end.
> --
> You received this message because you are subscribed to the Google Groups
> "Akratics Anonymous" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akratics+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.



--
http://dreev.es -- search://"Daniel Reeves"
Goal tracking + Commitment contracts == http://beeminder.com

Jake Hofman

Oct 11, 2013, 4:07:39 PM
to akra...@googlegroups.com
if you do get around to the automated solution, an arduino + solenoid
or actuator is a nice solution. i have code to interface that with a
website, so that submitting a webpage form triggers the actuator.
happy to share if it'd be helpful.

for the hardware side:

http://www.instructables.com/id/Controlling-solenoids-with-arduino/?ALLSTEPS
http://bildr.org/2011/03/high-power-control-with-arduino-and-tip120/
http://itp.nyu.edu/physcomp/Tutorials/HighCurrentLoads

Paul Fenwick

Oct 11, 2013, 8:08:37 PM
to akra...@googlegroups.com
> Anyway, anyone tried anything like this? Any results, positive or negative?

My exobrain¹ gives me HabitRPG XP for responding to email, flossing my
teeth (via a beeminder callback), recording my weight, and other
bits and pieces. However the novelty value for that wore off pretty
quickly. Jellybeans might indeed work better, as they're less abstract
and more delicious. :)

¹ https://github.com/pjf/exobrain/

Daniel Reeves

Oct 11, 2013, 9:08:13 PM
to akratics
Thanks Jake and Paul! Does anyone know a concise recommendation for a
randomized reward schedule? (Eg, when the desired behavior is
exhibited, dispense a jellybean with probability 1/3. Or maybe it
shouldn't be stateless like that and should, say, limit dry spells of
jellybeanlessness.)

Perhaps the answer is buried in here: http://en.wikipedia.org/wiki/Reinforcement
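
For concreteness, here is a minimal sketch of the non-stateless variant mentioned above: reward with a fixed probability, but force a reward once a dry spell reaches a cap. The function name `makeRewardSchedule`, the cap of 5, and the injectable `rng` parameter are all assumptions made for illustration; nothing here is claimed to be an optimal schedule.

```javascript
// Hypothetical helper: the name, parameters, and defaults are illustrative only.
// p           - probability of a reward on any given event
// maxDrySpell - maximum number of consecutive unrewarded events allowed
// rng         - optional random source returning a number in [0, 1)
function makeRewardSchedule(p, maxDrySpell, rng) {
  rng = rng || Math.random;
  var misses = 0; // consecutive unrewarded events so far
  return function shouldReward() {
    // Force a reward once the dry spell hits the cap; otherwise flip a biased coin.
    var reward = misses >= maxDrySpell || rng() < p;
    misses = reward ? 0 : misses + 1;
    return reward;
  };
}

// Usage: reward with probability 1/3, never more than 5 misses in a row.
var shouldReward = makeRewardSchedule(1 / 3, 5);
var results = [];
for (var i = 0; i < 12; i++) {
  results.push(shouldReward() ? "bean" : "-");
}
console.log(results.join(" "));
```

Note that the cap raises the long-run reward rate slightly above p, so p may need to be nudged down if an exact average rate matters.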



Brent Yorgey

Oct 11, 2013, 9:17:47 PM
to akra...@googlegroups.com
I don't know about the optimal rate, but whatever it is, I would use a Poisson distribution --- that way, very occasionally you will get multiple jellybeans!

-Brent
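
A rough sketch of what that could look like, assuming the jellybean count per event is drawn from a Poisson distribution with a chosen mean. This uses Knuth's multiplication method; the function name `samplePoisson` and the mean of 1/3 are placeholders, not anything specified in the thread. With a mean of 1/3, about 72% of events yield nothing and roughly 4% yield two or more.

```javascript
// Hypothetical helper: draws a jellybean count from a Poisson distribution.
// mean - expected number of jellybeans per event (must be > 0)
// rng  - optional random source returning a number in [0, 1)
function samplePoisson(mean, rng) {
  rng = rng || Math.random;
  var limit = Math.exp(-mean);
  var count = 0;
  var product = rng();
  // Knuth's method: multiply uniform draws until the product falls to e^(-mean).
  while (product > limit) {
    count++;
    product *= rng();
  }
  return count;
}

// Usage: average of one jellybean per three completed to-dos.
var beans = samplePoisson(1 / 3);
console.log("Jellybeans earned: " + beans);
```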

Lincoln Quirk

Oct 11, 2013, 11:15:07 PM
to akra...@googlegroups.com
Okie. I mocked this up with SMS and IFTTT and Google Spreadsheets:

1. Created a spreadsheet with 3 columns: "Name", "Done?" and "Date Completed"
2. Attached the below script to the "onEdit" trigger of the spreadsheet. (This is currently tricky but if Google approves my attempt to publish the script then maybe it's easy? Let me know if you can't figure it out.)
3. Set up an IFTTT trigger to send me an SMS to "get yourself a candy" for incoming email with #todocomplete in the subject.
4. Put some dark M&Ms in my fridge.
5. Added to-dos to the list, and checked some off by typing a 'y' in the Done column.

I'll update you all in a few days and let you know the result. I'm already feeling positive anticipation about checking things off the list, so I'm optimistic :)


Here's the script.

function onEdit(e)
{
  // Only react when a non-blank value is entered in column 2 ("Done?")
  if (e.range.getColumn() === 2 && !e.range.isBlank())
  {
    // Record the completion date in column 3 ("Date Completed")
    var dateCell = e.range.offset(0, 1);
    dateCell.setValue(new Date());

    // With probability 0.5, send the #todocomplete IFTTT trigger email
    var nameCell = e.range.offset(0, -1);
    if (Math.random() < 0.5)
    {
      Logger.log("Sending reward");
      MailApp.sendEmail("tri...@ifttt.com", "#todocomplete " + nameCell.getValue(), "");
    }
    else
    {
      Logger.log("Unlucky, no reward");
    }
  }
}

Michael J.J. Tiffany

Oct 12, 2013, 9:22:20 AM
to akratics
+1 to using a random reinforcement schedule.

On reinforcement probability: the state of the art in operant conditioning is probably still in dog training. Can we learn anything from dog trainers? We don't see a huge amount of scientific rigor, but we do see strong selection pressure among the population (people who stay in dog training are the ones who produce good output -- trained dogs -- in the least amount of time, else they lose to those who do). Polling from this population as well as I can, I've derived a consensus figure of just 1/5 for reinforcement of basic behaviors (e.g. sitting on command).

On reward latency: hacking some deep brain structures can work on surprisingly long timescales. I don't really believe in deep brain structures, but some stimuli (e.g., "this food made me feel poisoned!") are more potent than others. Recall the long-ago work on induced food aversions in rats and dogs with radiation coming *hours* after the fact! (see the Taste Aversions part of http://psychology.about.com/od/behavioralpsychology/a/classcond.htm for an easy overview). That insight is not immediately actionable in your first use case, but it's worth keeping in mind for future experimentation, I think.

I mean, an entire generation of intellectuals was conditioned to enjoy avant-garde theater, which I think can only be explained by the sex they must have been having afterward.


Cheers,

Michael Tiffany


Erica Edelman

Oct 16, 2013, 7:59:00 PM
to akra...@googlegroups.com

Relevant, and possibly of interest. tl;dr: percentile reinforcement schedules. Rather than waiting for most responses to meet criterion and then drastically reducing reinforcement frequency by shifting criteria infrequently, it is better to change criteria frequently to maintain both a relatively constant reinforcement density and an intermittent one.


SOURCES

Galbicka, Gregory. Shaping in the 21st Century: Moving Percentile Schedules into Applied Settings. n.d. (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1297861/pdf/jaba00010-0182.pdf) (p740-754)


...also, Fisher, Wayne W., Cathleen C. Piazza, and Henry S. Roane. Handbook of Applied Behavior Analysis. New York: Guilford Press, 2011. p240ish



OVERVIEW


(Shaping) used to be considered more an art than science. Now: “percentile schedules” involving rules for when to deliver reinforcers, and these momentary rules adjust based on recent (local) rates, duration, or types of responding. (ABA p240)


percentile schedules represent a formalization of the rules of shaping. (p755)


Percentile schedules, however, do more than automate shaping. In addition, they make explicit and objective the criteria that define responses as criterional or noncriterional throughout acquisition and maintenance, providing explicit prior control over reinforcement density as well as criterional response probability. Because of this, they provide almost complete independence from trainer- and subject-related variables. This allows all subjects to be trained in a specified manner despite changes in the trainer or the subject, or at different points in the differentiation. (p740)


(If you shaped by occasionally raising criterion for reinforcement) A plot of reinforcement density across time would reveal a pattern like a sawtooth; with each change in the criterion, reinforcement density drops abruptly, but as behavior gradually changes to include more and more criterional responses, reinforcement density gradually increases until the cycle repeats with the next criterion change. This cyclic change in reinforcement density is more pronounced following extended training (p243)


rather than wait for most responses to meet criterion and then drastically reducing reinforcement frequency by shifting criteria infrequently, it is better to change criteria frequently to maintain both a relatively constant reinforcement density and an intermittent one. Both characteristics decrease the likelihood of losing control over responding prior to the acquisition of the terminal response. (p244)


The percentile solution, developed and expanded by Platt (1973) and colleagues, is momentarily to abandon the exact physical characteristics of the response and treat it as an ordinal quantity. Ordinal quantities are values that carry only an associated rank, as opposed to the more typical means of quantifying observations by assigning a cardinal number and a standard unit. (p244)

NITTY GRITTY OF HOW TO DO IT


m previous observations create m + 1 intervals, one of which must contain the next observation. The counterintuitive notion that intervals of different sizes are equally likely to contain the next observation arises because the line represents a cardinal scale, but the question of which interval will contain the next observation relates to the ordinal properties of the observations. For the moment, ignore the fact that there are physical values attached to any of these observations, and treat them solely in terms of their ranks. In any distribution of values, there is one and only one value ranked 1st, 2nd, 3rd, and so forth. The question of interest is not "What is the expected value of the next observation (i.e., what distance will next be run)?" but rather is "Where will the next observation rank?" If the assumption of independence is met, it will be as likely to rank first or last or anywhere in between, depending on the number of prior observations. (p745)


Hence, the probability that the next observation will fall into any one of k intervals defined by m observations is k times the probability of falling into each interval, or k/(m + 1). ... establishing a criterion at the kth rank. That is, rather than setting the criterion (for reinforcement) at a particular fixed, physical value, the criterion can specify that the next observation, to meet criterion, must rank higher than the value currently ranked k. When k = 1, responses will be considered criterional if they exceed the response currently ranked 1st (lowest). ... The probability of a criterional response (denoted w) is ... w = 1 - [k/(m + 1)]. ... Thus, as the criterion is made more stringent (i.e., as k is increased), the probability of observing a criterional response decreases accordingly, as intuition would suggest. (p746)


(If you know you want w to be a set percentage of reinforcement, you can rewrite equation to find k.) (p746)
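
Spelled out, that rearrangement follows directly from the formula quoted above. The numbers in the second line (m = 9, w = 0.3) are made up purely for illustration:

```latex
w = 1 - \frac{k}{m + 1}
\quad\Longrightarrow\quad
k = (1 - w)(m + 1)

% Illustrative values only: m = 9 prior observations, target w = 0.3
k = (1 - 0.3)(9 + 1) = 7
```

Since k has to be a whole number between 1 and m, the achievable values of w are multiples of 1/(m + 1); a target w that falls between them has to be rounded to the nearest achievable rank.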


Instead of comparing current response to all previous responses (increasing m by one each iteration), use only the most recent responses to compare to. For example, only use the past 5 responses. (p747)


For ties, when the current response is tied with the response it must exceed: “The simplest solution is to select ties with a random probability equal to w and call them criterional.” (p749)
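
Putting the pieces above together, here is a rough sketch of a percentile schedule as I read the quoted passages: keep the m most recent responses, reinforce when the new response exceeds the one currently ranked kth lowest, and break ties at random with probability w. The function name `makePercentileSchedule`, the rounding and clamping of k, and the choice not to reinforce until m responses have been seen are my assumptions, not details from the paper.

```javascript
// Hypothetical sketch of a percentile schedule; names and warm-up rule are assumptions.
// m   - number of most recent responses to compare against
// w   - target probability of a criterional (reinforced) response
// rng - optional random source returning a number in [0, 1)
function makePercentileSchedule(m, w, rng) {
  rng = rng || Math.random;
  // From w = 1 - k/(m + 1); rounded and clamped so that 1 <= k <= m.
  var k = Math.min(m, Math.max(1, Math.round((1 - w) * (m + 1))));
  var recent = []; // the most recent responses, oldest first
  return function observe(value) {
    var criterional = false;
    if (recent.length >= m) { // assumption: no reinforcement during warm-up
      var sorted = recent.slice().sort(function (a, b) { return a - b; });
      var threshold = sorted[k - 1]; // value currently ranked kth (lowest = 1st)
      if (value > threshold) {
        criterional = true;
      } else if (value === threshold) {
        criterional = rng() < w; // ties are criterional with probability w
      }
    }
    recent.push(value);
    if (recent.length > m) {
      recent.shift(); // keep only the m most recent responses
    }
    return criterional;
  };
}

// Usage: compare each response against the previous 5, aiming for w = 0.5.
var observe = makePercentileSchedule(5, 0.5);
[3, 1, 4, 1, 5, 9, 2, 6].forEach(function (score) {
  console.log(score + " -> " + (observe(score) ? "reinforce" : "no reward"));
});
```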


Percentile schedules appear to meet all the requirements for a viable procedure to formalize shaping except the last -- they do not specify a terminal response. The criterion is never specified as an absolute; rather, it is described only in relative fashion (i.e., exceed the kth rank)... There is only one terminal response of all shaping -- to do better on the next trial than on previous trials. This is what percentile schedules program, where "better" is defined as exceeding the kth rank and "previous trials" is given by the most recent m observations. Because criteria are evaluated relative to ongoing behavior, there is never a need to stop shaping (p750)


Although sequential dependencies (e.g. responses such as 1, 2, 3, 4, 1, 2, 3, 4, 1, etc.) diminish the ability of percentile schedules to control criterional response probability, their effects can be minimized by increasing the comparison distribution size. (p753)


The other "limitation," that responding be ordinally rankable, could actually aid application of percentile schedules... To illustrate, suppose we wish to train a developmentally disabled client to drink fluid through a straw. Prior observation of the behavior leads the shaper to suggest that the following five behaviors might be involved: (1) holds glass, (2) directs glass toward mouth, (3) holds straw with other hand, (4) directs straw into mouth, and (5) sucks on straw. These five behaviors can easily be ranked 1 to 5, with 1 being furthest from the terminal response and 5 being closest. A percentile schedule could be imposed by recording the response value (i.e., 1 through 5) on each trial. Whether our conception of the response matches the subject's will be evident in the relative frequency of each of the different rankings. (So steps can be added or taken away depending on the subject’s responses). (p754)


Lincoln Quirk

Oct 20, 2013, 8:05:47 PM
to akra...@googlegroups.com
Thanks Erica! I like the percentile solution but it seems to only apply in cases where you have events with an ordinal value (like how "close" you are to the goal), not just whether or not you did the thing. Now I'm brainstorming ways to make my to-do list events ordinal...

Anyway, I did build a cardboard prototype of the device. 


Further data still seems to indicate that this project is valuable -- I'm still using the to-do list system. But it's still inconclusive, so expect another update in a few weeks or maybe a couple months.