Summary:
Goal: predict people’s response to incentive structures
Want to incentivize some desired behavior in a population (e.g. reduce car use, ensure people take their medicine)
Given some budget for incentivizing a goal, how to do this best?
Many approaches:
Loss aversion: give people money at the start of the month, take it away when they don't do the desired thing
Lotteries
Accounting for diminishing sensitivity: people are more sensitive to the $1 difference between $10 and $11 than to the $1 difference between $99 and $100
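The diminishing-sensitivity idea can be sketched with a prospect-theory-style power value function; the exponent 0.88 below is only an illustrative choice, not a fitted parameter:

```python
# Diminishing sensitivity with a power value function v(x) = x**alpha.
# alpha = 0.88 is an illustrative value, not one estimated from data.
def subjective_value(x, alpha=0.88):
    return x ** alpha

# The same $1 gap feels larger at low amounts than at high amounts.
low_gap = subjective_value(11) - subjective_value(10)
high_gap = subjective_value(100) - subjective_value(99)
assert low_gap > high_gap
```

Because v is concave, every extra dollar adds a smaller increment of subjective value, which is exactly why incentive schemes framed around small base amounts can feel stronger.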
Can experiment with many different types of treatments: effective but expensive
Can ask experts but they disagree and are often wrong
Want to create an ML model that predicts human responses to stimuli
Approach: model competitions using human training data (e.g. via Kaggle)
Given new choice problem: decisions under risk and uncertainty
Goal: predict human responses to the choice problem
Baseline models provided
Researchers submit solutions/models
Problem: choice under risk and uncertainty
Choices: A or B
A: 3 with certainty
B: 4 with prob 0.8, 0 with prob 0.2
Can expand to lotteries, more outcomes, unknown probabilities, etc.
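The example task above can be written down in a few lines (the encoding here is a hypothetical one, just for illustration):

```python
# Minimal encoding of the example choice task as (payoff, probability) pairs.
# Option A: 3 with certainty; Option B: 4 with prob 0.8, 0 with prob 0.2.
option_a = [(3.0, 1.0)]
option_b = [(4.0, 0.8), (0.0, 0.2)]

def expected_value(option):
    return sum(payoff * prob for payoff, prob in option)

# B has the higher expected value (3.2 vs 3.0), yet many people choose
# the certain A. That gap between EV and behavior is what the model
# must predict.
assert expected_value(option_b) > expected_value(option_a)
```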
Experiments:
720 choice tasks (210 train, 60 test)
930 participants, ~700k choices
Accuracy metrics:
mean squared error
Completeness: proportion of the predictable error that the model eliminates
(error_base - error_model) / (error_base - error_perfect_model)
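The two metrics can be sketched directly from the definitions above (function names here are my own):

```python
def mse(predicted, observed):
    # Mean squared error between predicted and observed choice rates.
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def completeness(error_base, error_model, error_perfect):
    # Fraction of the predictable error (baseline error minus irreducible
    # noise floor) that the model closes. 1.0 means the model does as well
    # as a perfect model; 0.0 means no better than the baseline.
    return (error_base - error_model) / (error_base - error_perfect)
```

Completeness is useful because raw MSE conflates model error with irreducible experimental noise; normalizing by the perfect-model error makes scores comparable across datasets.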
46 teams, 20 submissions
Best: BEAST-GB (BEAST Gradient Boosting)
BEAST theory: people use strategies that are useful across many situations, then apply them to similar new situations
Unknown:
What similarity function do people use?
What strategies do they have?
Model:
Input:
The risky choice task is given to the BEAST model, which computes what the theory implies for that decision
BEAST is used as a foundation model
Task payoffs and probabilities
Train an Extreme Gradient Boosting (XGBoost) model on these features
Output: prediction of human choice
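The pipeline can be sketched as follows. This is a loose illustration, not the authors' implementation: the feature set is invented, the BEAST outputs and human choice rates are made-up numbers, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each task's payoffs/probabilities plus BEAST's theoretical prediction
# become features for a gradient-boosting model that predicts the human
# choice rate. Feature choices here are hypothetical.

def task_features(task, beast_pred):
    # task = (a_payoff, b_high, b_prob) for a simple A-vs-B problem.
    a_payoff, b_high, b_prob = task
    ev_gap = b_high * b_prob - a_payoff      # raw payoff-based feature
    return [a_payoff, b_high, b_prob, ev_gap, beast_pred]

# Toy data just to illustrate the shapes involved.
tasks = [(3.0, 4.0, 0.8), (5.0, 10.0, 0.5), (2.0, 3.0, 0.9)]
beast_preds = [0.45, 0.55, 0.60]             # made-up BEAST B-choice rates
X = np.array([task_features(t, b) for t, b in zip(tasks, beast_preds)])
y = np.array([0.42, 0.58, 0.63])             # made-up human B-choice rates

model = GradientBoostingRegressor(n_estimators=50).fit(X, y)
predictions = model.predict(X)               # predicted choice rates per task
```

The key design point is that BEAST's prediction is just one feature among several, so the ML model can learn where the theory is right and where to correct it.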
All submissions combined theory and ML
Prior work has shown that an unconstrained neural network works better than pure BEAST
BEAST-GB beats both BEAST and neural nets: 96.2% completeness. It stays accurate even when trained on only a few records
Ablation studies show that the psychological, BEAST, and payoff features are all key inputs for the ML model
BEAST helps the ML generalize across experiments
BEAST-GB is more accurate than other approaches
It is even slightly more accurate than predicting from other experiments with the same payoffs but different UIs
That baseline reflects the inherent variability of the experimental process itself
BEAST-GB does better because it captures the mean of the experimental noise process, whereas any single real experiment is a sample from a biased distribution of study designs, populations, and UIs
Next steps: larger experimental studies with richer verbal cues for participants
LLMs are showing some capability for predicting behavior but the results are still inconsistent