Data and bias: privilege and the availability heuristic

Greg Borenstein

Nov 14, 2013, 5:02:41 PM
to philosophy-in-a-...@googlegroups.com
Hi all,

This post comes out of some work I'm doing at the moment in the field of interactive machine learning. However, I think the basic issues might be of wide interest to the readers of this list. It's not exactly a "philosophy" topic, but it's right at the nexus of psychology and human computer interaction in a way that I think fits the wider "code + humanities" theme of this list.

A couple of years back, when it first came out, I read Daniel Kahneman's Thinking, Fast and Slow:

http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555

If you haven't seen it, Kahneman is a Nobel Prize-winning psychologist whose work on judgment and decision making (with his research partner, Amos Tversky) is the basis of behavioral economics and a bunch of other related fields. Thinking, Fast and Slow is aimed at a popular audience. It's great. It gives you a real feeling for the field without being condescending or overly technical.

One of Kahneman and Tversky's classic papers is "Judgment under Uncertainty: Heuristics and Biases" from 1974: http://psiexp.ss.uci.edu/research/teaching/Tversky_Kahneman_1974.pdf

In that paper, they outline categories of judgment at which people are particularly poor: specifically, judging the probability of uncertain events and estimating the value of uncertain quantities. They argue that, when faced with these tasks, people instead substitute a simpler judgment using heuristics. The paper then walks through a series of these heuristics and the biases each one leads to.

One of these heuristics is "availability". It works like this: when asked to assess the frequency of a class or the probability of an event, people do so by judging how difficult it is for them to think of examples. The harder it is to think of examples, the lower they judge the probability or frequency to be.

In one experiment, they asked people whether a random word drawn from an English text is more likely to have 'r' as its first letter or as its third letter (considering only words of at least three letters). People approach the task by trying to recall words that start with 'r' (like "road") and words that have 'r' as the third letter (like "car"). It's much easier to recall words by their first letter than by their third, so people overwhelmingly say that more words start with the letter than have it in the third position, regardless of the underlying facts (consonants like 'r' and 'k' actually occur more often in the third position than the first).
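
(If you want to check that claim for yourself, here's a quick Python sketch. The /usr/share/dict/words path is just an assumption -- it's a plain word list common on Unix-like systems -- and a dictionary counts distinct words rather than words in running text, so treat the counts as a rough proxy for the token frequencies Kahneman and Tversky are describing.)

# Rough check of the first-vs-third-letter claim against a word list.
# Assumes a newline-delimited list at /usr/share/dict/words; swap in any
# corpus you have handy.

def letter_position_counts(words, letter):
    """Count words (length >= 3) with `letter` as first vs. third character."""
    first = sum(1 for w in words if len(w) >= 3 and w[0] == letter)
    third = sum(1 for w in words if len(w) >= 3 and w[2] == letter)
    return first, third

with open("/usr/share/dict/words") as f:
    words = [line.strip().lower() for line in f if line.strip()]

for letter in "rk":
    first, third = letter_position_counts(words, letter)
    print("'%s': first letter = %d, third letter = %d" % (letter, first, third))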

Two points about this:

1) This particular example of the availability heuristic made me think really strongly about privilege, especially in the geek community. Let's assume that, like all the subjects in the many, many experiments that have confirmed the existence of the availability heuristic and its role in many kinds of bias over the years, we geeks use it too. That means that when you're trying to estimate the impact and importance of something like gender or racial discrimination, or instances of sexual assault or harassment, what you actually do is think of how many times you've seen or heard about instances of these things. But here's the thing: if your demographic category is less likely than average to suffer from these things (e.g. white hetero male), it's probably harder for you to think of these instances, and therefore you probably underestimate their frequency.

Also! Because there's a lot of stigma around these issues, information about their occurrence tends not to spread. This makes the problem worse: instances of these issues can be happening all around you without you seeing them, depriving you of examples to call to mind when asked to estimate how commonplace these problems are. So the availability heuristic means that being in an under-affected sub-group leads you to constantly underestimate the frequency of poorly-visible problems like these, which is the essence of privilege. Further, this analysis argues for the value of talking publicly about these problems (and supporting those who do) not just in order to create macro-level political change, but because doing so will improve your own judgment by making the number of examples you can call to mind more closely resemble the real scope of the problem.
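
(To put a number on that intuition, here's a toy Monte Carlo sketch in Python. The model is something I'm making up purely for illustration, not from Kahneman and Tversky: incidents happen to the people you know at some true rate, but you only hear about each one with a "visibility" probability that is much lower if you're outside the affected group, and your availability-style estimate is just the incidents you can recall divided by the people you know.)

# Toy model, purely illustrative: each observer knows n_known people,
# incidents happen to them at true_rate, and the observer hears about each
# incident with probability `visibility`. The availability-style estimate
# is "incidents I can recall / people I know".
import random

def availability_estimate(visibility, n_known=200, true_rate=0.25, trials=2000):
    estimates = []
    for _ in range(trials):
        incidents = sum(random.random() < true_rate for _ in range(n_known))
        recalled = sum(random.random() < visibility for _ in range(incidents))
        estimates.append(recalled / float(n_known))
    return sum(estimates) / len(estimates)

random.seed(0)
print("true rate:                   0.25")
print("in-group  (visibility 0.8):  %.3f" % availability_estimate(0.8))
print("out-group (visibility 0.2):  %.3f" % availability_estimate(0.2))

The out-group observer's estimate comes out at roughly a quarter of the in-group observer's, even though both are looking at the same underlying rate.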

2) More generally: if you work with data, and specifically with presenting it to people, and you're not taking these kinds of biases into account, what are you doing? For an egregious example, take confirmation bias <http://en.wikipedia.org/wiki/Confirmation_bias>: people favor information that supports what they already believe. Some of the examples of this in Kahneman's book are extraordinary, like psychology grad students, in the middle of studying experimental procedure, asserting that they and their family members would be less likely to display the egregious behavior that is normal in some Milgram-style experiments (everyone thinks they're the exception to the rule). If you think your dataviz ever changes anyone's mind, you've got to look closely into this stuff.

3) (BONUS POINT) Maybe you're curious about what all of this has to do with machine learning? In summary, what I'm looking at goes like this: to train many machine learning systems we ask users to make a series of judgments. In supervised learning we ask them to create labels for example data. In reinforcement learning we ask them to provide feedback to a learning agent. In all of these scenarios, the users' judgments are probably affected by various kinds of bias. If we take these into account, we might be able to improve the quality of our machine learning. Specifically, I'm currently looking at the effect of anchoring bias on learning in recommendation systems. You can read my research proposal here: <http://urbanhonking.com/ideasfordozens/2013/11/01/research-proposal-accounting-for-anchoring-bias-on-user-labeling-in-machine-learning-systems/>
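
(To make the anchoring idea concrete, here's a deliberately over-simplified Python sketch -- my own toy model for this email, not the model in the proposal. Suppose the rating a user reports is a convex mix of their true opinion and an average rating the interface showed them; if you can estimate the anchoring weight, you can partially invert that distortion before training on the labels.)

# Over-simplified sketch (not the model in the linked proposal): reported
# ratings are a convex mix of the user's true rating and a displayed anchor.
import random

ANCHOR_WEIGHT = 0.3  # hypothetical anchoring strength, assumed known here

def observed_rating(true_rating, shown_average, w=ANCHOR_WEIGHT):
    """The label we actually record: pulled toward the displayed average."""
    return (1 - w) * true_rating + w * shown_average

def de_anchored(observed, shown_average, w=ANCHOR_WEIGHT):
    """Invert the mixing model to estimate the user's unanchored rating."""
    return (observed - w * shown_average) / (1 - w)

random.seed(1)
shown_average = 4.5  # the anchor the interface displayed
for _ in range(5):
    true_r = random.uniform(1, 5)
    obs = observed_rating(true_r, shown_average)
    print("true=%.2f  observed=%.2f  de-anchored=%.2f"
          % (true_r, obs, de_anchored(obs, shown_average)))

In practice the anchoring weight would itself have to be estimated (that's the hard part), but even this caricature shows how recorded labels drift toward whatever the interface happened to display.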

OK, sorry for the long rant, but I hope it's interesting!

yours,

Greg