questionable validity of results if words are presented out of context

Aug 29, 2008, 5:17:23 PM
While testing reCAPTCHA, I have encountered blobs that I couldn't

In one case, I knew the first and last letter of the four letter word,
but I couldn't definitively say what the middle two letters were. In
another case, I got a blob that was either spilled/splashed ink or one

In yet another case, I got four ambiguous vertical lines that couldn't
form a word I knew (first long, next three were short), but with a
little imagination, could be anywhere from four letters to two:
"liii" (with the dots missing from the i's), "bn" (with a blank vertical
stripe through each character), "lin", etc.

In all of these cases, the blobs cannot be reliably resolved without
knowing their context. I also believe that there are only a finite
number of plausible interpretations. Due to the limited number of
plausible interpretations, a situation where multiple people guess the
same "word" is possible. To worsen the situation, the help for the
CAPTCHA encourages people to guess: "If you are not sure what the
words are, either enter your best guess or click the reload button
next to the distorted words." Given enough ambiguous blobs and enough
guesses, this situation becomes likely.
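
As a rough illustration of that argument, here is a small sketch. It
assumes every guesser picks uniformly and independently among the
plausible readings of a blob, which is a simplification and says nothing
about how reCAPTCHA actually works. The function names are made up for
this example.

```python
def chance_all_agree(num_readings: int, num_guessers: int) -> float:
    """Probability that all guessers land on the same reading.

    Assumption: each of the num_readings plausible readings is equally
    likely and guessers choose independently.
    """
    if num_readings < 1 or num_guessers < 1:
        raise ValueError("both arguments must be at least 1")
    # Any one of the readings can be the shared one.
    return num_readings * (1.0 / num_readings) ** num_guessers


def chance_any_blob_agrees(per_blob: float, num_blobs: int) -> float:
    """Probability that at least one of num_blobs ambiguous blobs ends up
    with unanimous guesses, given the per-blob probability."""
    return 1.0 - (1.0 - per_blob) ** num_blobs


if __name__ == "__main__":
    p = chance_all_agree(num_readings=3, num_guessers=2)
    print(f"two guessers, three readings: {p:.3f}")
    print(f"at least one agreement in 10 blobs: {chance_any_blob_agrees(p, 10):.3f}")
```

With only a handful of plausible readings, agreement between two
guessers is far from rare, and over many ambiguous blobs it is close to
certain to happen somewhere.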

I assume that such a scenario would lead reCAPTCHA to believe
that it now knows the word. (Please correct me if I am wrong here; I
do not know all of the details of this system.) However, its
knowledge would merely be the consensus of ill-informed guesses.

To remedy this situation, I propose the following solutions:
- Allow the user to see the context of the word. (This will probably
be the most effective solution.) This would allow the user to make a
more educated guess.
- Employ some method to record the reliability of the user's answers
and record alternative answers. The reliability might be determined
by considering the following factors (among others):
o have the user rate the confidence of his/her answers,
o consider whether the user has reviewed the context of the word,
o consider how often each answer is given for a particular blob.
- Don't encourage the user to guess.
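
The reliability idea above could be sketched roughly as follows. This is
only an illustration: the Answer record, the 0.0 to 1.0 confidence
scale, the context bonus, and the 75% acceptance threshold are all
assumptions invented for the example, not anything reCAPTCHA is known
to do.

```python
from collections import defaultdict
from dataclasses import dataclass

CONTEXT_BONUS = 1.5  # arbitrary illustrative weight for context-aware answers


@dataclass
class Answer:
    text: str           # what the user typed for the blob
    confidence: float   # self-rated, 0.0 (pure guess) to 1.0 (certain)
    saw_context: bool   # whether the user viewed the surrounding text


def tally(answers):
    """Sum a reliability weight per distinct answer for one blob."""
    scores = defaultdict(float)
    for a in answers:
        weight = a.confidence * (CONTEXT_BONUS if a.saw_context else 1.0)
        scores[a.text.strip().lower()] += weight
    return dict(scores)


def resolve(answers, min_share=0.75):
    """Return (accepted_word_or_None, all_scores).

    A word is accepted only if it holds at least min_share of the total
    weight; the alternatives stay recorded in all_scores either way.
    """
    scores = tally(answers)
    total = sum(scores.values())
    if total == 0:
        return None, scores
    best = max(scores, key=scores.get)
    if scores[best] / total >= min_share:
        return best, scores
    return None, scores


if __name__ == "__main__":
    guesses = [Answer("liii", 0.2, False), Answer("lin", 0.2, False),
               Answer("bn", 0.3, False)]
    print(resolve(guesses))  # low-confidence guesses split; nothing accepted
```

Under this scheme a few unsure guesses that happen to coincide carry
little weight, while the alternative readings are kept for later review.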

I understand that some of my proposed solutions may make solving the
CAPTCHA less convenient for the user (if they have to type, click, or
read more), so I suggest that you make these features optional for the
user: don't force the users to rate their confidence and don't force
them to consider the context.

reCAPTCHA Support

Aug 29, 2008, 5:26:40 PM
Hi Jimmy,

You are very much correct that this is an issue with reCAPTCHA. We've thought about giving users context; however, it's obviously a very tricky UI problem. We also deal with this problem with some NLP techniques (the computer can do a pretty good job of telling apart word pairs like of/or and ear/car).

- Ben
