Jimmy
Aug 29, 2008, 5:17:23 PM
to reCAPTCHA
While testing reCAPTCHA, I have encountered blobs that I couldn't
resolve.
In one case, I knew the first and last letters of a four-letter word,
but I couldn't definitively say what the middle two letters were. In
another case, I got a blob that was either spilled/splashed ink or one
character.
In yet another case, I got four ambiguous vertical lines that couldn't
form a word I knew (first long, next three were short), but with a
little imagination, could be anywhere from four letters to two:
"liii" (with the dots missing from the i), "bn" (with a blank vertical
stripe through each character), "lin", etc.
In all of these cases, the blobs cannot be reliably resolved without
knowing their context. I also believe that there are only a small
number of plausible interpretations. Due to the limited number of
plausible interpretations, a situation where multiple people guess the
same "word" is possible. To worsen the situation, the help for the
CAPTCHA encourages people to guess: "If you are not sure what the
words are, either enter your best guess or click the reload button
next to the distorted words." Given enough ambiguous blobs and enough
guesses, this situation becomes likely.
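
To put a rough number on "becomes likely", here is a small sketch. It
rests on assumptions of mine, not on anything I know about reCAPTCHA:
each ambiguous blob has k equally plausible readings, two users guess
independently and uniformly among them, and two matching answers are
enough to count as agreement. The function names are made up for
illustration.

```python
def p_two_guessers_agree(k: int) -> float:
    """Chance that two independent, uniform guesses over k readings match.

    Assumption: all k readings are equally likely to be picked.
    Summing (1/k) * (1/k) over the k readings gives 1/k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 / k


def p_any_agreement(k: int, blobs: int) -> float:
    """Chance that at least one of `blobs` ambiguous blobs gets a matching pair.

    Assumption: blobs are independent and each has k plausible readings.
    """
    if blobs < 0:
        raise ValueError("blobs must be non-negative")
    return 1.0 - (1.0 - p_two_guessers_agree(k)) ** blobs


if __name__ == "__main__":
    # Three plausible readings per blob, ten ambiguous blobs.
    print(p_any_agreement(3, 10))
```

Under these assumptions, with three plausible readings per blob and
ten such blobs, the chance of at least one matching pair of guesses is
roughly 0.98. The real figure depends on how many matching answers the
system requires, which I do not know.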
I assume that such a scenario would lead reCAPTCHA to believe
that it now knows the word. (Please correct me if I am wrong here; I
do not know all of the details of this system.) However, its
knowledge would merely be the consensus of ill-informed guesses.
To remedy this situation, I propose the following solutions:
- Allow the user to see the context of the word. (This will probably
be the most effective solution.) This would allow the user to make a
more educated guess.
- Employ some method to record the reliability of the user's answers
and record alternative answers. The reliability might be determined
by considering the following factors (among others):
o have the user rate the confidence of his/her answers,
o consider whether the user has reviewed the context of the word,
o consider how often each answer is given for a particular blob.
- Don't encourage the user to guess.
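
As a minimal sketch of the reliability idea above: each answer carries
a self-rated confidence and a flag for whether the user looked at the
context, and every distinct answer for a blob keeps its own running
score. All names, the 0.0 to 1.0 confidence scale, the context bonus,
and the acceptance threshold are hypothetical choices of mine; I do
not know how reCAPTCHA actually aggregates answers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple

CONTEXT_BONUS = 1.5  # hypothetical: extra weight if the user viewed the context


@dataclass
class Answer:
    text: str          # what the user typed for the blob
    confidence: float  # self-rated, 0.0 (pure guess) to 1.0 (certain)
    saw_context: bool  # whether the user reviewed the surrounding text


def score_answers(answers: List[Answer]) -> List[Tuple[str, float]]:
    """Sum a weight per distinct answer, keeping all alternatives."""
    scores = defaultdict(float)
    for a in answers:
        weight = a.confidence * (CONTEXT_BONUS if a.saw_context else 1.0)
        scores[a.text] += weight
    # Highest score first; lower-ranked alternatives stay in the list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def resolve(answers: List[Answer], min_share: float = 0.6) -> Optional[str]:
    """Accept the top answer only if it holds enough of the total weight."""
    ranked = score_answers(answers)
    total = sum(score for _, score in ranked)
    if total == 0:
        return None  # no answers, or only zero-confidence guesses
    best, best_score = ranked[0]
    return best if best_score / total >= min_share else None
```

With this weighting, two low-confidence guesses of "lin" (0.2 each, no
context) would not outvote one confident "bn" (0.9, context reviewed),
and an even split between two answers would leave the blob unresolved
instead of recording a guess as fact.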
I understand that some of my proposed solutions may make solving the
CAPTCHA less convenient for the user (if they have to type, click, or
read more), so I suggest that you make these features optional for the
user: don't force the users to rate their confidence and don't force
them to consider the context.