reCAPTCHA word filtering

td123

Jun 21, 2007, 9:03:07 AM
to reCAPTCHA
Is there filtering for obscene or "offensive" terms on the reCAPTCHA
puzzles before the images/words reach the client? Our implementation
is in test/review, and one of our testers was concerned about "Reich"
being returned as one of the two words to solve. How do we know users
won't be asked to solve obscene, politically offensive, or
insensitive terms? Granted, "offensive" is subjective, and some people
are going to be offended merely by the presence of the image
verification tool. Is there a disclaimer available?

Thanks,
TD

reCAPTCHA Support

Jun 21, 2007, 3:04:56 PM
to reca...@googlegroups.com
Hello,

We do filter, and we are still working on the list of bad words. Coming up with a broad list (one that includes "Reich", for example) is extremely hard. For now, curse words shouldn't show up; however, there are some mildly offensive words (for example "bastard") and also words that may be politically sensitive ("Reich"). Even with a filter, there is a chance that these words will be incorrectly OCR'd, slip past the filter, and show up as words to read. However, the first time a user answers with one of the bad words, the word will be banned from the system.
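
For readers who want something concrete, the filtering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `WordPool` class, its method names, and the in-memory blocklist are hypothetical and are not taken from reCAPTCHA itself.

```python
class WordPool:
    """Hypothetical sketch of a blocklist filter; not reCAPTCHA's real code."""

    def __init__(self, blocklist):
        # Blocklist entries are compared case-insensitively.
        self.blocklist = {word.lower() for word in blocklist}
        # IDs of scanned word images that must never be served again.
        self.banned_ids = set()

    def can_serve(self, word_id, ocr_guess):
        """Serve a word only if it is not banned and its OCR guess looks clean."""
        if word_id in self.banned_ids:
            return False
        return ocr_guess.lower() not in self.blocklist

    def record_answer(self, word_id, answer):
        """Ban the word image the first time a user types a blocklisted word."""
        if answer.strip().lower() in self.blocklist:
            self.banned_ids.add(word_id)


pool = WordPool(["reich", "bastard"])
print(pool.can_serve(1, "Reich"))   # blocked up front by the OCR guess
print(pool.can_serve(2, "Rcich"))   # mis-OCR'd, so it slips through
pool.record_answer(2, "Reich")      # a user reads it correctly
print(pool.can_serve(2, "Rcich"))   # now banned
```

The second word shows the gap mentioned above: a filter that only sees the OCR guess cannot catch a word the OCR misread, so the user's answer is the first point at which it can be detected.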

Even traditional captchas can have issues in this sense. We've gotten emails about Yahoo displaying a captcha that said "fuch".

At the end of the day, it's important to note that these words come from genuine literary sources. In the CAPTCHA they are out of context. The book that contained the word "Reich" was most likely talking about the history of Germany during that period. Having digital versions of such books is essential in preserving the history of this era so that future generations can learn not to emulate the behavior that caused "Reich" to be such a sensitive word.
--
reCAPTCHA: stop spam, read books
http://recaptcha.net

Brian

Jun 21, 2007, 11:42:57 PM
to reCAPTCHA
As a side note, letting people know that a word will be banned forever
once it has been entered as a curse a single time (assuming I am
reading that correctly) is probably a bad idea. I'm sure someone
somewhere will begin randomly submitting a curse for one of the two
words in hopes of throwing off the system. It would be much better to
require two identical curse-word submissions. You may have already
done that; I'm just adding my $0.02.

--Brian

reCAPTCHA Support

Jun 21, 2007, 11:47:13 PM
to reca...@googlegroups.com
We probably wouldn't allow one answer to eliminate the word. We might also consider things such as the difference between the OCR's guess for the word and the user's answer (e.g., if OCR guessed "luck" but the person said "f***", we might take it more seriously than if OCR said "apple" and the person said "f***").
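
One way to picture the idea of weighing a report by how close it is to the OCR guess is the sketch below. Everything in it is an assumption made for illustration: the names `report_weight` and `BanTracker`, the 0.7 similarity cutoff, the 0.25 weight for implausible reports, and the ban score of 2.0 are hypothetical, and `difflib.SequenceMatcher` simply stands in for whatever string comparison might actually be used.

```python
from collections import defaultdict
from difflib import SequenceMatcher

BLOCKLIST = {"bastard"}        # illustrative entry taken from this thread
BAN_SCORE = 2.0                # assumed: two plausible reports ban a word
PLAUSIBLE_SIMILARITY = 0.7     # assumed cutoff for "close to the OCR guess"


def report_weight(ocr_guess, answer):
    """Weigh a bad-word answer by how similar it is to the OCR guess."""
    similarity = SequenceMatcher(None, ocr_guess.lower(), answer.lower()).ratio()
    # A near miss of the OCR guess is probably an honest reading of the image;
    # an unrelated answer is more likely an attempt to poison the system.
    return 1.0 if similarity >= PLAUSIBLE_SIMILARITY else 0.25


class BanTracker:
    """Accumulates weighted bad-word reports per word image."""

    def __init__(self):
        self.scores = defaultdict(float)

    def record(self, word_id, ocr_guess, answer):
        """Return True once the word's accumulated score reaches BAN_SCORE."""
        if answer.strip().lower() in BLOCKLIST:
            self.scores[word_id] += report_weight(ocr_guess, answer.strip())
        return self.scores[word_id] >= BAN_SCORE


tracker = BanTracker()
print(tracker.record(1, "bustard", "bastard"))  # first plausible report
print(tracker.record(1, "bustard", "bastard"))  # second plausible report
print(tracker.record(2, "apple", "bastard"))    # implausible report
```

With these assumed numbers, a single answer never bans a word, two plausible reports do, and reports that look nothing like the OCR guess count for much less.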