Wednesday, January 4, 2006

A plan for blog spam

(Please excuse my lame rip-off of Paul Graham's title.)

I'll say it up front: I don't like captchas.

For those in the dark, captchas are images of words, usually distorted. Sites that accept public comments ask their users to type the word shown in the captcha. If the user is able to read the word, the system concludes that the user is a human being. If the user fails, it concludes that it was a bot (a computer program). The reasoning is that it is very easy for a human to read a word, even a deformed one, and extremely hard for a computer to visually "read" it.

Given that bots usually post spam and human beings usually post legitimate comments, the captcha test is a simple way to differentiate between spam and human comments.

Captchas have, however, several problems:

  1. Humans with disabilities (for instance, blind people) are not able to pass the test.

  2. New programs have been developed that are able to read the captcha, and thus pass the test.


Newer captchas, harder to read, have been developed to make these programs fail. They are, however, so hard to read that even humans regularly fail the test. For instance, I needed three tries before I could pass the captcha I had on this blog. That is, embarrassingly, the same success rate as the latest programs (33%).

I thus started looking for another solution. In the same "make it hard for computers, easy for humans" spirit, I came to the conclusion that the easiest way to solve the problem was to just ask the user a question that is stupid (for a human).

My question to let you publish on my blog is: What's the color of white pages?

And my question to let you register a user on my blog is: What's the name of your planet?

The answers to these questions are, of course, "white" and "earth". See, I'm not even afraid of giving the answers here. A computer has to understand English to extract them.
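To make the check concrete, here is a minimal sketch in Python of how a comment form might compare a visitor's reply with the expected answer. The names `QUESTIONS`, `normalize` and `is_human` are made up for illustration, and so is the choice to ignore case and surrounding punctuation. This is not the code running on this blog.

```python
# Hypothetical question table. The two questions and their answers come
# from this post; the "comment"/"register" keys are invented labels.
QUESTIONS = {
    "comment": ("What's the color of white pages?", {"white"}),
    "register": ("What's the name of your planet?", {"earth"}),
}


def normalize(reply: str) -> str:
    """Lowercase the reply and drop surrounding spaces and punctuation."""
    return reply.strip().strip(".!?\"'").strip().lower()


def is_human(form: str, reply: str) -> bool:
    """Return True when the reply matches an accepted answer for the form."""
    _question, accepted = QUESTIONS[form]
    return normalize(reply) in accepted
```

With this sketch, `is_human("comment", " White. ")` is accepted, while `is_human("register", "mars")` is rejected.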

If this method ever becomes popular, spammers will surely attack it. It seems to be safe from automated attacks, except for spammers building a database of question-answer pairs, or trying a large list of possible answers (for instance, all the English words in the dictionary).

My suggestion to prevent the first kind of attack is to use your imagination to come up with new questions. Tie them to current events, for instance, so that previously collected question-answer pairs become useless.
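One way to keep the questions fresh could be to give each one a last day of use and only serve the ones that have not expired. The sketch below assumes a hypothetical `POOL` list with expiry dates I made up for the example. In practice you would keep adding new questions about current events.

```python
from datetime import date

# Hypothetical pool of (question, accepted answers, last day of use).
# The expiry dates are invented purely for illustration.
POOL = [
    ("What's the color of white pages?", {"white"}, date(2006, 1, 31)),
    ("What's the name of your planet?", {"earth"}, date(2006, 2, 28)),
]


def active_questions(today, pool=POOL):
    """Return the (question, answers) pairs that have not expired yet."""
    return [(q, answers) for q, answers, expires in pool if today <= expires]
```

A question-answer pair harvested in January is then of no use to a spammer in March, because that question is no longer asked.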

As for the second attack, the blog can artificially wait several seconds before responding to each answer, so that anyone trying a lot of different answers will need a lot of time to check them all.
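Here is a rough sketch of the delay idea, along with the arithmetic behind it. The 5-second value and the function names are assumptions of mine (I only said "several seconds" above), and the `sleep` parameter exists only so that the function can be tried without really waiting.

```python
import time

DELAY_SECONDS = 5.0  # assumed value, standing in for "several seconds"


def check_with_delay(reply, accepted, delay=DELAY_SECONDS, sleep=time.sleep):
    """Wait before answering, whether the reply is right or wrong."""
    sleep(delay)
    return reply.strip().lower() in accepted


def worst_case_hours(candidates, delay=DELAY_SECONDS):
    """Hours needed to try every candidate answer, one after the other."""
    return candidates * delay / 3600
```

Under these assumptions, a bot trying a 100,000-word dictionary one word at a time would need about 139 hours (100,000 x 5 s = 500,000 s) for a single blog, while a human who answers correctly the first time only loses five seconds.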

This method is obviously not useful for high-profile sites, such as Yahoo! Mail, since it only takes a couple of seconds of a human's brain time to fully crack it. But I hope it will be useful for random blogs, because spammers don't want to crack one random blog, they want to crack a million random blogs. I hope that the prospect of having to answer a million different questions will change their minds. Or, more probably, make them spam other, unprotected blogs.

I have removed moderation on the comments on this blog. Let's see how this thing works...