History and development of CAPTCHA

Man · Nov 21, 2024

We started with text CAPTCHAs and ended up with a simple checkbox, improving the system after each failure.

You go to a website to buy plane tickets. Before you click the "Submit" button, you need to check the box that asks: "Are you a robot?"

At first glance, this seems ironic. Why do I need to prove that I am human, and to a computer at that?

And even if I check this box, how does that prove I'm human? A robot can check this box too, right?

It's like a jury asking a murder suspect whether he committed the crime. "Of course I didn't kill anyone," the defendant would answer.

So what's the point of this question? Why do CAPTCHAs exist at all? And how do they verify that a user is a real person based on simple queries?

In this article, we'll take a detailed look at why CAPTCHAs are needed, how they've evolved over time, their different versions, and more.

What is CAPTCHA?

CAPTCHA stands for "Completely Automated Public Turing Test to Tell Computers and Humans Apart."

Quite a complicated acronym, isn't it? We'll simplify it now.

What is the Turing test?

Legendary British mathematician and computer scientist Alan Turing debated with his colleagues and critics about whether a machine (digital computer) could ever achieve a level of intelligence comparable to a human.

To prove his point, Turing proposed a game. In this game, an "interrogator" would ask questions to a human and a computer via text chat. If the interrogator couldn't tell the difference between their answers, and the computer successfully impersonated a human, it would pass the test. Turing called this the "imitation game."

Who would have thought that decades later the same principle would be used to create CAPTCHA, only this time to tell people apart from machines?

In 2000, Luis von Ahn, then a 22-year-old graduate student, developed CAPTCHA together with his professor Manuel Blum to prevent automated programs from attacking networks and websites.

Why do we need CAPTCHAs?

You've probably heard of bots. Bots are programs that can perform specific tasks based on a script given to them. They often mimic human behavior and can perform tasks much faster.

There are useful bots, such as search engines that crawl web pages to index content, or chatbots that simulate human conversation.

However, there are also malicious bots that can interfere with users by spreading spam, taking over accounts, or even taking down large websites by carrying out DDoS attacks.

Here are some of the malicious actions that such bots can perform:

Credential Stuffing
Content Scraping
DoS or DDoS attacks
Collecting email addresses
Spam content
Password cracking by brute force

If left unchecked, these bots can cause a variety of problems, such as:

Undermining the credibility of online surveys.
Hacking online accounts using brute force attacks.
Ticket speculation: the mass purchase of tickets for subsequent resale.

In one such case, the retail giant Target suffered a data breach in 2013 that affected 70 million people. At the time, Target’s vendor portal did not have a CAPTCHA. The attack reportedly began with a phishing email sent to one of Target’s third-party vendors, whose stolen credentials gave the attackers a way in.

This is why we need CAPTCHA: to prevent automated manipulation of systems that could affect millions of Internet users and lead to large-scale fraud.

How does CAPTCHA work?

In most cases, CAPTCHA relies on visual tests, taking advantage of the fact that automated bots do not have the same level of understanding of visual data as humans.

It all started with having to type strange, distorted text to access a site or leave a comment.

The "completely automated" part of the name refers to the test itself: a program generates and grades each challenge without a human operator, even though the user still has to type the answer by hand.

Types of CAPTCHA

CAPTCHAs fall into three main categories:

Text CAPTCHAs
Image CAPTCHAs
Audio CAPTCHAs

Let's look at each of them.

Text CAPTCHAs

This is the oldest form of CAPTCHA, which uses known words or phrases, random distorted texts, combinations of numbers and letters, etc.

These symbols are presented in an unusual way, making them difficult for automated programs to understand.

Text CAPTCHAs may include distorted characters, rotation, uneven scaling, and other effects. Some CAPTCHAs may also use character overlays with graphical elements such as color, background noise, lines, etc.
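To make the idea concrete, here is a minimal sketch of the part of a text CAPTCHA that happens before any distortion: picking a random string and checking the user's answer. The names `ALPHABET`, `new_challenge`, and `check_answer` are made up for illustration, and the rendering step (rotation, noise, overlays) is left out entirely.

```python
import hmac
import secrets

# Characters that are easy to confuse (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_challenge(length=6):
    """Pick a random string that would later be rendered as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, answer):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return hmac.compare_digest(
        expected.upper().encode(), answer.strip().upper().encode()
    )
```

The server would keep the expected string on its side (in a session, for example) and send only the distorted picture to the browser.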

Image-based CAPTCHA

Many of us have encountered CAPTCHAs that require us to tag images of specific objects, such as traffic lights, cars, or other items.

These CAPTCHAs are easier for humans but much harder for bots, because they require not only recognizing an image but also interpreting what it shows.

Audio CAPTCHA

Text and visual CAPTCHAs are not suitable for visually impaired users, so audio CAPTCHAs were developed.

Audio CAPTCHAs are used in conjunction with text or image CAPTCHAs and provide an audio recording with a series of letters or numbers. These recordings usually contain background noise, making them difficult for bots to recognize.

The emergence of reCAPTCHA

With millions of people taking these tests every day, everything worked as it should. But as we know, innovation knows no bounds, and Luis von Ahn saw another opportunity.

What if, instead of wasting time recognizing random, distorted texts, we used old, unreadable fragments of books?

In an interview with The Walrus magazine, von Ahn said he had created a system that "wasted millions of hours of a priceless resource - the human brain - ten seconds at a time."

And it's true! At roughly 200 million CAPTCHAs solved per day and about ten seconds each, that adds up to more than 500,000 hours of human effort every day.

So he came up with the idea of using real texts from old books that optical character recognition (OCR) technologies couldn't recognize. At that time, OCR couldn't correctly read about 20% of scanned words.

Von Ahn's new idea was to harness human effort for good: users would unknowingly help the OCR system decipher these difficult words and add them to the database.

This new version of CAPTCHA is called reCAPTCHA. The first major collection digitized using this method was the New York Times archive, which dates back to 1851 and contains more than 13 million articles.

How does reCAPTCHA work?

It is based on the principle of crowdsourcing. First, a book is scanned and run through OCR. The program then selects two words: one that the OCR system has already recognized, and one that it could not read.

The user must type both words into the reCAPTCHA field. If the user enters the first (recognizable) word correctly, the program assumes that the second word is probably correct too and records it as a candidate for digitization.

The second word (which OCR couldn't read) is then shown to other users. The program compares all the answers, and after collecting enough confirmations, it can recognize the word with a high degree of confidence.

Thus, the program solves two problems at once: it checks whether the user is a human, and digitizes words that the OCR system could not recognize, adding them to the general knowledge base.
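The two-word scheme described above can be sketched roughly like this. It is only a toy model: the class name `WordPairChallenge`, the word ids, and the `MIN_VOTES` threshold are assumptions made for illustration, not details of the real reCAPTCHA service.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers we require before
# accepting a transcription for a word the OCR could not read.
MIN_VOTES = 3


class WordPairChallenge:
    """Toy model of the known-word / unknown-word scheme."""

    def __init__(self, min_votes=MIN_VOTES):
        self.min_votes = min_votes
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts

    def submit(self, known_answer, known_truth, unknown_id, unknown_answer):
        """Return True if the user passes; record their guess for the unknown word."""
        if known_answer.strip().lower() != known_truth.lower():
            return False  # failed the known word: reject and ignore the guess
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None if there is no consensus yet."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        answer, count = counts.most_common(1)[0]
        return answer if count >= self.min_votes else None


challenge = WordPairChallenge()
for guess in ["morning", "morning", "moming", "morning"]:
    challenge.submit("upon", "upon", "scan-17", guess)
print(challenge.resolved("scan-17"))  # morning
```

Note that only users who get the known word right contribute a vote, which is what lets the program trust answers it has no way of checking directly.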

Google and reCAPTCHA

In 2009, Google saw the potential and acquired reCAPTCHA for use in the Google Books project. Google used reCAPTCHA to get people to recognize words or characters that their image processing algorithms couldn't identify, making the process much easier.

The Google Books project was an ambitious initiative to digitize all the books in the world and create a massive digital library accessible to everyone. According to Wikipedia, Google had scanned over 40 million books by October 2019. However, the project faced a number of legal issues related to copyright, making it difficult to implement.

Problems with reCAPTCHA

As the system became more secure, attackers became more creative, fueling the constant evolution of CAPTCHA.

A 2014 Google study found that modern AI technology could solve even the most distorted text with 99.8% accuracy, and read numbers in images with 90% accuracy. This made visual data processing an unreliable verification method. A new approach was needed.

NoCAPTCHA reCAPTCHA

Then came the revolutionary API that we use today: NoCAPTCHA reCAPTCHA. It's that simple checkbox we talked about at the beginning. All you need to do is check the box and carry on.

How does NoCAPTCHA reCAPTCHA work?

In fact, it’s much more complicated than it sounds. NoCAPTCHA is powered by an advanced risk analysis API that constantly monitors user behavior. The system analyzes every interaction with the CAPTCHA: how the cursor moves before the click on the checkbox, during verification, and after the box is checked. The combination of these signals determines whether the user is treated as a human.

Why does NoCAPTCHA reCAPTCHA work?

The basic idea is that automated malicious bots use pre-written scripts to perform functions. If the bot tries to "slip in" and check a box, it will simply perform the programmed action without the natural cursor movement of a human.

This way, NoCAPTCHA reCAPTCHA can determine whether the function was performed manually or via a script.
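Google does not publish the signals it actually uses, so the following is purely a toy illustration of the idea, not the real algorithm. It assumes we have a list of cursor positions recorded before the click, and it flags trails that are suspiciously short or almost perfectly straight. The function name `looks_scripted` and both thresholds are invented for this sketch.

```python
import math


def looks_scripted(points, min_points=5, max_straightness=0.995):
    """Flag a cursor trail that is very short or almost perfectly straight.

    points: list of (x, y) cursor samples recorded before the click.
    Both thresholds are made-up illustrative values.
    """
    if len(points) < min_points:
        return True  # the pointer "teleported" to the checkbox
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return True  # no movement at all
    direct = math.dist(points[0], points[-1])
    # A ratio of 1.0 means a perfectly straight line; a human trail usually wobbles.
    return direct / travelled > max_straightness
```

A real system would combine many more signals than this single ratio, and a bot that replays a recorded human trail would pass this particular check.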

However, even this method is not completely secure. There may be programs that can simulate mouse movement and clicks and check the box automatically. So Google may also take into account other data that users provide unintentionally, such as IP addresses and cookies, which helps establish that you are a human.

Google does not disclose all of its exact methods for identifying bots, though (let's leave that to them).

What if that's not enough?

Even with such a sophisticated security system, there can still be uncertainty. With that in mind, Google has added an extra step for cases where the system is unsure of the outcome: image identification.

When the program is in doubt, it can ask the user to prove their humanity using old-style CAPTCHA (text and numbers) on desktop computers or image CAPTCHA on mobile devices.

There is also a form expiration timer running in the background to prevent bots from solving the CAPTCHA after a long time.
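How reCAPTCHA implements its timer is not public. As a rough sketch of the general idea, a server can embed a signed timestamp in the form and refuse submissions that come back too late. `SECRET_KEY`, `MAX_AGE_SECONDS`, and both function names are assumptions made for this example.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MAX_AGE_SECONDS = 120                        # hypothetical validity window


def issue_token(now=None):
    """Create a 'timestamp:signature' token when the form is rendered."""
    issued = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"


def token_is_fresh(token, now=None, max_age=MAX_AGE_SECONDS):
    """Accept the token only if it is untampered and not older than max_age."""
    try:
        issued, sig = token.split(":")
        issued_at = int(issued)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # timestamp was altered or signature forged
    current = time.time() if now is None else now
    return 0 <= current - issued_at <= max_age
```

Because the timestamp is signed, a bot cannot simply rewrite it to make an old form look new.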

Next CAPTCHA Innovations

Innovations in this area never stop. We started with text CAPTCHAs and arrived at a simple checkbox, adapting after each failure.

Every broken CAPTCHA reflects an advance in artificial intelligence. Why? Because for a test to fail, someone had to come up with a new way for a computer to solve it.

This stimulates further development and the emergence of new types of CAPTCHA.

One such innovation is the Honeypot Method.

How does the Honeypot method work?

It uses deception to make bots reveal themselves. When we create a form, automated programs will likely fill in all the fields. A human will only fill in the fields that are visible to them.

What if we add fields that are invisible to users but are present in the form?

Bots will fill in these hidden fields as well, thus giving themselves away.
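On the server side, the check can be as small as the sketch below. It assumes the form contains one extra field that is hidden from people with CSS; the field name `website` and the function name `is_probably_bot` are made up for illustration.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the hidden field


def is_probably_bot(form_data, honeypot_field=HONEYPOT_FIELD):
    """Return True if the hidden field came back with any content."""
    value = form_data.get(honeypot_field, "")
    return bool(value.strip())


human = {"name": "Ada", "email": "ada@example.com", "website": ""}
bot = {"name": "x", "email": "x@example.com", "website": "http://spam.example"}
print(is_probably_bot(human), is_probably_bot(bot))  # False True
```

A bot that parses the CSS and skips hidden fields would get past this check, so it is best seen as one cheap filter among several.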

Honeypot Method - Double Advantage

The Honeypot method offers a double advantage: it spares users any extra verification step and still catches many malicious bots. Simplicity for humans and a trap for bots is what makes this method effective in the fight against spam.

What is the future of cybersecurity?

However, as we discussed earlier, anything that is created can be hacked if the adversary has the motivation to do so.

CAPTCHAs were originally designed to protect against spam bots. But today, bots do more than that: they attack servers, steal data, and commit fraud. The emergence of threats such as CAPTCHA farms (organized groups of people who solve CAPTCHAs for a fee) and smarter AI bots calls into question the effectiveness of CAPTCHAs.

CAPTCHAs – Are They Still Effective?

CAPTCHAs often act as speed bumps for hackers: obstacles that slow down attacks but don’t stop them completely. Bots are getting smarter, and machine learning models can now solve many CAPTCHAs that were once considered hard. On the other hand, making CAPTCHAs more complex can lead to frustrated customers. No one wants to waste time guessing images or typing random text when simply registering or logging in.

This is clearly not a long-term solution, and increasing the difficulty of CAPTCHA only slightly delays attackers.

A New Direction in Cybersecurity

As we look to the future, it’s important to look for new solutions. Online businesses need to invest in technologies that can effectively detect bots while still providing a great user experience.

There is a possibility that new solutions are already being developed. Perhaps in a few years we will see a completely different approach to online security that goes beyond CAPTCHA.
