How to deal with CAPTCHA when using a VPN

Carder

What is CAPTCHA, why it is used, and how to deal with it when using a VPN.

Does Google keep asking you to pick out traffic lights or buses in a seemingly endless series of pictures every time you use it while connected to a VPN?

Then you're dealing with CAPTCHA requests. You may be used to seeing them on web pages and forums, but do they also appear on Google? Usually not. But you'll start seeing them more and more often if you use Google through a VPN.

Below is a translation of a short guide from the tech news site TechNadu on what a CAPTCHA is, why it is used, and how to deal with it when using a VPN.

What is a CAPTCHA and why is it used?

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". As the name suggests, a CAPTCHA is a test that is easy for humans to pass but difficult for computers. The aim is to help sites distinguish real users from bots.

Websites typically use captchas when users want to open an account, post a review, leave a comment, or buy a product. The goal is to detect and prevent automated scripts and bots from performing these tasks, so that the site is not bombarded with spam. In addition, captchas are also used to protect websites from cyber threats. For example, they can act as an effective barrier against DDoS attacks.

As for why Google uses CAPTCHA, it most likely does so to protect its servers from malicious traffic. According to Google, it also uses CAPTCHA to prevent unauthorized logins to accounts.

Why do you see a CAPTCHA when you are using a VPN?

This is because almost all VPN services use shared IP addresses. This means that you share an IP address with many other VPN users.

From Google's point of view, a large number of search queries from a single IP address looks like suspicious traffic. The search engine therefore flags the IP address and prompts you to solve CAPTCHAs to prove that you are a real person and not a bot.

How often a CAPTCHA appears varies from user to user, and it is difficult to say exactly what it depends on. According to Reddit users, if you make 15-20 Google searches in a row, you will have to solve a CAPTCHA on average every 5-6 searches.

How to deal with CAPTCHA when using a VPN


Use dedicated IP addresses
Unlike shared IP addresses, a dedicated IP address is used only by you. This way, you're less likely to run into a CAPTCHA, because Google won't see traffic from many different users coming from the same IP address.

Unfortunately, many VPN services either do not offer dedicated IP addresses or charge extra for them. For example, NordVPN charges $70 per year for a dedicated IP. However, you can set up your own VPN server for much less.

Connect to another server
This is not a guaranteed solution, but it can sometimes work. You may be receiving CAPTCHA requests because your ping is very poor, or because Google has flagged the server's IP address as suspicious after receiving too many search queries from it. If you use a VPN with an extensive network, you can quickly find an alternative server.

Various extensions
Buster is the best option. It is a free and open-source tool that solves reCAPTCHA tasks. Buster is quite fast and can process captchas in less than 30 seconds.

"I'm not a robot CAPTCHA clicker" is also mentioned in discussions on this topic, including on Reddit. It is available in Chrome and Firefox. But, as the name suggests, this extension is only useful for the CAPTCHA where you need to click "I'm not a robot". It won't work for captchas where you need to recognize images.

unCAPTCHA
The unCAPTCHA tool can also help. A study by its developers shows that the tool can solve reCAPTCHA audio tasks with an 85% success rate. However, it is not for everyone. You can't install it as a browser extension. Instead, you will need to use the command line.

In addition, the developers have clearly stated that reCAPTCHA now includes additional protection that limits the success of their tool, so unfortunately they no longer support the code. If it doesn't work, there's nothing you can do to fix it.

Do not clear cookies
We can't say for sure whether this helps, as we haven't seen any improvement from it ourselves. However, some VPN users on Reddit say that not clearing their browser cookies helps them get fewer CAPTCHA requests.

If this really works, it's probably because Google, like any other site you visit, sees you as a returning user with a different IP address when you have cookies in your browser, rather than as a new user with a different IP address. If you have already passed a captcha before, the system is less likely to ask you to do it again.

Use different search engines
If you don't want to log in to your Google account while using a VPN, you can reduce the risk of encountering a CAPTCHA by switching to a different search engine that doesn't use captchas as often. Some decent alternatives include DuckDuckGo, StartPage, and searX.

Try a different browser
There's a chance you won't get as many CAPTCHA requests if you use a more privacy-oriented web browser when connecting to a VPN. So instead of Chrome or Opera, try Firefox, for example.

Try less popular VPN providers
Less popular VPNs have fewer users, so Google is less likely to flag the IP addresses of their servers as suspicious. If you often use Google when connecting to a VPN, you might want to purchase a paid plan to use it as a fallback option. However, before using such a VPN provider, you must first make sure that it is secure and does not collect data.

Based on TechNadu materials.
 

Carding 4 Carders


Scammers use fake Google reCAPTCHA in phishing attacks

Over the past three months, at least 2,500 fake emails have been sent to senior employees of banks and IT companies.

6e41133cb5105530dabace6abbca81d9.png


Cybercriminals are sending thousands of phishing emails to Microsoft Office 365 users as part of an ongoing malware campaign to steal their credentials. The attackers give the campaign a look of legitimacy by using a fake Google reCAPTCHA system and landing pages on top-level domains that display the logos of the victims' companies.

According to Zscaler's ThreatLabZ security research team, at least 2,500 such emails have been sent to senior employees in banking and the IT sector over the past three months. The emails first redirect recipients to a fake Google reCAPTCHA page. Google reCAPTCHA is a system for protecting websites from spam and abuse that uses a Turing test to distinguish between people and bots.

After passing the fake reCAPTCHA test, users are redirected to a phishing landing page that asks for their Office 365 credentials.

"The attack targets senior business leaders, such as vice presidents and CEOs, who are likely to have a higher level of access to the company's confidential data. The purpose of the campaigns is to steal victims' credentials and gain access to valuable company assets," the experts explained.

Phishing emails are disguised as automated emails from the victims' unified communications tools that allegedly contain a voicemail attachment.

The fake login pages for the Microsoft service also contain logos of the companies where the victims work, such as the software developer ScienceLogic or the office rental company BizSpace.

"After the victim enters their credentials, the phishing page displays a fake message saying 'Verification was successful'. Users are then shown a recording of a voicemail message, which they can play back, and this allows the attackers to avoid suspicion," the experts noted.

The researchers found many phishing pages associated with the campaign that were hosted on common top-level domains, such as .xyz, .club, and .online. These top-level domains are commonly used by cybercriminals for spam and phishing attacks.
 

Carding 4 Carders


How some algorithms generate a captcha while others crack it

It doesn't matter whether your intelligence is artificial or natural: after this detailed analysis, no captcha will stand in your way. At the end of the article is the simplest and most effective solution for crawling.

CAPTCHA is a completely automated public reverse Turing test that distinguishes computers from humans by automatically generating tasks that are difficult for computers but easy for humans. This technology has become a security standard used to prevent automated voting, registration, spam, dictionary attacks on website passwords, and so on.

1. Captcha Categories
Existing captchas are divided into three categories: text, image, and audio/video. Below, we will look at how various captchas are generated and what progress is currently being made in circumventing them. Please don't scold us for the quality of the images: we took the drawings from the scientific publications that we link to =) The full list of publications used for the analysis is given at the end of the article.

1.1. Text captcha
Text captchas are the most commonly used, but because of their simple structure, they are also the most vulnerable. This type of captcha usually requires you to recognize a geometrically distorted sequence consisting of letters or numbers.

To increase security, various protection mechanisms are used, which can be divided into anti-segmentation and anti-recognition mechanisms. The first group aims to hinder the separation of individual characters, and the second aims to hinder the recognition of the characters themselves. Figure 1 summarizes examples of captchas for the different approaches in tabular form.

1.1.1. Hollow symbols
In the case of the "hollow characters" captcha creation strategy, contour lines are used to form each character.

iTSw4-m0xlw.jpg

Such symbols are difficult to segment, but people can read them easily. Unfortunately, this mechanism is not as secure as expected. In Gao's study [1], a convolutional neural network correctly recognized from 36% to 89% of images (depending on the type of distortion and the training sample).

1.1.2. Overlapping characters
Combining and overlapping characters (CCT) makes segmentation more difficult, but it also reduces readability for the user. In other words, people themselves can't always solve such a captcha.

v1CaeHAE4Eo.jpg

Researchers from China and Pakistan were able to crack CCT with a probability of 27.1% to 53.2% [2].

1.1.3. Background noise
eZBKUyuAWi0.jpg

Google's reCAPTCHA, which uses images from Street View, is broken in 96% of cases [3].

1.1.4. Two-level structure
A two-level structure is a vertical combination of two horizontal captchas, which complicates image segmentation.

DB5F_Std53M.jpg

Gao [4] proposed a segmentation approach to divide the captcha image both vertically and horizontally, and achieved a 44.6% success rate (9 seconds per image) using a convolutional neural network.

1.2. Image captcha
1.2.1. Selection-based captcha

In a selection-based captcha, users must select the correct answers according to the hint. This is the simplest form of image-based captcha. For example, you need to select all cars, all road signs, or all traffic lights from the presented images.

WGvurWcVuwE.jpg

Various selection-based captcha examples.

Golle [5] suggested using a support vector machine (SVM) to distinguish between images of cats and dogs in the Asirra captcha, with a successful recognition probability of 82.7%.

Gao's team [6] used OpenCV to detect faces in FR-captchas. They achieved a detection probability of 8% to 42%, with image processing taking less than 14 seconds. FaceDCAPTCHA was broken with a 48% probability in an average of 6.2 seconds.

Columbia University researchers beat reCAPTCHA and Facebook CAPTCHA with a probability of 70.78% and 83.5%, respectively.

1.2.2. Click-based Captcha
In 2008, Richard Chow and his colleagues [7] first proposed a click-based captcha. It requires users to click on characters that are located on a complex background in accordance with the hint, as shown in Figure 7.

TINqpwsThK8.jpg

Such click captchas have two defense mechanisms: anti-detection and anti-recognition. With the development of machine learning, recognizing characters correctly is no longer a difficult task. Therefore, almost all security mechanisms are focused on preventing attackers from correctly locating the characters.

1.2.3. Drag-and-drop Captcha
A drag-and-drop captcha determines whether the user is human from the mouse trajectory, the pointer movement speed, and the response time.

CkahrpqZ8OY.jpg

Users need to rotate the image of the object so that it is in its natural position. For example, flip the image of the table so that it is on its legs. This is easy for a human, but difficult for bots.

1.3. Audio / video captcha
1.3.1. Audio captcha

This captcha is usually considered an alternative to visual captchas for visually impaired users. Users are asked to complete a task based on what they have heard, for example, to identify a specific sound, such as the sound of a bell or a piano [8].

y-Gmim6k8yI.jpg

There is another type of audio-based captcha that requires users to speak rather than just listen. For example, Gao [9] proposed an audio captcha (Figure 9) in which the user has to read aloud a sentence chosen randomly from a book. The recorded audio file is analyzed to determine whether the user is human.

But audio captchas can also be broken: scientists from Stanford University learned how to crack an audio captcha with a 75% success rate.

1.3.2. Video captcha
In a video captcha, users are presented with a video file, and they must select a sentence that describes the movement of a person in the video.

Japanese researchers used an HMM-based solution (hidden Markov model) and achieved an accuracy of 31.75%.

Let's now look at exactly how neural networks are used to crack captchas.

2. Neural networks with DenseNet and DFCR architectures
In 2017, Gao Huang, Zhuang Liu, and others [10] built four deep convolutional neural networks with an architecture now called DenseNet. Dense blocks were interleaved with skip-connection layers (see the figure below). The input of each layer in a block was the concatenation of the outputs of all previous layers in that block. This distinguished the new architecture from the traditional neural networks of the time, in which layers were connected in series.

Nv9rgxSp-j8.jpg

The DenseNet architecture has several advantages: it mitigates the vanishing-gradient problem and makes effective use of the features of all previous convolutional layers, which reduces the number of network parameters and the computational cost while demonstrating good classification performance.
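To make the dense connectivity concrete, here is a minimal sketch of how the number of input channels grows inside one dense block. The starting width (64), growth rate (32), and block depth (6) are illustrative assumptions, not values taken from the article.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Return the input width of each layer in a dense block and the block's output width.

    Every layer receives the concatenation of the block input and the outputs
    of all earlier layers, so the input grows by `growth_rate` per layer.
    """
    per_layer = []
    current = initial_channels
    for _ in range(num_layers):
        per_layer.append(current)
        current += growth_rate  # this layer's output is concatenated for the next one
    return per_layer, current


# Illustrative (assumed) numbers: 64 input channels, growth rate 32, 6 layers.
per_layer, block_output = dense_block_input_channels(64, 32, 6)
print(per_layer)     # [64, 96, 128, 160, 192, 224]
print(block_output)  # 256
```

Each layer only has to produce `growth_rate` new feature maps because it can reuse everything computed before it, which is where the parameter savings come from.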

An example of a variation of the DenseNet architecture is the DFCR neural network. The original 224x224 captcha images were passed through a convolution layer and a pooling layer to produce 56x56 feature maps. After that, four dense blocks were connected in sequence, alternating with transition layers (Fig. 11). The transition layers reduced the dimension of the feature map and sped up calculations.
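The reduction from 224x224 to 56x56 can be checked with the usual output-size formula for convolution and pooling layers. The kernel sizes, strides, and paddings below are assumptions borrowed from the standard DenseNet stem (a 7x7 stride-2 convolution followed by a 3x3 stride-2 pooling); the article does not state the exact values used in DFCR.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed stem: 7x7 conv with stride 2 and padding 3, then 3x3 pool with stride 2 and padding 1.
after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 112 56
```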

BgJ8MbkOmGc.jpg

Next, the feature maps were matched to classes. The values in each feature map were averaged, and the resulting value was taken as the score of the corresponding class and passed to the softmax classification layer.
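The averaging step described here is commonly called global average pooling. The following sketch, written in plain Python with made-up 2x2 feature maps, shows one score per map being turned into class probabilities by softmax; it is an illustration of the idea, not the DFCR implementation.

```python
import math


def global_average_pool(feature_maps):
    """Average every 2-D feature map down to a single class score."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical feature maps, one per class.
feature_maps = [
    [[1.0, 3.0], [2.0, 2.0]],  # mean 2.0
    [[0.0, 0.0], [0.0, 0.0]],  # mean 0.0
    [[1.0, 1.0], [1.0, 1.0]],  # mean 1.0
]
scores = global_average_pool(feature_maps)
probs = softmax(scores)
predicted = probs.index(max(probs))
print(scores, predicted)  # [2.0, 0.0, 1.0] 0
```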

Experiments show that DFCR not only preserves the main advantages of DenseNet, but also reduces memory consumption. In addition, the accuracy of captcha recognition with background noise and superimposed characters is higher than 99.9% [11].

3. Generative-adversarial neural network
rPxKquDY3M8.jpg

A GAN (generative adversarial network) captcha generator consists of two parts: 1) a network that creates captchas that are close to the original ones; 2) a neural network that distinguishes an artificial captcha from a real one (solver).

Before a captcha is passed to the solver, a preprocessing step removes the security features and normalizes the font. For example, hollow characters can be filled in and the spacing between characters normalized. The preprocessing model is based on Pix2Pix.

SRgrfWV5SHo.jpg

Scheme of the algorithm. First, a small dataset of real captchas is used to train the captcha synthesizer (1). Then artificial captchas are generated (2) to train the solver (3). Finally, the solver is improved (4).

Then a large number of labeled captchas are generated, which are used to train the solver. The trained solver accepts the captcha after preprocessing and outputs the corresponding characters. The solver is fine-tuned using a small set of manually labeled captchas collected from the target site.

As the specific convolutional neural network, the LeNet-5 architecture, which was originally used for recognizing single characters, turned out to be a good option. Adding two convolutional layers and three pooling layers extended its capabilities to multiple-character recognition.

7uUkQhvMVoU.jpg

Solver training algorithm.

The information obtained at the early layers of neural networks is useful for solving many other classification problems. The deeper layers are more specialized. This property is used to calibrate the solver to avoid systematic error or overfitting.

The output layer of the solver consists of a series of neurons with one neuron per symbol. For example, if a captcha has n characters, the output layer will contain n neurons, where each neuron corresponds to a possible symbol. If the number of characters is not fixed, then you need to train a solver for each possible value of n.

Success rates by type of protection:
  • Overlap (depending on the difficulty): from 25.1% to 65%
  • Rotation (15°, 30°, 45°, 60°): 100%, 100%, 99.85%, 99.55%
  • Distortion: 92.9%
  • Wave effect: 98.85%
  • Combination of the above: 46.30%

4. Convolutional neural network + long short-term memory (LSTM)
To recognize a sequence of characters without segmentation, you can use a model consisting of a convolutional neural network connected to a long short-term memory (LSTM) network with an attention mechanism.

ExGC2cV0GeU.jpg

The convolutional neural network acts as an encoder that extracts features from the captcha, mapping the original image to a feature vector. The feature vector created by the encoder is denoted f_ijc, where i and j are location indexes on the feature map and c is the channel index.

The LSTM works as a decoder and translates the feature vector into a text sequence. Unlike a plain recurrent neural network, an LSTM can store information for long periods of time.

In the traditional sequence-to-sequence model, a vector is passed to the decoder input at each time step. The bottleneck of the standard model is that this input is constant. The attention mechanism allows the decoder to ignore irrelevant information while preserving the most significant information in the vector f. Different parts of the feature vector are assigned different weights, so that the model can focus on a specific part of the vector at each step, making predictions more accurate. This is the main reason that the proposed method can recognize individual characters without segmentation.
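The weighting idea can be sketched in a few lines. The example below uses simple dot-product scoring as a stand-in; the scoring function in the cited work may differ, and the feature vectors and decoder state are invented for illustration.

```python
import math


def attention_context(features, decoder_state):
    """Weight each feature vector by its relevance to the current decoder state.

    Returns the attention weights (which sum to 1) and the weighted-sum context vector.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [sum(f * h for f, h in zip(feat, decoder_state)) for feat in features]
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context


# Three hypothetical feature-map locations with 2 channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context(features, decoder_state=[2.0, 0.0])
print([round(w, 3) for w in weights])
```

Locations whose features align with the decoder state receive larger weights, so at each step the decoder effectively looks at a different part of the image.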

In the experiments, a truncated Inception-v3 model was used as the encoder. The decoder consisted of LSTM cells, each containing 256 hidden neurons. The number of LSTM cells was equal to the maximum length of the captcha string. The whole structure had 9 million trainable parameters. Depending on its original size, each captcha was scaled so that its short side was in the range from 100 to 200 pixels. The model was trained using stochastic gradient descent with a momentum of 0.9 and a learning rate of 0.004. Training ran for 200 epochs with a batch size of 64.
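For readers unfamiliar with momentum, here is a minimal sketch of one common form of the update rule, using the learning rate (0.004) and momentum (0.9) quoted above. The toy objective f(w) = w^2 is an assumption chosen only so the example runs on its own.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.004, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy problem (assumed for illustration): minimize f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, [2 * w[0]], v)
print(abs(w[0]) < 1e-3)  # True
```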

Ni4ozL09LlM.jpg

After each epoch, the model was validated. If the model's performance was better than that of the current best model, the weights of the best model were updated. CCT captchas (overlapping text) were collected as experimental data (Figure 17).

For each captcha in the test set, the predicted sequence of characters was considered correct only if it was identical to the manually labeled answer. For a dataset of 10,000 samples (training and test in a ratio of 3:1), the probability of successful recognition was 42.7%. As the number of samples increased to 50,000, 100,000, 150,000, and 200,000, the probability increased to 87.9%, 94.5%, 97.4%, and 98.3%, respectively.
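The strict "whole string must match" criterion is easy to express in code. The sketch below uses invented strings; one wrong character makes the entire prediction count as a miss.

```python
def exact_match_accuracy(predictions, labels):
    """Fraction of predictions that are identical to the label as a whole string."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical data: the second prediction differs in its last character only.
predictions = ["a7Kq", "x9Zt", "mm4P", "Q2wE"]
labels = ["a7Kq", "x9Zl", "mm4P", "Q2wE"]
accuracy = exact_match_accuracy(predictions, labels)
print(accuracy)  # 0.75
```

This metric is harsher than per-character accuracy, which is worth keeping in mind when comparing the percentages quoted in this article.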

5. Reinforcement Learning
Reinforcement learning was used to bypass Google reCAPTCHA v3 [14].

The bot's interaction with the environment was modeled as a Markov decision process (MDP). The MDP was defined as a tuple (S, A, P, r), where S is a finite set of states, A is a finite set of actions, P(s, a, s') is the probability of transition between states, and r is the reward received after the transition from state s to state s'.

The goal of the agent was to find the optimal policy π* that maximizes the expected future rewards. Suppose that the policy is parametrized by a set of weights w such that π = π(s, w). Then the problem is given as

sXnuMjfJB8w.jpg

where γ is a constant (the discount factor) and r_t is the reward at time t.
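The quantity being maximized is the expected discounted sum of rewards. As a small illustration of how the discount factor works, the sketch below computes that sum for a single episode; the reward sequence is invented, and a discount of 0.5 is used in the example only to keep the arithmetic easy to follow.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one episode, accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: the only reward arrives at the third step.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))  # 0.25
```

The later a reward arrives, the less it contributes, so a policy is encouraged to reach the rewarding state in fewer steps.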

The policy gradient is estimated using the formula

vXSRtmv2m-E.jpg

To pass reCAPTCHA, the user moves the cursor until the reCAPTCHA check box is selected. Depending on this interaction, the reCAPTCHA system "rewards" the user with a score. This process was modeled as an MDP, where the state space S is the set of possible cursor positions on the web page, and the action space is A = {up, left, right, down}. This approach makes the task similar to the grid world task.

As shown in Figure 18, the starting point is the initial cursor position, and the target is the reCAPTCHA position. For each test, the starting point is randomly selected from the upper-right or upper-left area. Then a grid is constructed, where each pixel between the start and end points is a possible cursor position. The cell size c is the number of pixels between two consecutive positions. For example, if the cursor is at the position (x0, y0) and performs the action "left", then the next position will be (x0 - c, y0).
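The state transition described here is the standard grid world update. The sketch below models it as a pure function on coordinates; the screen-style convention that "up" decreases y, the clamping at the grid edges, and the grid dimensions are assumptions made for the example.

```python
# Direction of each action as (dx, dy); "up" decreasing y is an assumed convention.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(position, action, cell_size, width, height):
    """Return the next grid position after one action, clamped to the grid bounds."""
    dx, dy = ACTIONS[action]
    x = min(max(position[0] + dx * cell_size, 0), width - 1)
    y = min(max(position[1] + dy * cell_size, 0), height - 1)
    return (x, y)


# With cell size c = 10, moving left from (100, 50) gives (x0 - c, y0).
print(step((100, 50), "left", cell_size=10, width=800, height=600))  # (90, 50)
```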

oO1UH9d5Y-g.jpg

In each test, the starting cursor position is random, and the bot performs a sequence of actions until it reaches the reCAPTCHA or the step limit t given below, where a and b are the height and width of the grid, respectively.

2jYWn0r5eAQ.jpg

At the end of the test, the bot receives feedback from reCAPTCHA, just like any normal user.

In most previous work, Selenium was used to automate the browser's actions. However, it often failed the test because the HTTP requests contained an automation-specific header and other additional variables that are not present in a normal browser.

This problem can be solved in two different ways. The first is to create a proxy that removes the automation header. The second is to launch the browser from the command line and control the mouse using dedicated Python packages.

Interestingly, simulations performed in a browser that is logged in to a Google account are more likely to pass verification than simulations performed without logging in.

In experiments with a discount factor γ = 0.99, a learning rate of 10^-3, and a batch size of 2000, the captcha was passed with a 97.4% probability.

6. Solving captchas by people
As you can see from this article, machine learning has a high entry threshold, and everything that was described in the publication is only the tip of the iceberg. If you dig deeper, you can soon claim the title of Junior in neural networks =)

But for real work, you need a team of specialists and rented computing power. Add to this the time spent training and retraining networks, the growing number of captcha types, and the variety of languages, and it turns out that it is faster and cheaper to use online services where captchas are solved by real people.

Among such services, ruCaptcha stands out. The service lets you fine-tune the solving task: the number of words, case, numbers and/or letters, language (55 to choose from), mathematical operations, and so on.

The following types of captchas are solved: plain text, ReCaptcha versions 2 and 3, GeeTest, hCaptcha, ClickCaptcha, RotateCaptcha, FunCaptcha, KeyCaptcha.

Interaction with the server takes place via an API, so you can embed the solution in your own product. There is a refund function for incorrect recognition, and technical support responds to any questions that arise (adequately, compared with competitors).

Of course, the service pays real people who are willing to solve captchas for a small fee, and it collects this money from customers who would rather not deal with routine tasks. Prices at the time of writing were as follows: 1,000 captchas cost no more than $1 (an average of 7.5 seconds per captcha), and 1,000 reCAPTCHAs cost $2 (an average of 32 seconds per captcha). The price is the same regardless of the load and of how much other customers paid.

For comparison, one mid-level machine learning specialist will cost at least $2,000 per month on the market.
 