The Past, Present, and Future of Password Cracking

Passwords remain a cornerstone of cybersecurity, and an adequate replacement is unlikely to appear in the foreseeable future. Yes, tokens are sometimes used instead of passwords, but they come with a different, less thoroughly studied threat model, and proven mechanisms are often preferred. Password guessing therefore remains as relevant as ever, and cybercriminals are far from the only ones interested in this line of work.

In this article, we will look at why information security specialists and ethical hackers crack passwords, under what scenarios such attacks are carried out, and what methods and tools are used for this. Along the way, we will take a short excursion into the history of password authentication.

This article is an adaptation of a talk given by Alexander Peslyak (aka Solar Designer) at the OffensiveCon24 conference. The speaker is the founder of the Openwall project, the creator of the popular password-guessing tool John the Ripper, and a contributor to projects such as OpenBSD and Debian.

Why use and hack passwords?​

Today, passwords are the knowledge factor, the "what we know" in two-factor (2FA) and multi-factor authentication (MFA). They are also widely used to derive keys, for example for data encryption. Cryptocurrency wallets rely on mnemonic seed phrases, which are merely an encoding of the seed key; a passphrase can be layered on top of the mnemonic as an additional factor (a passphrase without the mnemonic is useless). Seed phrases are usually generated, and extra words are sometimes added to them. Similar concepts and techniques apply to other low-entropy data that gets hashed.

However, the goals and methods of working with passwords are broadly familiar to anyone with even a passing relation to cybersecurity. But why would you crack passwords if you are not an attacker? Let's list the reasons point by point:
  1. Security audits and penetration tests: gaining access, or demonstrating that access could be gained so that weak passwords can be eliminated.
  2. Access recovery (an alternative to a password reset).
  3. General development: expanding lists of guessed/leaked passwords, building training and test datasets for tools, research projects, competitions, history archiving, malware and backdoor analysis, OSINT, recovery of other hashed data.

Password guessing scenarios​

There are four main scenarios that allow security professionals to achieve the results outlined above.
  1. Offline cracking of local copies of hashes, traffic dumps, and encrypted data (this is what the article is mainly devoted to).
  2. Online brute force against remote systems. A classic example is THC Hydra by van Hauser, which brute-forces login/password combinations against network authentication services.
  3. Online attacks on a remote system from unprivileged local access, password recovery via side channels (network timings or keystrokes), and physical attacks on tamper-resistant devices.
  4. Password resets, leaking or extracting passwords in clear text, bypassing verification, and extracting keys directly by compromising the system.

Attack targets​

Depending on what exactly the password-guessing attack is carried out for, different tools are used: some are designed to recover authentication passwords, others to gain access to encrypted files and file systems.

However, the most flexible modern password-guessing, or in other words hash-cracking, programs (John the Ripper and Hashcat) can work with more than authentication passwords: they are quite applicable to other attack goals as well.

For example, John the Ripper can operate on so-called "non-hashes": it ships with a family of *2john tools that extract pseudo-hashes from files. The BTCrecover tool, in turn, is used to guess the passphrase used together with a specific mnemonic.

Optimizing password guessing

Achieving the above goals requires considerable resources. To keep password guessing from dragging on forever, the process needs to be optimized, and here specialized solutions again come to the rescue.

Let's outline the main aspects that hash-cracking programs optimize:
  1. The ratio of guessing speed to its economic cost (hardware, maintenance, energy consumption).
  2. The order in which password candidates are searched, avoiding or at least minimizing duplicate candidates, and the efficiency of comparing results against the loaded hashes (or checking for correct decryption).
  3. Memory and storage requirements traded off against speed.
  4. Wordlist structure: the most popular passwords are placed at the beginning, while duplicates and garbage (overly long, unrealistic passwords) are removed.
  5. Targeted wordlists built for each of the hashes.
  6. Feedback loops and workflows (adjusting the search order based on passwords already cracked).

History of password authentication​

It is hard to imagine now that passwords were once practically unprotected and cracking them was not particularly difficult. Let's take a short excursion into the past and trace how authentication has evolved over time. So, let's go!

1. Storing passwords in clear text (CTSS, TENEX, Unix; 1960s - early 1970s)




Of course, such systems were unreliable and prone to hacking.

"What happened at CTSS one time was that any user who logged in would find that instead of the standard daily message, they would be shown a file with all their passwords."
Fernando Corbato, On Building Systems That Will Fail, 1991.
The problem was caused by a collision of the text editor's temporary files, which swapped the two files. TENEX had a different problem: its character-by-character password check leaked timing information, which was aggravated by paging - an attacker could place a guess across a page boundary and learn from a page fault whether the preceding characters were correct.

The Unix system was not without its unpleasant surprises.

"The Unix system was originally implemented with a password file that stored up-to-date passwords for all users."
Robert Morris and Ken Thompson, Password Security: A Case History, 1978.

2. Password Hashing (Multics, early 1970s)​

Multics moved away from clear text: its User Control subsystem ran passwords through a one-way "encryption" function before storing them. The main problem was that the chosen function turned out to be weak.

"I knew people could take square roots, so I squared the passwords and ANDed them with the mask to cut off some bits...
After the successful cracking of the system by a team of US Air Force experts who tested the security of Multics in 1972-1974, we quickly changed the hashing method to a new, more secure one."
Tom Van Vleck, How the Air Force cracked Multics Security, 1993.

3. Password Hashing (Unix, mid-1970s)​

In the mid-1970s, hashing processes became somewhat more complex. But password guessing was still not particularly difficult for those with the necessary skills.

"Crypt(3) in Unix up to the sixth edition used an encryption program that simulated the M-209 cipher machine. This machine was used by the US Army during World War II. The password was used not as the text to be encrypted, but as a key to encrypt a constant…
The process of encrypting one test password and checking the result after changing the algorithm to ensure maximum speed took about 1.25 milliseconds on the PDP-11/70…
In essence, it takes no more time to check an encrypted trial password against all passwords in the entire password file or against the entire collection of encrypted passwords, including those collected from other installations."
Robert Morris and Ken Thompson, Password Security: A Case History, 1978.
In fact, similar schemes were used in web applications and in Windows in the early 2000s.

4. Salt and other modifications (Unix, late 1970s)​




Unix later improved the scheme by adding a salt (random data mixed into the hash input so that identical passwords produce different hashes and precomputed attacks are defeated), password policy checks, and a slower hashing function.

5. BSDI, bcrypt, PBKDF2 (1990s)​

The main change during these years was that the slow hash became configurable: the iteration count (cost) could now be tuned upward as hardware got faster.

6. Scrypt, Argon2 (2010s)​

In 2009, scrypt author Colin Percival introduced the concept of sequential memory-hard functions in his paper. They are designed to consume large amounts of memory and thus greatly reduce the efficiency of parallel attacks.

Now the password guessing problem became two-dimensional: its solution required not only computing resources, but also memory.

Other authentication methods​

So, we have briefly looked at the development of password authentication on a historical timeline. However, there are other mechanisms that began to appear in the 1990s.

A distinctive feature of plain password authentication is that the password itself is transmitted to the server, usually protected in transit by Transport Layer Security. The alternative methods below avoid sending the password directly; even so, they are usually still susceptible to offline cracking given certain captured authentication data:
  1. Challenge/response (various flavors): challenge/response pairs sniffed off the network.
  2. Kerberos: TGTs, the AFS user database.
  3. S/Key, OPIE: the skeykeys file.
  4. SSH: a passphrase-encrypted private key, hashed known_hosts entries.
  5. SRP (and other PAKE): verifiers.

Password guessing speed​

Over time, not only authentication mechanisms have evolved, but also the methods for speeding up password guessing. The first Unix password cracking contests were held back in 1983.

1980s: Unoptimized Methods for Generating Hashes​

The unoptimized algorithm did the following for each {user, candidate password} pair:
  1. Take the candidate and the salt of the user's hash.
  2. Compute the hash.
  3. Compare the result with the hash being attacked. On a match - PROFIT! - continue from step 1 with the next hash; otherwise, start the cycle over with the next candidate.



1980s: Partially optimized methods in the absence of salt​

If no salt is used, algorithms can be optimized to reduce the repeated hashing costs. In unsalted hashes, the hash from one candidate is calculated once and then compared to all the hashes being searched for.
  1. Take a candidate.
  2. Compute its hash.
  3. Compare the result with all the hashes being attacked. If uncracked hashes remain, return to step 1 with the next candidate.
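
As a rough illustration, here is a minimal Python sketch of that idea (not code from any actual cracker; raw MD5 stands in for an arbitrary unsalted hash, and the function name is made up):

import hashlib

def crack_unsalted(candidates, target_hashes):
    """Hash each candidate once and look it up among all loaded hashes.

    Because there is no salt, the hashing cost is paid once per candidate
    no matter how many hashes are under attack."""
    remaining = set(target_hashes)          # hex digests still uncracked
    cracked = {}
    for candidate in candidates:
        digest = hashlib.md5(candidate.encode()).hexdigest()
        if digest in remaining:             # one O(1) set lookup per candidate
            cracked[digest] = candidate
            remaining.discard(digest)
            if not remaining:
                break
    return cracked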

Early 1990s: Partially optimized methods using salt​

With salts, the optimization is a little different: the loaded hashes are grouped by salt, and for each salt the candidate is hashed once and compared against the whole set of hashes sharing that salt (in theory salts should not repeat, but in practice they do).
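
A sketch of the same idea with salts (again a hypothetical construction, hashing salt || password with MD5 purely for illustration, not any real crypt scheme): the candidate is hashed once per distinct salt, and the result is compared against every loaded hash sharing that salt.

import hashlib
from collections import defaultdict

def crack_salted(candidates, salted_hashes):
    """salted_hashes: iterable of (salt_bytes, hex_digest) pairs."""
    by_salt = defaultdict(set)
    for salt, digest in salted_hashes:
        by_salt[salt].add(digest)           # group the loaded hashes by salt

    cracked = {}
    for candidate in candidates:
        for salt, digests in by_salt.items():
            # one hash computation per {candidate, salt} pair
            digest = hashlib.md5(salt + candidate.encode()).hexdigest()
            if digest in digests:
                cracked[(salt, digest)] = candidate
    return cracked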

Late 1990s: highly optimized methods using salt​

This can be optimized even further, as was done in John the Ripper: compute a whole batch of hashes at once, and load many salted hashes to perform many-to-many comparisons. The hash function itself can also be optimized by hoisting salt- and candidate-setup work into an outer loop; here again, the cost is amortized by reusing final or intermediate results.

Many-to-many hash comparison

Perhaps this is worth dwelling on in more detail. The first such comparison appeared in early 1997 in an early version of John the Ripper. It can be roughly described as follows.
  1. If few hashes are loaded for the current salt, perform a direct comparison (or trivial collision resolution).
  2. If many hashes are loaded for the current salt, compare partial hashes of the candidates against the loaded ones (for that salt):

a) compute only the parts of each hash that are needed first;
b) reject candidates whose partial hash is absent from a sparse bitmap or Bloom filter;
c) complete the computation for the survivors and/or consult a (possibly perfect) hash table, and so on;
d) finally, search prebuilt data structures, which may resemble those above or be as simple as a linked list.
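
A simplified Python sketch of the bitmap idea (the partial hash here is just the first 32 bits of the digest, and the sizes are arbitrary; real implementations tune them to the loaded hash count and cache sizes):

import hashlib

BITMAP_BITS = 1 << 20                       # sparse bitmap size, power of two

def build_structures(digests):
    """Build a bitmap keyed by a partial hash, plus an exact lookup set."""
    bitmap = bytearray(BITMAP_BITS // 8)
    for d in digests:
        idx = int(d[:8], 16) % BITMAP_BITS  # partial hash: first 32 bits
        bitmap[idx >> 3] |= 1 << (idx & 7)
    return bitmap, set(digests)

def check(candidate_digest, bitmap, exact):
    idx = int(candidate_digest[:8], 16) % BITMAP_BITS
    if not (bitmap[idx >> 3] >> (idx & 7)) & 1:
        return False                        # cheap rejection: most candidates stop here
    return candidate_digest in exact        # rare: fall through to the full comparison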

Amortization of password cracking costs​

Here is another important topic that deserves separate consideration: how to reduce the overall cost of achieving the same password-guessing results. One way is to reduce the total amount of computation. Against well-designed hashing schemes only a small fraction of the computation can be amortized, but amortization is not the only way to cut costs.

Along with computational complexity, there is another important metric: space complexity. In the real world, costs take the form of hardware, maintenance, energy, and so on. These costs depend on both computational and space complexity, and on real-world constraints that vary greatly between attackers. What matters is how much CPU and RAM is occupied and for how long, and whether these resources are already available or have to be purchased. Once these costs are assessed, it may turn out that the CPU is no longer the main bottleneck.

Reducing the cost of password guessing on GPU (since 2008)​

The pioneer in this area was Andrey Belenko of Elcomsoft, a company that produced commercial password recovery software sold to end users and law enforcement agencies and targeted several platforms. GPUs were first used to crack NTLM, LM, and raw MD5 hashes at speeds around 100 million hash computations per second. The early numbers were modest, at least by modern standards - even a PS3 could reach similar speeds - but they improved rapidly: by 2010 it was already tens of billions per second on multi-GPU machines. A real password cracking race had begun.

In 2010, Marc Bevand's whitepixel achieved 33.1 B/s for a single MD5 hash on a sub-$3,000 computer with four dual-GPU HD 5970s.

In 2012, oclHashcat-lite by Atom set a new single-hash MD5 speed record: 10.9 billion per second on a single dual-GPU HD 6990. oclHashcat-plus, released the same year, made GPU attacks nearly as flexible as CPU-based ones. Initially the software was closed source, but in 2015 Hashcat became open source.

In 2010 and 2011, John the Ripper gained GPU support for some hashes through patches that were integrated into 1.7.9-jumbo-6; bcrypt and sha512crypt were among the first implemented there.

Finally, as of 2022, Hashcat 6.2.6 delivered 164.1 B/s for a single MD5 hash on a single RTX 4090.

Reducing the Cost of Password Cracking with Parallel Processing​

Yes, parallel processing has certain limitations when it comes to secure hashing or key derivation. However, its potential for cracking passwords is virtually unlimited.

Adding parallel processing elements (CPU/SIMD, more GPU/FPGA/ASIC) can "arbitrarily" reduce the attack duration. However, a corresponding increase in memory is required (unless the hashing scheme provides memory cost amortization, and most older schemes do not use much memory anyway).

While parallel processing does not reduce the amount of computation, it does reduce how long resources stay busy and amortizes their cost: the same chassis, for example, hosts both CPUs and GPUs rather than CPUs alone.

Reducing the Cost of Password Cracking with Time-Memory Trade-Off (TMTO)​

This approach is relevant when there is no salt, or when a few salt values occur frequently. One can then precompute the hashes and store them (at least partially) to avoid performing the bulk of the computation in subsequent attacks.

Here are a couple of examples of successful application of this practice:
  1. QCrack (1995-1997) for cracking DES-based crypt(3) saved about one day per disk.
  2. Rainbow tables also make password cracking easier (Philippe Oechslin, 2003, building on Martin Hellman's 1980 work). However, they are now rarely used in practice: such tables exist only for unsalted hashes, which are not particularly hard to attack directly anyway. Still, when resources are limited, the tool is useful.

Sometimes a function can be computed faster by using more memory: for example, "A Fast Version of the DES and a Password Encryption Algorithm" by Matt Bishop (1987) used larger or combined lookup tables with a total size of up to 200 KB. Sometimes the opposite trade is possible: compute the function with less memory and recompute intermediate results when needed. Scrypt, incidentally, is susceptible to such a trade-off, and attackers exploit it when cracking. Finally, if there are many hashes with the same salt, larger and sparser bitmaps allow candidates to be rejected earlier, reducing the cost of both computation and comparison.
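
The recompute-instead-of-store idea behind such trade-offs can be shown generically (a toy sequential chain in Python, not scrypt's actual ROMix): keep every k-th element of the chain and recompute the rest on demand, trading memory for extra hashing.

import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def build_checkpoints(seed: bytes, n: int, k: int):
    """Walk the chain v[i] = H(v[i-1]) but store only every k-th value,
    cutting memory by a factor of k."""
    checkpoints, v = {}, seed
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = v
        v = H(v)
    return checkpoints

def fetch(checkpoints, j: int, k: int) -> bytes:
    """Recompute v[j] from the nearest stored checkpoint: up to k-1 extra hashes."""
    base = (j // k) * k
    v = checkpoints[base]
    for _ in range(j - base):
        v = H(v)
    return v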

Password guessing cost metrics​

We've covered the main ways to reduce password guessing costs in general. But how do we measure the results?

Given a performance figure ({password, hash} checks per unit of time, possibly amortized), the following per-device metrics can be identified.
  1. Hardware: ASIC die area (mm²) for a given architecture and clock frequency, on a specific ASIC process node.
  2. Power (W): not a cost in itself, but long-running attacks incur energy costs, and power correlates with die area.

Now let's look at the metrics for a specific attack ("check these password candidates against these hashes").
  1. Hardware: the product of ASIC die area and time (area-time, AT), in mm²·s.
  2. Energy: the product of power and time, J = W·s. This metric correlates with AT, so relative costs can be assessed in AT alone.

This is the theoretical cost model currently underlying the design of password hashing schemes. Hardware and energy have a monetary value, but the attacker does not always pay it, and real attackers' costs vary greatly, for example depending on the equipment they already have.
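
A quick worked example in Python (the figures are made up purely to illustrate the units):

def attack_cost(die_area_mm2, power_w, seconds):
    """Area-time (AT, mm^2*s) and energy (J = W*s) for one attack run."""
    return die_area_mm2 * seconds, power_w * seconds

# e.g. a 100 mm^2 die drawing 300 W for one hour:
# attack_cost(100, 300, 3600) -> (360000, 1080000), i.e. 360,000 mm^2*s and 1.08 MJ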

Parallelized hash function (originally memory-neutral)​

Now let's move from the cost of cracking passwords to the architecture of their hashing.

[Diagram: parallelized hash function, memory-neutral]


This is what the original version of the parallelized hash function looked like. Multiple implementations were run on the hardware at the same time, performing parallel computations. The hashes were then compared in a many-to-many fashion. In this example, we can see that we get 32 hashes in the same amount of time it takes the defender to get one.

Parallelized hash function (amortized, with memory consumption)​


[Diagram: parallelized hash function, amortized, with memory consumption]


This architecture uses shared memory: separate copies of data are not required. This scheme is rarely used for password hashing, but is widely used in proof-of-work functions of cryptocurrencies, in particular, Ethereum. Here, the final "score in the match" is 16 parallel generated hashes against one for the defenders in the same time.

Parallelized hash function (parallelized, memory intensive)​


[Diagram: parallelized hash function, parallelized, memory intensive]


In this case, parallelization is implemented in such a way that small cores can be used to calculate each hash, and the memory is still shared. This is partly due to the fact that modern memory-intensive password hash functions have adjustable parallelization parameters. If they are high, then this scheme can be used for brute-force. In addition, many hash functions have a high degree of internal parallelization. What is the "result of the match" this time? 1 hash in 1/16 of the time it takes the defending side to get the same result.

Parallelized hash function (sequential, memory intensive)​


[Diagram: parallelized hash function, sequential, memory intensive]


Scrypt introduced the concept of sequential memory-hardness, which protects against the attacks described above. It is designed so that shared memory cannot be used: each hash computation needs its own copy of the data. The diagram shows this for a single core, but the same architecture applies to any number of cores, including cores with built-in parallelism for computing a single hash.

Parallelized hash function (sequential, memory intensive + ROM intensive)​


[Diagram: parallelized hash function, sequential, memory intensive + ROM intensive]


Here is an unusual concept that I proposed more than ten years ago. By the way, it not only exists in theory, but is also used in practice to protect many millions of accounts. So far, we have not seen any leaks of such hashes. This is probably due to the fact that companies using the concept also care about other aspects of security.

There is a correlation: the weaker the hashes, the more likely they are to leak. The concept itself is this: a very large memory can be used if it is shared. Attacks can still be parallelized, since separate copies of this ROM are not required, but each hash computation also needs its own read-write memory, used much as in the previous scheme. Both the read-write memory and the ROM need access ports, so when an attacker tries to parallelize heavily, the availability and bandwidth of the ROM's ports becomes the bottleneck. That is why this scheme is described as consuming a large number of ROM ports.

This is similar to the Ethereum proof of work described above, but without the large RAM consumption. The attacker brings cores, ROM, ROM ports, and so on, but then runs into the memory bandwidth limit. In terms of protection, the scheme is at least no worse than classic scrypt, since it provides both large memory usage and sequential memory access.

What is meant by a core

In the illustrations above, a core is any processing element that can compute the target hash without having its own large memory; memory is provided to the cores, sometimes as shared memory. There are nuances here, though, so it is worth agreeing on terminology.

A core can also mean a regular modern CPU core, a SIMD (single instruction, multiple data) unit such as a GPU CU or SM, or a single SIMD lane (in a CPU core or GPU SIMD unit). Another possibility is a pipeline stage (for example, with different hash computations interleaved in a superscalar CPU or GPU). It can even be a single bit position across the N-bit registers of a register file (when the computation has been transposed, that is, bitsliced). And, importantly, a core can be an FPGA or ASIC logic chain, or each of its pipeline stages, depending on the context.

Bitslicing​

Here is another important topic that cannot be ignored in the context of hash rate optimization.

The bit-slicing technique was invented in 1997 by Eli Biham (see "A Fast New DES Implementation in Software") and implemented in about a hundred logic gates per S-box.

That same year, another researcher, Matthew Kwan, wrote a paper, "Reducing the Gate Count of Bitslice DES," expanding on these ideas. He first reduced the average number of gates needed to 87.75 (1997), and then to 51-56 (1998), depending on the types of gates available.

Other researchers continued the work. The already familiar Marc Bevand achieved an average gate count of 45.5 using the bitselect operation; Dango-Chu - 39.875, also with bitselect; Roman Rusakov - 32.875 with bitselect and 44.125 without it; DeepLearningJohnDoe - 23.5 with LUT3; Sovyn Y. - 22.125 with LUT3 (not yet used in cracking programs).

On AVX-512, bitslicing allows us to discard 512 computed hashes in about 9 steps.

Another interesting effect is that the many-to-many comparison effectively turns into a many-to-one check whose cost is logarithmic in the number of lanes.
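
A Python sketch of why roughly nine steps suffice (this models the logic only, not actual AVX-512 code, and all names are illustrative): with bit-transposed outputs, each step ANDs one 512-bit "plane" of computed output bits against the corresponding bit of the target hash, and since 2^9 = 512, the mask of still-matching lanes usually empties after about nine planes.

def bitsliced_reject(output_bit_planes, target_bits, lanes=512):
    """output_bit_planes[i]: integer whose bit j is output bit i of lane j.
    target_bits[i]: bit i of the hash we are looking for.

    Returns None if all lanes are rejected, otherwise a bitmask of lanes
    that still need a full comparison."""
    full = (1 << lanes) - 1
    alive = full                             # every lane starts as a potential match
    for plane, bit in zip(output_bit_planes, target_bits):
        match = plane if bit else (~plane & full)
        alive &= match                       # keep lanes whose bit equals the target's
        if not alive:
            return None                      # all 512 candidates discarded
    return alive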

Let's sum up the hash rate optimizations​

As you can see from the above, modern and most flexible password cracking tools (John the Ripper and Hashcat) support hundreds of different hash types and modes on many hardware platforms. Optimizing each hash type and mode requires a lot of effort, so the degree of optimization currently varies across tools and platforms. There are ways to reuse optimizations across different hash types and/or platforms. High-level examples include Alexey Cherepanov's john-devkit and Allen Espinosa's fast-small-crypto, which are specialized code generators.

OpenCL also abstracts away much of this, which is why recent versions of Hashcat rely on it even on CPUs, abandoning hand-written CPU optimizations. Runtime tuning is used as well, because a pre-built cracker downloaded and run on your particular system will not necessarily perform optimally.

Both John the Ripper and Hashcat tune themselves automatically, but the result is sometimes suboptimal: the user usually wants guessing to start within seconds, so the program settles for whatever tuning it could find in that short time. Manual tuning can sometimes do better.

Speeds for Unix hashes​


[Table: password cracking speeds for Unix hashes]


Now let's "take a look at the speedometer": let's look at the speed of password cracking in different systems.

As noted, Unix changed its hash type in the 1970s, as shown in the table above. When the slow hash and 25 iterations appeared, the cracking speed dropped by two to three orders of magnitude. Then the speed gradually began to increase, largely due to the influence of bitslicing, which is also visible in the table above. For example, you can compare the difference between Pentium and Alpha.

Various improvements were subsequently methodically developed, implementations were made on FPGA and GPU. Speeds gradually increased. As you can see, today the GPU is six times faster than the latest implementations on CPU.

[Table: speeds for quality modern hashes]

Speeds for Windows hashes​

"L0phtCrack 1.5 on a Pentium Pro 200 checked a password file with 10 passwords using the alphabetic set (A-Z) in 26 hours. [...] [Note from mudge: try compiling the CLI version on an UltraSPARC with the Makefile's compilation flags and these values will seem horribly slow]" (1997).

Obviously, the situation with Windows hashes is not so good. Yes, the speed of cracking NTLM hashes has grown to hundreds of billions per second, which is even higher than for LM hashes; then again, NTLM at least does not limit the password length the way LM does.

[Table: password cracking speeds for Windows hashes]


Of course, a speed of two terahashes per second is a reason to abandon such hashes. Yet Windows still uses them, and given how user hash databases are subjected to pentests and security audits, that seems extremely unreasonable. In a pentest it is enough to crack a single hash; in a security audit, around 90% of the hashes may fall. With thoughtful password policies that figure can perhaps be brought down to 50% or even 10%, which is still very high - unless generated passwords are used, which sidesteps the problem altogether.

Speeds for modern strong hashes​

In this regard, we are interested in Argon2, a modern key derivation function that also offers a high degree of parallelism.

When computing 480 Argon2 hashes in parallel (that's how many fit in the GPU memory) using John the Ripper on a GTX 1080 (max turbo, 180W, 2016 gaming GPU), we got the following results:

0:00:00:28 0.03457g/s 1742p/s 1742c/s 1742C/S Dev#4:53C thrillerl..diamond
Under the broad definition of "core" given earlier, we were able to use about 50,000 "cores". Of course, a GPU does not have that many physical cores, so this is excess parallelism: since the pipeline stages also need to be kept filled, we compute roughly five times as many instances of Argon2's parallelizable component as the hardware has compute units.

For comparison, on 2x Xeon E5-2670 + 8x DDR3-1600 (32 threads, total 230 W, 2012 server CPUs) the results were as follows:

0:00:01:54 0.008726g/s 437.8p/s 437.8c/s 437.8C/S 050489..010591

Let's sum up the speeds​

We looked at how to optimize speed:
  • hashing;
  • comparisons with loaded hashes.

However, so far we have hardly touched on optimizing the speed of generating password candidates and avoiding or suppressing duplicate password candidates.

For modern strong functions, the speeds per device have not changed much since the 1970s and are generally in the order of 1000 checks per second. At such low speeds, other optimization tasks become more important, such as:
  • formation of target dictionaries;
  • uniqueness;
  • ease of analysis;
  • feedback;
  • processing structure;
  • focus of attack.

Perhaps some of these tasks are worth discussing separately.

Attack Focus on Passwords​

In any endeavor, it is important to stay focused – password attacks are no exception. Among other things, password candidate generators help attackers with this.

Password Candidate Generators (1980s)​

Back in 1983, I wrote a simple and imperfect program to check weak passwords. Unfortunately, I was too quick to give a colleague at the University of California at Berkeley a copy, which he added to the school's security suite. Someone later distributed it, and the program became publicly available.

Password candidate generators (early 1990s)​

In 1991, Alec Muffett's Crack program was released.

Crack's first pass works with data taken from the users' password file entries. On my test file, this pass gets about 4% of the passwords (out of an estimated 15% total). It is also very fast, because it works with a small amount of data that is very frequently used as passwords.
The first run of the second pass, consisting of lowercased dictionary entries, yields another ~5% of passwords. Its duration depends on the number of CPUs and the dictionaries supplied, but with Ultrix's /usr/dict/words and my bad_pws.dat on four CPUs it takes no more than a few hours.
In subsequent runs, the percentage of cracked passwords gradually decreases: 2%, 1%, 0.5%... These belong to people with rather exotic passwords, so it is only natural that cracking them takes a while.
Alec Muffett's Crack User Guide.

Password Candidate Generators (Mid 1990s)​

In the mid-1990s, Crack improved significantly, making it easier to focus attacks on passwords. Again, let's look at Alec Muffett's Crack user guide:

Crack 5.0 supports the concept of dictionary groups - matching words taken from a sample of raw text dictionaries (with one word per line). This allows the user to group dictionaries into "most likely", "less likely" and "least likely" to generate successful guesses...
When Crack is launched, two more dictionary groups are created: "gecos" and "gcperm". The "gecos" group contains only words directly extracted from the information stored in the password file. The "gcperm" group contains words mechanically created by permutations and combinations of parts of words contained in the password file (for example, "Alec Muffett" becomes "AMuffett", "AlecM", and so on)...
During the attack, the hacker sequentially uses modification rules and applies them to a group of dictionaries.

Examples of modification rules​

Modification rules are micro-instructions, one per line, that define patterns and actions to apply to words from the dictionary to generate a sequence of password candidates.

Here's an example of such a rule: /ese3u. This selects words containing the letter "e", replaces it with the number "3", and converts the rest of the word to uppercase (you can also use /e se3 u).
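
A toy Python rendering of just this one rule (not the real Crack/John the Ripper rule engine, which parses and applies arbitrary rules; the function name is made up):

def apply_rule_ese3u(word):
    """'/e se3 u': reject words without 'e', replace 'e' with '3', uppercase."""
    if 'e' not in word:
        return None                     # the /e command rejects this word
    return word.replace('e', '3').upper()

# apply_rule_ese3u("secret") -> "S3CR3T"; apply_rule_ese3u("dragon") -> None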

This system appeared in Crack 4.0 (November 1991) and was carried through to Crack 5.0 (December 1996). It was adopted and extended in John the Ripper (1996 onwards), in the InsidePro tools, and in Hashcat.

Evolution of modification rules​

  1. In John the Ripper: a rule preprocessor, rule reject flags (e.g. skip a rule if the hash is case-insensitive), word-pair commands (applied only to first/last name pairs, example: johnsmith -> John Smith, John_Smith, John-Smith; rule: -pc 1 <- (?ac $[ _\-] 2 (?ac), and other additional commands.
  2. The same and additional commands in InsidePro PasswordsPro and Hash Manager.
  3. The same and additional commands in Hashcat (starting with PasswordsPro).
  4. "The world's first and only in-kernel rules engine" Hashcat (OpenCL and CUDA).
  5. Hashcat compatibility mode in John the Ripper.

Modification Rule Sets​

The old default John the Ripper rules were hand-written, some making use of the preprocessor and others not. The KoreLogic rules (2010), in turn, are aimed at users dealing with password policies; with the preprocessor, this much shorter set expands to about 7 million rules.

Other rules, from InsidePro PasswordsPro, were imported by the Hashcat user community, which over the years has created many new rulesets (both hand-written and generated). Hashcat's best64 contest identified the 64 rules that crack passwords with the highest probability.

OneRuleToRuleThemAll — the best 25% of all Hashcat rules (about 52 thousand rules).
OneRuleToRuleThemStill is an even more optimized set (about 49 thousand rules).
Later, there were passphrase rules for Hashcat and John the Ripper (some of which require special input), and a John the Ripper "All" ruleset (using nested include directives) of about 11 million.

Setting up modification rule sets​

Back in 2012, researcher Simon Maréchal worked on an automatic generation of modification rules (Rulesfinder) specifically for John the Ripper, but his work was not integrated into the project. The original implementation of Rulesfinder was abandoned, and the project was revived only in 2020.

In 2022, John the Ripper gained a PerRuleStats feature, which was used to reorder about 7.6 thousand rules (the preprocessor output of some hand-written rules) for use by default.

This "reduced the weights that determine the number of guesses and each rule for each password candidate tested." This ordering works well for fast and medium-speed hashes. It was generated based on the output of pwned-passwords-sampler for HIBP v8 (100 million non-unique, ~54 million unique hashes), taking into account unique guesses.

The rules were manually divided into the best (40%), average (next 40%), and worst (last 20%): the first group provides the bulk of the cracks (more than 97%), the second just over 2%, and the third yields practically nothing.

Creating target dictionaries​

The key point here is to build a separate wordlist for each hash, based on information about that specific user.

Duplicate password candidates​

Now let's talk about a problem that arises because wordlists contain many similar passwords. Consider the following lines from a list of common passwords used as a wordlist:
  • password
  • password1
  • Password

Using the modification rule "l $1" (convert to lowercase and add the digit 1) we get the following:
  • password1
  • password11
  • password1

As you can see, the first and third results are identical. Moreover, "password1" duplicates an entry already present in the original wordlist.

Eliminate duplicate candidate passwords​

How can we combat the problem described above? Crack's rules supported many "reject word if/unless..." commands that could be used to avoid generating duplicates. John the Ripper added more word-rejection commands as well as rule reject flags, and its standard ruleset uses them extensively, which is a non-trivial task. If the input wordlists are limited to lowercase words, duplicates can be fought quite effectively, and it works especially well when the input is a list of common passwords. For example, with the standard wordlist and about 3 thousand rules, 89.6% of the generated candidates are unique; a duplicate-suppression pass using 256 MB of RAM raises this to 94.3% on an output stream of about 50 GB.

Hashcat rulesets typically use little or no word rejection, which produces many more duplicates. For example, with the same wordlist and the top 3 thousand rules from OneRuleToRuleThemStill run in John the Ripper's Hashcat compatibility mode, only 59.7% of the candidates are unique; the 256 MB duplicate-suppression pass raises this to 80.6%.

Suppressing duplicate password candidates​

To paraphrase a famous maxim, if you can't eliminate it, suppress it. John the Ripper's "single crack" feature uses small ring buffers for each salt (as well as hash tables for fast lookups), identifying and suppressing recently encountered candidates.

Since 2018, Hashcat has supported "brain" functionality for tracking password candidates. Here are some of its characteristics.
  1. A client-server network architecture allows duplicates to be suppressed across attack sessions.
  2. It requires only simple manual configuration, even for local use within a single session.
  3. According to the documentation, the server tops out at roughly 50 thousand passwords per second.
  4. It writes fast hashes of password candidates to disk without any limit or eviction.

In 2022, John the Ripper gained a separate duplicate password candidate suppressor. Its main features:
  1. It currently works per process and does not suppress duplicates across processes.
  2. It is enabled by default when modification rules are used, is automatically disabled when the duplicate rate is low and the rest of the processing is fast, and its memory usage is configurable.
  3. It handles several million passwords per second, consistently and in sync with the rest of the pipeline.
  4. It is an opportunistic (probabilistic) filter of a configurable size, with eviction.

The John the Ripper candidate suppressor uses a hash table that stores fingerprints (other fast hashes) of the elements. It is similar to a cuckoo filter, but each element has only one potential bucket instead of two. Buckets are currently 8 elements wide; when a bucket is full, a fingerprint from the other half of the bucket is simply evicted and replaced.
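
A minimal Python sketch in the spirit of that design (the class name, sizes, fingerprint width, and eviction policy here are illustrative, not John the Ripper's actual values):

import hashlib

class FingerprintFilter:
    """One candidate bucket per element (a cuckoo filter would have two),
    a fixed number of fingerprint slots per bucket, eviction when full."""

    def __init__(self, n_buckets=1 << 16, bucket_size=8):
        self.n_buckets = n_buckets
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(n_buckets)]

    def seen(self, candidate: str) -> bool:
        """True if the candidate was (probably) seen before; otherwise record it."""
        h = hashlib.blake2b(candidate.encode(), digest_size=8).digest()
        bucket = self.buckets[int.from_bytes(h[:4], 'little') % self.n_buckets]
        fp = h[4:]                          # short fingerprint stored in the bucket
        if fp in bucket:
            return True                     # duplicate, or a rare fingerprint collision
        if len(bucket) >= self.bucket_size:
            bucket.pop(0)                   # full bucket: evict an older fingerprint
        bucket.append(fp)
        return False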



Tools for removing duplicates from word lists​

The purpose of such tools is to remove duplicate lines without changing (let alone sorting) their order, and without requiring enough memory or disk space to hold anything beyond the output itself.

Since version 1.6 (1998), John the Ripper has shipped a built-in unique tool that eliminates duplicates completely and predictably. When the output exceeds the available memory, it re-reads the already-written output file to catch duplicates between what is in memory and what has been written out.
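
An in-memory Python sketch of the core behavior (order-preserving deduplication; the real unique tool additionally re-reads its own output once memory runs out, which is omitted here):

def unique_lines(in_path, out_path):
    """Write each input line to the output the first time it occurs,
    preserving the original order (no sorting)."""
    seen = set()
    with open(in_path, encoding='utf-8', errors='replace') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            word = line.rstrip('\n')
            if word not in seen:
                seen.add(word)
                dst.write(word + '\n')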

Other tools that can be mentioned include:
  1. Duplicut (2014) - It uses multiple threads only when the file is large enough to be split into blocks.
  2. Hashcat-utils rli (2015), written in C and running in single-threaded mode regardless of file size.
  3. Rling - A faster, more feature-rich, multi-threaded alternative to rli (2020) from Waffle (the creator of the MDXfind password guessing program).

Password dictionaries​

In the past, crackers simply used lists of dictionary words to guess passwords. Small public lists of common passwords appeared as early as the 1980s: remember those used by the Morris worm. A longer, ordered list was maintained by John the Ripper (1997 onwards).

The RockYou leak (2009; 32.6 million plaintext passwords, 14.3 million unique) changed the situation significantly: a large sample of real passwords became available that was not biased toward easily cracked ones. This sample became the standard for training, testing, and using password security tools.

Dictionary wordlists are still used (e.g. the 2003 Openwall wordlist collection, which aggregates many sources). Additional published wordlists have been scraped, for example, from Wikipedia (Sebastien Raveau, 2009 and 2012) and from Project Gutenberg books (CrackStation, 2010).

Hashcracker communities​

Community forums such as InsidePro, hashkiller.co.uk, and hashes.org (almost all of which are now down) collected a huge number of hashes to crack and the recovered plaintext passwords, usually without citing sources or usernames.

Data of this kind has also been preserved on the defensive side by Have I Been Pwned (HIBP), or Pwned Passwords: hundreds of millions of real passwords previously found in data breaches have been published by Troy Hunt as SHA-1 and NTLM hashes (re-hashed from plaintext) along with the number of occurrences in the breaches.

The passwdqc tool can proactively check new passwords against HIBP offline. Plaintext wordlists from hashes.org can still be found elsewhere, and it is possible to crack almost all of HIBP's hashes and produce an ordered list of the leaked passwords.

Optimizing Dictionaries​

HIBP v8 is huge: it contains 847 million unique passwords from several billion accounts, though it is hard to say whether it is the largest collection. RockYou is smaller but cleaner. Both collections are used by security researchers.

The current password.lst shipped with John the Ripper (2022, generated at the time the modification ruleset was being tuned) is based on Pwned Passwords v8 (HIBP): entries need over a hundred occurrences there, or more than 97 occurrences if they also appear in RockYou. These criteria are chosen so that a password used many times by only one person is very unlikely to make it into the list.

The focused list of common passwords contains approximately 1.8 million lines and weighs approximately 15 MB.

Distributing all of these wordlists is entirely ethical, and they are highly effective.

Probabilistic Password Candidate Generators (mid 1990s)​

Here is another approach that makes password guessing easier: "A technique for generating password candidates from a statistical model" (Simon Maréchal, 2012).

In 1995, a new algorithm was developed for exhaustive, duplicate-free search of the keyspace, moving upward along a 2D surface of charset size versus length.

John the Ripper 1.0 introduced incremental mode (1996).

[Incremental:Alpha]
CharCount = 26
MinLen = 1
MaxLen = 8
CharsetB = smcbtdpajrhflgkwneiovyzuqx
CharsetM = eaiornltsuchmdgpkbyvwfzxjq
CharsetE = erynsatldoghikmcwpfubzjxvq
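
A toy Python illustration of what per-position character ordering buys (this only reflects the beginning/middle/end charsets from the config above; the real incremental mode also orders candidates globally by estimated probability rather than simply nesting loops):

from itertools import product

CHARSET_B = "smcbtdpajrhflgkwneiovyzuqx"   # first position
CHARSET_M = "eaiornltsuchmdgpkbyvwfzxjq"   # middle positions
CHARSET_E = "erynsatldoghikmcwpfubzjxvq"   # last position

def candidates(length):
    """Yield length-character candidates, trying the statistically more
    likely characters for each position first."""
    if length == 1:
        yield from CHARSET_B
        return
    positions = [CHARSET_B] + [CHARSET_M] * (length - 2) + [CHARSET_E]
    for chars in product(*positions):
        yield ''.join(chars)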

Probabilistic Password Candidate Generators (second half of the 1990s)​

John the Ripper 1.0's incremental mode can retroactively be called a zero-order Markov chain.

Zero-order statistics were then added separately for each length and character position.

Charset11 = ajyfioqxdehmnrstlcupbgkwvz
Charset21 = mdjpagetbrnsckyfilwhuoqvzx
Charset22 = olstabegrkjdhnvwcmpfiquxyz
Charset31 = dacjmbrtpslknfeghowqvzxiuy
Charset32 = aoeisumctgdblrfjpnvhwkxyzq
Charset33 = msnctdxepghlywabrjikuzofvq
In late 1996, The SOrCErEr's Star Cracker went beyond zero order.

Around the same time, John the Ripper moved to second-order statistics, still kept separately for each length and position.

Probabilistic Password Candidate Generators (late 1990s)​

In late 1996, John the Ripper introduced training on previously cracked passwords (reading john.pot).

-makechars:<file> make a charset, <file> will be overwritten
In 1998, John the Ripper 1.5 extended the traversal surface from 2D to 3D. The three components were the current number of virtual characters, the length, and the position of a fixed virtual character. The third component had been internal to the earlier algorithm but was now made explicit. A "virtual character" is an index into a character string sorted by frequency, where the ordering can depend on up to two preceding characters (a second-order Markov chain). The increased granularity is also useful for parallel or distributed processing.

Probabilistic Password Candidate Generators (mid-2000s)​

In 2005, Arvind Narayanan and Vitaly Shmatikov published an article titled “Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff”. Let’s give the floor to the researchers.

"Our first observation was that the distribution of letters in easy-to-remember passwords is likely to be similar to the distribution of letters in the users' native language. By applying standard Markov modeling techniques from natural language processing, this can be used to significantly reduce the size of the password search space. Our second contribution is an algorithm for efficiently enumerating the remaining password space. It enables time-space tradeoff techniques by limiting memory access to a relatively small table of "partial dictionary" sizes and allowing very fast dictionary attacks…
We noticed similarities between the ideas used in this algorithm and the well-known Viterbi algorithm from the field of speech processing."

Probabilistic Password Candidate Generators (early 2010s)​

In 2007, another researcher, Simon Maréchal, applied Narayanan and Shmatikov's work to classical password cracking.

John the Ripper introduces "Markov mode", added by Simon Maréchal and extended by Frank Dittrich and magnum (2007-2012 and onwards). It is a first-order Markov chain without length or position separation. It outperforms incremental mode in some tests, but requires complex selection of attack durations (specifying minimum and maximum password strengths). "Markov mode" supports parallel and distributed processing (by limiting the node subinterval). Simon Maréchal's 2012 paper "Probabilistic password generators" compares many probabilistic models (including second-order variations).

Probabilistic Password Candidate Generators (2010s)​

Research continued; in particular, Matt Weir and his colleagues published "Password Cracking Using Probabilistic Context-Free Grammars" (2009), whose implementation is known as the "Pretty Cool Fuzzy Guesser" (PCFG). Here is a quote from it.

"Here is a new technique that generates password structures in order of highest probability. First, we automatically create a probabilistic context-free grammar based on a training set of previously discovered passwords. This grammar then allows us to generate word modification rules, and from these, predicted passwords."

In 2013, Markus Dürmuth and colleagues published an article entitled "OMEN: Faster Password Guessing Using an Ordered Markov Enumerator".

"In the work by Narayanan et al., the indexing is not done in order of decreasing probability. We build a list of passwords with approximately decreasing probability. At a high level, our algorithm discretizes the probabilities into several buckets and iterates through all these buckets in order of decreasing probability (third order)."

Probabilistic Passphrase Candidate Generators (2010s and beyond)​

Probabilistic password candidate generators can also generate phrases if trained on such inputs (or simply on a mix of real-world passwords/phrases). PCFGs perform better than character-based Markov chains.

"Phraser is a phrase generator that uses n-grams and Markov chains to generate phrases for the purpose of cracking passphrases"; written in C# for Windows (2015).

"RePhraser is a Python rewrite of Phraser using Markov chains for linguistically correct password cracking" (2020). Also includes hand-crafted and generated rulesets.

Probabilistic Password Candidate Generation Using Neural Networks (2010s and beyond)​

In 2016, William Melicher and colleagues published a paper titled "Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks". It examined a recurrent neural network (RNN) that predicts the next character and enumerates candidates without duplicates. The 60 MB model outperformed other generators, but was too slow to go beyond 10 million candidates, so it was used only in simulations. A 3 MB model performed nearly as well, spending about 100 ms per password in JavaScript.

Generative Adversarial Networks (GANs) produce duplicates (around 50% for 1 billion passwords). Here are some interesting papers on the topic:
  1. “PassGAN: A Deep Learning Approach for Password Guessing” (2017).
  2. “Improving Password Guessing through Representation Learning” (2019).
  3. David Biesner's "Generative Deep Learning Techniques for Password Generation" (2020); uses VAE, WAE, and GPT2 with fine-tuning. Probably the best one to date.
  4. "GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline Password Guessing". "Generates 88.03% more passwords and generates 31.69% fewer duplicates" than PassGAN, whose performance has already been surpassed (2022).

Vowel/Consonant Patterns​

John the Ripper versions 1.0 through 1.4 had a Wordlike option that, when set to Y, enabled a built-in word filter (rejecting words with two or more vowels in a row, or more than two consonants). This was useful together with the 1.0 zero-order Markov chain, which remained an option through 1.4 to save memory (second order needed 4 MB of RAM) and to allow manually specified character sets for each position (similar to the later "mask mode").

John the Ripper 1.5 no longer had a zero order, so all of that was abandoned.

Custom password candidate generators (late 1990s)

In late 1996 - early 1997, John the Ripper 1.3+ introduced an "external mode" that allowed custom password candidate generators or filters to be written in a C-like language inside the configuration file. This code is compiled into a stack-based virtual machine implemented as threaded code, with optimizations such as a top-of-stack cache, multi-push instructions, and GCC's "labels as values". There was also an optional JIT for 32-bit x86 (1997), which was dropped in version 1.5 (1998).

Custom Password Candidate Generators (2000s and onwards)​

Tools that lacked probabilistic candidate generators instead added features that let a less intelligent exhaustive search be focused on promising subspaces.

InsidePro introduced a mask syntax such as ?u?l?l?l20?d?d. It borrows the character-class notation from the Crack program's word rejection rules and enumerates every string matching the mask (here, everything from Aaaa2000 to Zzzz2099).

Hashcat also used this syntax, later extending the implementation to use Markov chains.

Later, John the Ripper adopted the same syntax as its mask mode, extending it with constructs similar to the rules preprocessor, for example ?u?l?l?l20[0-2]?d.
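
A minimal Python mask expander covering only the ?u/?l/?d placeholders and literal characters (real implementations also support custom charsets and other classes, Markov-ordered enumeration, and the range construct shown above; the function name is illustrative):

from itertools import product
import string

PLACEHOLDERS = {'u': string.ascii_uppercase,
                'l': string.ascii_lowercase,
                'd': string.digits}

def expand_mask(mask):
    """Expand a mask such as '?u?l?l?l20?d?d' into all matching strings."""
    charsets, i = [], 0
    while i < len(mask):
        if mask[i] == '?' and i + 1 < len(mask):
            charsets.append(PLACEHOLDERS[mask[i + 1]])   # character class
            i += 2
        else:
            charsets.append(mask[i])                     # literal character
            i += 1
    for combo in product(*charsets):
        yield ''.join(combo)

# next(expand_mask('?u?l?l?l20?d?d')) -> 'Aaaa2000'; the last value is 'Zzzz2099'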

Hybrid modes add a mask on top of a smarter, slower generator (2010s):
  • Hashcat's hybrid modes append or prepend a mask (expanded on the device) to words from a wordlist (supplied by the host);
  • in John the Ripper's mask mode (on the device), ?w stands for the "word" produced by another mode (on the host).

Candidate passphrase generators (mostly 2010s)​

In this area the situation is, unfortunately, far from ideal, but there are some useful developments and tools. Let's list a few of them.
  1. Dictionary rules that add specified words to the beginning or end of a passphrase.
  2. Simple Perl scripts for combining words, published in 2006.
  3. Hashcat's combinator mode: two words, one from each of two wordlists, are simply glued together, with no probabilistic ordering (see the sketch right after this list).
  4. PRINCE (PRobability INfinite Chained Elements), a candidate generation method developed by Atom in 2014: it chains elements from a wordlist into longer candidates, ordering the combinations so that more likely ones come first, which raises the chances of success compared with simple brute force.
  5. Features for creating passphrases using modification rules that apply to whole phrases rather than single dictionary words (introduced in 2019).
  6. Passphrase lists that were never publicly released (2021): for example, all 2-6 word sequences extracted from Project Gutenberg books, sorted from most common to least common.
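
As referenced in item 3, a combinator pass is trivial to sketch (plain Python, no probabilistic ordering; the function name is made up):

def combinator(left_words, right_words, separator=''):
    """Yield every left+right pairing from two wordlists."""
    for left in left_words:
        for right in right_words:
            yield left + separator + right

# list(combinator(['correct', 'red'], ['horse', 'battery']))
# -> ['correcthorse', 'correctbattery', 'redhorse', 'redbattery']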

Password Candidate Generator Combinations​

Different generators produce both unique and partially overlapping passwords. Ideally one would use several generators and suppress the overlapping duplicates; in practice, multiple generators are indeed used, but usually without suppression. The default John the Ripper invocation does suppress duplicates between its wordlist and incremental passes, and the Hashcat "brain" can do the same. Another approach is to exhaustively try simple patterns first and then exclude them from the later, more complex runs.

Generators can also be tuned for use alongside others. For example, probabilistic generators are usually run after several wordlist passes. Tuning a probabilistic generator to perform best on its own often amounts to overfitting, so it then cracks fewer passwords beyond those the wordlists already cover. Comparisons often fail to take this into account, which is misleading.

The password guessing process

The best results come from combining several approaches in several stages, using different candidate password generators. Later stages should build on the guessing progress already made, and the cracked passwords should be fed back in as a wordlist and used for retraining.

There are tools for automating this process and managing tasks; their functionality partly overlaps with distributed processing and team coordination. Such tools are used, for example, in contests and in red team work. Teams write up their contest results, in particular for the Crack Me If You Can contest held since 2010, and these write-ups contain useful information on organizing the workflow.

The Future of Password Cracking​

Speeds​

Obviously, the computing power of CPUs, GPUs, and FPGAs will keep growing, and hashing speeds will grow with it. Possible countermeasures are to adjust password hashing and key derivation accordingly: add more iterations to the algorithms and use more memory. Hopefully, this will keep the verification speed at roughly the same thousand checks per second on similarly priced machines.

Also in the future, we can expect the development and increase in the size of publicly available ASICs and FPGAs with open architecture (or with both technologies on a chip). One option is to use a tiled architecture, which allows one bitstream to be used for different FPGA sizes (this technology does not yet exist for commercial FPGAs, but this will probably change in the foreseeable future). We can also expect an increase in the number of tiles, which will allow more cores to work simultaneously.

Among the smaller changes, we can expect further optimizations, for example, bitslicing of Lotus/Domino hashes is coming soon.

Optimal search order

In this area, we should expect improved support for passphrases (tools, datasets) and arbitrary tokenization, further development of neural networks, and a solution to the duplicate problem of generative networks. Pre-generated and filtered data will also be published, and scraping and training of neural networks on user data will be performed.

Features​

Let's highlight several of the most promising upcoming features in password cracking. First of all, there is support for the yescrypt ROM. We should also mention NVIDIA unified memory, which simplifies distributed processing (easier setup, dynamic reassignment, security). In time, specialized helper UIs or LLMs will appear and make the tools easier to use, including for inexperienced end users.

Instead of conclusions​

Password guessing only seems simple at first glance. The entry threshold is low and the learning curve is gentle, but building your own password guessing tools requires serious knowledge of computer science and software engineering, with non-trivial social aspects on top. It is still an actively developing, highly competitive field that welcomes new participants. Efficiency and optimization are extremely important here, and victories in specialized contests, success in recovering passphrases to crypto wallets, and the like are becoming ever more significant.

As in any other area of offensive information security, new methods and results constantly appear and influence the architecture and parameters of the next generation of defenses. For example, yescrypt was designed drawing on the technologies and methods implemented in John the Ripper.
