Man
Professional
- Messages
- 3,077
- Reaction score
- 614
- Points
- 113
Almost a year ago I wrote an article about post-quantum cryptography and in this article I decided to continue this topic, since every year the moment approaches when cryptography as we know it today will go under. With the advent of quantum computing, interest in asymmetric encryption technologies has increased significantly. In particular, today they are trying to find a suitable replacement for RSA and algorithms based on discrete logarithms, which will be trivially easy to crack using Shor's algorithm. Although symmetric encryption is more resistant to quantum computing, with its widespread use it will be significantly compromised by implementations of Grover's algorithm, since it can perform unstructured search in:
Thus, the size of the symmetric key will be reduced by half, i.e. the efficiency of the AES-256 key will be reduced to 128 bits. Then the question automatically arises, will AES-256 remain secure? Let's figure it out.
Given current physical and economic computational limitations, it is commonly assumed that a 128-bit key provides sufficient security to resist brute-force attacks, regardless of current advances in computing technology. The arguments of those who support this assumption are largely based on the Landauer principle, which states that there is a minimum limit on the amount of energy required to erase a single bit of information. Since its formulation, the Landauer limit has been the subject of much study over the past half century. Recent results suggest that there is indeed no lower limit on the energy required for computation. This insight has motivated research into zero-power computing and implies that the physics-based arguments for the absolute security of 128-bit keys need to be reconsidered.
It appears that due to its nature, AES-256 is fundamentally not as secure as AES-128. In particular, there are no known attacks on AES-128 faster than exhaustive search. AES-256 is also known to have some vulnerabilities, although they are not currently exploitable in practical terms. It is therefore reasonable to assume that under a quantum attack, the maximum effective security of AES-256 would be 128 bits, and perhaps significantly less. As Bruce Schneier likes to say, “The attacks always get better and never get worse.”
Jordi Rose, co-founder of quantum computer company D-Wave Systems, has noted that the number of qubits per processor his company designs is roughly doubling every year. It may be too early to tell whether this claim is truly a quantum extension of Moore’s Law. However, if it is true, the implications of this type of exponential growth will lead to a massive increase in parallel quantum computing.
The confluence of these factors could mean that, for example, the key space for AES-256 could be cryptanalytically reduced and then processed by the quantum equivalent of an array of massively parallel processors. In this light, it might be possible to suggest that AES as an encryption standard is slowly approaching its logical conclusion. In any case, many researchers are working hard to find replacements.
What should be the key characteristics of an encryption algorithm to replace AES?
First of all, it should be fast, i.e. built on simple operations that require a relatively small number of processor cycles. No less important is the scalability of the algorithm, so that systems based on it can work effectively in an arbitrary range of word sizes and adapt to various architectures, both modern and future. Of course, the most important characteristic of the algorithm should be its cryptographic strength and resistance to known attacks. It would be nice to ensure its relative simplicity of understanding and implementation in order to transparently replace AES. And finally, given the zoo of modern devices, hardware independence is necessary to achieve high performance on various types of devices without any crutches such as a hardware instruction set of a specific architecture.
Given these goals, one prominent concept is to use at least three relatively simple and separate nonlinear processes that could interact in a way that obfuscates their individual contributions. Moreover, each component would have its own secret independent key.
The pioneering work of Daniel Bernstein of the University of Illinois on the Salsa20 family of stream ciphers embodies this minimalist approach to designing ciphers from scratch. The failure of cryptanalysis and Google’s subsequent adoption of his ChaCha20 variant in 2013 support Bernstein’s argument about the effectiveness of carefully executed simple instructions.
Let's consider one of such developments. The authors of the cipher are Harlan Brothers, Luke Lonergan and Robert Schmicker. They called their algorithm Nightingale.
The cipher components are limited to the following cryptographic primitives:
Bernstein recommends avoiding the use of S-boxes, however in this algorithm the S-box does not dynamically interact with the key, and so the structure is not susceptible to the attack he successfully implemented on AES.
So, let's move on to the description of the algorithm where:
Then we have:
The figures below show flow charts of the encryption and decryption processes. This version of the algorithm uses 64-bit blocks and 896 bits of key material, although it can be implemented with 512 bits. Other configurations can reduce the overall key length to 384 bits.
Figure 1. Flowchart of the Nightingale encryption process
Figure 2. Flow chart of the Nightingale decryption process
Since the algorithm processes 64-bit plaintext blocks, messages whose size is not a multiple of 64 bits need to be padded. We will show below how the algorithm was integrated into the OpenSSL library. OpenSSL EVP adds padding by default using the IETF Cryptographic Message Syntax Specification.
There are two main sources of nonlinearity in the algorithm: a high-quality pseudorandom number generator (PRNG) and a session-randomized S-box. Let's look at them in more detail.
PRNG (128 bit)
The keystream for the algorithm is provided by a permutation congruential generator (PCG). The PCG family of generators, developed by Melissa O'Neil, performs two functions: a state transition function, which is a linear congruential generator (LCG), and an output function, which applies a nonlinear permutation to the internal state.
The output function is built on the idea of tuple permutation functions. These permutations exploit the existing randomness in the most significant bits of the current state to improve statistical qualities while maintaining uniformity. This approach effectively eliminates the well-known shortcomings of LCGs and outputs a stream that is statistically difficult to predict.
For this version of Nightingale, an LCG was chosen that uses a 128-bit seed, 128 bits of internal state, and produces a 64-bit output with a period of 2 128. The seed is derived from the hash of the shared session key. To avoid predictability, its output function includes an XOR operation on the internal state. It outperforms some well-established cryptographically strong pseudorandom number generators (CSPRNGs) in speed and statistical performance. Unlike some of them, this generator passes the TestU01 BigCrush statistical test suite, but it should be emphasized that it is not currently considered a CSPRNG, primarily because it has not yet undergone rigorous peer review. However, at a minimum, its output is difficult to predict, and since it is used in a non-standard way, conventional attacks on PRNGs are not possible.
For example, a standard assumption is that an attacker can obtain the plaintext corresponding to at least part of the ciphertext. In the context of a typical cipher, it would then be trivial to obtain part of the keystream and thereby attempt to learn the internal state of the generator and, if possible, the initialization vector. This could successfully compromise the security of the system. Looking at the encryption scheme, we see that in the case of Nightingale, the plaintext is substantially separated from the keystream, so an attacker would not be able to exploit partial or even complete knowledge of the plaintext in this way.
S-block (512 bits)
The S-box in Nightingale is a fast and simple table that maps each input byte to an output byte using a random permutation of numbers from 0 to 255. It is generated uniquely during each session. Unlike the static S-box in AES, which interacts with the key via a state matrix and is carefully designed to minimize correlation between input and output bits, this S-box is ephemeral, indeterminate, unmixed with the encryption key, and is not the only source of nonlinearity in the cipher.
At the start of each session, records are obtained using four 128-bit seeds derived from the shared key exchange hash as input to the LCG. The four seeds are used to create a matrix G of four independent streams g(1), g(2), g(3), and g(4), each consisting of forty 64-bit words:
Transpose:
and glue them together into a line:
{g(1) 1, g(2) 1, g(3) 1, g(4) 1, g (1) 2, g(2) 2, g(3) 2, g(4) 2, . . . , g(1) 40, g(2) 40, g(3) 40, g(4) 40 }.
Finally, this sequence of 64-bit words is divided into 256 40-bit values, which are used to shuffle the numbers from 0 to 255. The resulting randomized sequence of values constitutes the elements of the S-box. For an index i = {0, 1, 2, . . ., 255} and a unique random 8-bit value s i, we can define the S-box function as:
The decryption process uses the inverse S-box. To get it, we simply swap the S-box assignments in the equation above so that for the inverse function S −1 :
In this algorithm, there are approximately 10,507 possible permutations of the S-box. At a minimum, a 1684-bit key would be needed to theoretically cover this huge space of permutations. We have an arbitrarily assigned 512-bit key to cover a completely unrealistic, albeit small, part of this space (≈ 10,154 permutations). The S-box can be lengthened or shortened depending on the specific application of the algorithm.
Due to the nonlinear interaction of several components, for any session, having an identical input and output byte is not a vulnerable feature. Similarly, over the entire permutation space, the probability of finding a pair of consecutive values in a given session (≈ 1 − e -1 ) far outweighs the small probability of duplicate mixing values (≈ 10 -12 ). It is impossible to know exactly what part of the permutation space we collect in any session. Even if we assume that the subset of possible permutations is a representative sample of the entire space, the S-box is relatively isolated from both the input and output of the cipher. It is also XORed with different keys in each block. Therefore, at best, extracting useful information about the contents of a uniquely generated S-box seems impractical.
Mask M (128 bit)
Based on the first scheme and the first formula, the mask is intended to provide preliminary protection against a chosen-plaintext attack. The 128 bits obtained from the hash of the shared session key are used to fill the LCG, which then generates a 64-bit mask. This ensures that the attacker's chosen plaintext is not passed to the S-box. For example, the string "aaaaaaaa" can be transformed into any combination of 8 different characters. In the case of repeated plaintext blocks, it is guaranteed that this plaintext is transformed differently for each block b. Therefore, starting from b 2, the output S i-1 of the S-box b i-1 is XORed with the mask and the plaintext.
Support Vector RV (128 bits)
Designed to add a layer of nonlinearity to the ciphertext by obfuscating the relationship between the keystream and the S-box output. 128 bits, also derived from the hash of the shared session key, are used to pad the LCG, which then generates the 64-bit RV. To complicate the interaction of the RV with the keystream and the S-box output, the RV undergoes a nonlinear bitwise right rotation. The rotation distance is determined by the most significant six bits of the block's keystream. This rotation is cumulative, so that for the rotation distance rot and the right rotation function R rot, we obtain:
Assuming no rotational symmetry in RV, this means that when combined with the S-box output, the keystream for each block is mixed with one of 64 different random values. This further complicates any attempt to break the internal state of the LCG function. Finally, for the first block of each session, RV replaces the S-box output of the non-existent previous block.
In its current form, Nightingale has been integrated into the OpenSSL library (version 1.1.1-dev) by inserting the implementation as a crypto module alongside others available in the library (e.g. RC4, AES, DES, etc.) to make deployment of this new cipher as simple as possible. Additionally, Nightingale is integrated into OpenSSL's high-level EVP interface, allowing the user to easily change the cipher used in their program by simply selecting a different crypto module.
To ensure consistency, tests were conducted using the built-in OpenSSL speed test. The results are presented in the table below. All tests were performed on a test bench using an Intel Core i7 6800k processor with 16 GB of memory running Ubuntu 16.10. The results are reported in thousands of bytes processed per second.
Timing tests show that on buffers larger than 64 bytes, Nightingale is faster than all versions of AES that use hardware acceleration without parallelization. For smaller buffers, hardware-accelerated AES appears to have an advantage due to lower startup costs. When using caching and other memory optimizations, Nightingale outperforms hardware-accelerated AES by about 15% for large buffers. Compared to software implementations of AES-128 and AES-256, Nightingale is 5.8 and 8.6 times faster, respectively, for large buffers.
Since Nightingale is a relatively new algorithm, it has yet to undergo more rigorous peer review. However, it has already passed the TestU01 statistical test suite, showing that its output is indistinguishable from random. The BigCrush test results for Nightingale can be found here. These results imply that the ciphertext is indistinguishable from a chosen-plaintext attack, which in turn is equivalent to semantic security. However, the described version of the algorithm is vulnerable to bit flips, so in its current form it should be used in conjunction with MACs such as Poly1305, which are provably secure. Although the basic Poly1305 variant uses AES, it allows the user to easily use other ciphers. In the current OpenSSL implementation, the algorithm takes advantage of the EVP suite so that message authentication is automatically handled by HMAC using SHA-384. The sequence of operations is independent of the key values, and there are no routines that run at unfixed times. As far as I know, the processing time does not depend on the key value or the input value. This presumably provides immunity to side-channel attacks.
In the future, the authors of the cipher plan to optimize the algorithm and code, vectorize it to use the AVX instruction set, and improve the efficiency of the tuning procedure. Unlike AES, the cipher does not require access to the AES-NI instruction set to achieve optimal speed. Therefore, it is expected to provide more stable performance when used in the context of virtual and container environments, as well as side-channel security advantages compared to AES. Thus, Nightingale represents one of the fast, secure, and adaptable alternatives to AES. The extensible design will allow it to meet the ever-changing security requirements of the quantum computing era, and its flexibility will allow it to remain predictably secure in an unpredictable future.
Source
Thus, the size of the symmetric key will be reduced by half, i.e. the efficiency of the AES-256 key will be reduced to 128 bits. Then the question automatically arises, will AES-256 remain secure? Let's figure it out.
Given current physical and economic computational limitations, it is commonly assumed that a 128-bit key provides sufficient security to resist brute-force attacks, regardless of current advances in computing technology. The arguments of those who support this assumption are largely based on the Landauer principle, which states that there is a minimum limit on the amount of energy required to erase a single bit of information. Since its formulation, the Landauer limit has been the subject of much study over the past half century. Recent results suggest that there is indeed no lower limit on the energy required for computation. This insight has motivated research into zero-power computing and implies that the physics-based arguments for the absolute security of 128-bit keys need to be reconsidered.
It appears that due to its nature, AES-256 is fundamentally not as secure as AES-128. In particular, there are no known attacks on AES-128 faster than exhaustive search. AES-256 is also known to have some vulnerabilities, although they are not currently exploitable in practical terms. It is therefore reasonable to assume that under a quantum attack, the maximum effective security of AES-256 would be 128 bits, and perhaps significantly less. As Bruce Schneier likes to say, “The attacks always get better and never get worse.”
Jordi Rose, co-founder of quantum computer company D-Wave Systems, has noted that the number of qubits per processor his company designs is roughly doubling every year. It may be too early to tell whether this claim is truly a quantum extension of Moore’s Law. However, if it is true, the implications of this type of exponential growth will lead to a massive increase in parallel quantum computing.
The confluence of these factors could mean that, for example, the key space for AES-256 could be cryptanalytically reduced and then processed by the quantum equivalent of an array of massively parallel processors. In this light, it might be possible to suggest that AES as an encryption standard is slowly approaching its logical conclusion. In any case, many researchers are working hard to find replacements.
What should be the key characteristics of an encryption algorithm to replace AES?
First of all, it should be fast, i.e. built on simple operations that require a relatively small number of processor cycles. No less important is the scalability of the algorithm, so that systems based on it can work effectively in an arbitrary range of word sizes and adapt to various architectures, both modern and future. Of course, the most important characteristic of the algorithm should be its cryptographic strength and resistance to known attacks. It would be nice to ensure its relative simplicity of understanding and implementation in order to transparently replace AES. And finally, given the zoo of modern devices, hardware independence is necessary to achieve high performance on various types of devices without any crutches such as a hardware instruction set of a specific architecture.
Given these goals, one prominent concept is to use at least three relatively simple and separate nonlinear processes that could interact in a way that obfuscates their individual contributions. Moreover, each component would have its own secret independent key.
The pioneering work of Daniel Bernstein of the University of Illinois on the Salsa20 family of stream ciphers embodies this minimalist approach to designing ciphers from scratch. The failure of cryptanalysis and Google’s subsequent adoption of his ChaCha20 variant in 2013 support Bernstein’s argument about the effectiveness of carefully executed simple instructions.
Let's consider one of such developments. The authors of the cipher are Harlan Brothers, Luke Lonergan and Robert Schmicker. They called their algorithm Nightingale.
The cipher components are limited to the following cryptographic primitives:
- XOR operation and circular shift;
- permutation congruential generator (an algorithm for generating pseudorandom numbers) that involves bit shifting, addition, and multiplication modulo 2n;
- S-block.
Bernstein recommends avoiding the use of S-boxes, however in this algorithm the S-box does not dynamically interact with the key, and so the structure is not susceptible to the attack he successfully implemented on AES.
So, let's move on to the description of the algorithm where:
- Ciphertext C i
- plaintext P i
- Key flow K i
- mask M
- S-block S
- output S-block S i
- support vector RV
- nonlinear bitwise rotation function R.
Then we have:

The figures below show flow charts of the encryption and decryption processes. This version of the algorithm uses 64-bit blocks and 896 bits of key material, although it can be implemented with 512 bits. Other configurations can reduce the overall key length to 384 bits.

Figure 1. Flowchart of the Nightingale encryption process

Figure 2. Flow chart of the Nightingale decryption process
Since the algorithm processes 64-bit plaintext blocks, messages whose size is not a multiple of 64 bits need to be padded. We will show below how the algorithm was integrated into the OpenSSL library. OpenSSL EVP adds padding by default using the IETF Cryptographic Message Syntax Specification.
There are two main sources of nonlinearity in the algorithm: a high-quality pseudorandom number generator (PRNG) and a session-randomized S-box. Let's look at them in more detail.
PRNG (128 bit)
The keystream for the algorithm is provided by a permutation congruential generator (PCG). The PCG family of generators, developed by Melissa O'Neil, performs two functions: a state transition function, which is a linear congruential generator (LCG), and an output function, which applies a nonlinear permutation to the internal state.
The output function is built on the idea of tuple permutation functions. These permutations exploit the existing randomness in the most significant bits of the current state to improve statistical qualities while maintaining uniformity. This approach effectively eliminates the well-known shortcomings of LCGs and outputs a stream that is statistically difficult to predict.
For this version of Nightingale, an LCG was chosen that uses a 128-bit seed, 128 bits of internal state, and produces a 64-bit output with a period of 2 128. The seed is derived from the hash of the shared session key. To avoid predictability, its output function includes an XOR operation on the internal state. It outperforms some well-established cryptographically strong pseudorandom number generators (CSPRNGs) in speed and statistical performance. Unlike some of them, this generator passes the TestU01 BigCrush statistical test suite, but it should be emphasized that it is not currently considered a CSPRNG, primarily because it has not yet undergone rigorous peer review. However, at a minimum, its output is difficult to predict, and since it is used in a non-standard way, conventional attacks on PRNGs are not possible.
For example, a standard assumption is that an attacker can obtain the plaintext corresponding to at least part of the ciphertext. In the context of a typical cipher, it would then be trivial to obtain part of the keystream and thereby attempt to learn the internal state of the generator and, if possible, the initialization vector. This could successfully compromise the security of the system. Looking at the encryption scheme, we see that in the case of Nightingale, the plaintext is substantially separated from the keystream, so an attacker would not be able to exploit partial or even complete knowledge of the plaintext in this way.
S-block (512 bits)
The S-box in Nightingale is a fast and simple table that maps each input byte to an output byte using a random permutation of numbers from 0 to 255. It is generated uniquely during each session. Unlike the static S-box in AES, which interacts with the key via a state matrix and is carefully designed to minimize correlation between input and output bits, this S-box is ephemeral, indeterminate, unmixed with the encryption key, and is not the only source of nonlinearity in the cipher.
At the start of each session, records are obtained using four 128-bit seeds derived from the shared key exchange hash as input to the LCG. The four seeds are used to create a matrix G of four independent streams g(1), g(2), g(3), and g(4), each consisting of forty 64-bit words:

Transpose:

and glue them together into a line:
{g(1) 1, g(2) 1, g(3) 1, g(4) 1, g (1) 2, g(2) 2, g(3) 2, g(4) 2, . . . , g(1) 40, g(2) 40, g(3) 40, g(4) 40 }.
Finally, this sequence of 64-bit words is divided into 256 40-bit values, which are used to shuffle the numbers from 0 to 255. The resulting randomized sequence of values constitutes the elements of the S-box. For an index i = {0, 1, 2, . . ., 255} and a unique random 8-bit value s i, we can define the S-box function as:
The decryption process uses the inverse S-box. To get it, we simply swap the S-box assignments in the equation above so that for the inverse function S −1 :
In this algorithm, there are approximately 10,507 possible permutations of the S-box. At a minimum, a 1684-bit key would be needed to theoretically cover this huge space of permutations. We have an arbitrarily assigned 512-bit key to cover a completely unrealistic, albeit small, part of this space (≈ 10,154 permutations). The S-box can be lengthened or shortened depending on the specific application of the algorithm.
Due to the nonlinear interaction of several components, for any session, having an identical input and output byte is not a vulnerable feature. Similarly, over the entire permutation space, the probability of finding a pair of consecutive values in a given session (≈ 1 − e -1 ) far outweighs the small probability of duplicate mixing values (≈ 10 -12 ). It is impossible to know exactly what part of the permutation space we collect in any session. Even if we assume that the subset of possible permutations is a representative sample of the entire space, the S-box is relatively isolated from both the input and output of the cipher. It is also XORed with different keys in each block. Therefore, at best, extracting useful information about the contents of a uniquely generated S-box seems impractical.
Mask M (128 bit)
Based on the first scheme and the first formula, the mask is intended to provide preliminary protection against a chosen-plaintext attack. The 128 bits obtained from the hash of the shared session key are used to fill the LCG, which then generates a 64-bit mask. This ensures that the attacker's chosen plaintext is not passed to the S-box. For example, the string "aaaaaaaa" can be transformed into any combination of 8 different characters. In the case of repeated plaintext blocks, it is guaranteed that this plaintext is transformed differently for each block b. Therefore, starting from b 2, the output S i-1 of the S-box b i-1 is XORed with the mask and the plaintext.
Support Vector RV (128 bits)
Designed to add a layer of nonlinearity to the ciphertext by obfuscating the relationship between the keystream and the S-box output. 128 bits, also derived from the hash of the shared session key, are used to pad the LCG, which then generates the 64-bit RV. To complicate the interaction of the RV with the keystream and the S-box output, the RV undergoes a nonlinear bitwise right rotation. The rotation distance is determined by the most significant six bits of the block's keystream. This rotation is cumulative, so that for the rotation distance rot and the right rotation function R rot, we obtain:
Assuming no rotational symmetry in RV, this means that when combined with the S-box output, the keystream for each block is mixed with one of 64 different random values. This further complicates any attempt to break the internal state of the LCG function. Finally, for the first block of each session, RV replaces the S-box output of the non-existent previous block.
In its current form, Nightingale has been integrated into the OpenSSL library (version 1.1.1-dev) by inserting the implementation as a crypto module alongside others available in the library (e.g. RC4, AES, DES, etc.) to make deployment of this new cipher as simple as possible. Additionally, Nightingale is integrated into OpenSSL's high-level EVP interface, allowing the user to easily change the cipher used in their program by simply selecting a different crypto module.
To ensure consistency, tests were conducted using the built-in OpenSSL speed test. The results are presented in the table below. All tests were performed on a test bench using an Intel Core i7 6800k processor with 16 GB of memory running Ubuntu 16.10. The results are reported in thousands of bytes processed per second.

Timing tests show that on buffers larger than 64 bytes, Nightingale is faster than all versions of AES that use hardware acceleration without parallelization. For smaller buffers, hardware-accelerated AES appears to have an advantage due to lower startup costs. When using caching and other memory optimizations, Nightingale outperforms hardware-accelerated AES by about 15% for large buffers. Compared to software implementations of AES-128 and AES-256, Nightingale is 5.8 and 8.6 times faster, respectively, for large buffers.
Since Nightingale is a relatively new algorithm, it has yet to undergo more rigorous peer review. However, it has already passed the TestU01 statistical test suite, showing that its output is indistinguishable from random. The BigCrush test results for Nightingale can be found here. These results imply that the ciphertext is indistinguishable from a chosen-plaintext attack, which in turn is equivalent to semantic security. However, the described version of the algorithm is vulnerable to bit flips, so in its current form it should be used in conjunction with MACs such as Poly1305, which are provably secure. Although the basic Poly1305 variant uses AES, it allows the user to easily use other ciphers. In the current OpenSSL implementation, the algorithm takes advantage of the EVP suite so that message authentication is automatically handled by HMAC using SHA-384. The sequence of operations is independent of the key values, and there are no routines that run at unfixed times. As far as I know, the processing time does not depend on the key value or the input value. This presumably provides immunity to side-channel attacks.
In the future, the authors of the cipher plan to optimize the algorithm and code, vectorize it to use the AVX instruction set, and improve the efficiency of the tuning procedure. Unlike AES, the cipher does not require access to the AES-NI instruction set to achieve optimal speed. Therefore, it is expected to provide more stable performance when used in the context of virtual and container environments, as well as side-channel security advantages compared to AES. Thus, Nightingale represents one of the fast, secure, and adaptable alternatives to AES. The extensible design will allow it to meet the ever-changing security requirements of the quantum computing era, and its flexibility will allow it to remain predictably secure in an unpredictable future.
Source