Man
Professional
- Messages
- 3,046
- Reaction score
- 571
- Points
- 113
Today we will develop our own program for encrypting files. We will write in C++ and use cryptographic algorithms from the Botan library. Let's see what pitfalls there are when developing such programs and how not to inadvertently help the attacker.
How can we counter all this? Let's look at the issue from the perspective of an encryption software developer. Here's what his product should do:
It sounds simple, but there are some subtleties that we will discuss next.
Salt is used during the hashing process to create unique hashes (including KDFs), even if the password or other hashing input is the same. This makes brute force and rainbow attacks more difficult because it adds randomness to the hash. How long should you choose for your salt? Skipping some of the math (which no one reads anyway), a salt length of 256 bits or more is considered secure today.
IVs are used in symmetric encryption (such as CBC, CFB, or GCM mode) to ensure that identical messages encrypted with the same key produce different results. This prevents the encrypted data from being predictable, improving its security.
The salt and IV should be random, but the randomness of the salt is more critical than the randomness of the IV. Strictly speaking, the IV should simply be unique for the encrypted data when using GCM mode (for AEAD mode, it is more accurately called a nonce). The Botan library, which is used in this article, allows generating random numbers using the randomize_with_ts_input and randomize functions.
In addition, KDF functions can be configured and strengthened, which will completely put an end to the idea of cracking a password by brute-force. Naturally, we use the option with hardcore KDF settings for real paranoids!
In the article, I will use Argon2id as a KDF function. It is a finalist of the PHC competition. This function has several configurable parameters: the amount of memory used to generate the key, the number of iterations and the number of execution threads. Strictly speaking, the choice of parameters for different situations is regulated by RFC 9106, but for the article I use the following parameters: memory size - 2 GB, one iteration, four threads. Of course, you can change them as you like, but it is better to increase, not decrease.
So, the code for creating a key from a password will be as follows.
I've provided comments in the code so that each line is clear.
The Botan library has a derive_key function that allows you to immediately generate a key, but in this case we will lose the fine-tuning and strengthening of the KDF. Note: this is creating a key for encryption, not for decryption. The thing is that the salt is created on the fly at the beginning of the function, and during the decryption operation, the salt will need to be passed to the derive_key_from_pass function, slightly changing it for this.
So, the key has been generated, the next step is encryption.
The encryptor has been created. Now everything is ready for the encryption process.
After the function has run, we get an encrypted file at the output, which has as a header an IV (for decryption) and a salt (for KDF), i.e. random data that cannot be distinguished from the encrypted file itself. In essence, our encrypted file has zero metadata related to encryption.
We will be able to reuse the main part of the already written code, making small changes.
First of all, we will correct the derive_key_from_pass function - for decryption, we do not need to generate a salt, instead, it will need to be passed to this function as a parameter. Of course, both the salt and the IV will need to be extracted from the encrypted file, simply by reading them at the very beginning.
After that, we proceed to the actual decryption:
Pay attention to the exception Botan::Invalid_Authentication_Tag — it signals that your data was decrypted incorrectly (for any reason, be it an incorrect password or a corrupted file). If you want to build into your application the ability to encrypt with several algorithms to choose from, then using this exception it is easy to organize a brute force search of possible decryption options on different algorithms: if all algorithms are tried, but the exception still occurs, then the password is incorrect.
Asymmetric encryption: Quantum computers will almost completely destroy the security of RSA, elliptic curve cryptography systems, and other schemes based on discrete logarithms and factorization if powerful quantum computers are implemented. But the same Botan library has already implemented post-quantum asymmetry.
Symmetric encryption is better, but quantum computers can still weaken the security of symmetric algorithms. The main culprit is Grover's algorithm, which allows a quantum computer to speed up a brute-force attack.
Grover's algorithm can search through unordered data, such as password or key space, with quadratic speedup. This means that if a classical computer requires 2^n operations to try all possible keys, then on a quantum computer with Grover's algorithm this number will be reduced to 2^n/2.
In this case, symmetric algorithms will be weakened by quantum computers by half, but this will not lead to a complete compromise. To protect against quantum attacks, you can simply double the key length, which will significantly increase security. For example, you can use AES-256 instead of AES-128 or take Threefish-512 to ensure 256-bit security.
What helps an attacker break encryption
A number of factors can make the task of an attacker trying to crack an encrypted file much easier:- Password length and complexity. For example, brute-forcing a 4-character password will take much less time than a 12-character one, and if you expand the alphabet, for example by using upper and lower case letters, numbers and special characters, this will significantly increase the space of options for brute-forcing, which will significantly slow down the hack. Of course, passwords should not be any dictionary word or derivative of it, otherwise such a password will be cracked by a dictionary attack. There are also so-called rainbow tables - pre-computed tables of password hashes. They allow you to find the original password by its hash. If salt or other protective measures were not used to encrypt or hash the password, such attacks can be quite effective.
- Cryptography implementation errors. As has been said many times, do not invent your own encryption algorithms and stick to ready-made and popular open source libraries. Choose encryption, hashing and KDF algorithms that have been tested by time and cryptanalysts.
- Metadata. File encryption programs can leave a lot of metadata: for example, they can create a header in the encrypted file that specifies the encryption and hashing algorithms, the program version, and a lot of other data. Almost all of this data can help an attacker hack. If all the algorithms used are already known in advance, then you just need to target them. Even simply changing the file extension to one unique to your program can sometimes help.
- Known plaintext patterns. If an attacker knows part of the plaintext, they can use a known-text attack. For example, knowing that a file contains certain standard information can help in an attack (e.g. when there is a standard file header).
- Default settings. If a program has any default values, then in most cases these will be used. If it is an encryption program, then users will choose the algorithms and their parameters that were initially set.
How can we counter all this? Let's look at the issue from the perspective of an encryption software developer. Here's what his product should do:
- generate an encryption key from the user password;
- encrypt or decrypt data;
- take into account the listed attack vectors.
It sounds simple, but there are some subtleties that we will discuss next.
Random Numbers and the Importance of PRNGs
Before you begin working with encryption, there are a few basic cryptographic concepts you will encounter and need to understand: salt and initialization vector (IV). These are two components used in cryptographic processes to improve security. Although they serve similar purposes (preventing predictability), their roles and uses are different.Salt is used during the hashing process to create unique hashes (including KDFs), even if the password or other hashing input is the same. This makes brute force and rainbow attacks more difficult because it adds randomness to the hash. How long should you choose for your salt? Skipping some of the math (which no one reads anyway), a salt length of 256 bits or more is considered secure today.
IVs are used in symmetric encryption (such as CBC, CFB, or GCM mode) to ensure that identical messages encrypted with the same key produce different results. This prevents the encrypted data from being predictable, improving its security.
The salt and IV should be random, but the randomness of the salt is more critical than the randomness of the IV. Strictly speaking, the IV should simply be unique for the encrypted data when using GCM mode (for AEAD mode, it is more accurately called a nonce). The Botan library, which is used in this article, allows generating random numbers using the randomize_with_ts_input and randomize functions.
Encrypting files
How is an encryption key created?
After the user enters the password for data encryption, it must be converted into an encryption key. For this purpose, special mathematical functions for generating a key from a password were invented - Key Derivation Function (KDF). They are also called "slow hashes" because the key from the password is created slowly: this procedure requires significant CPU and RAM resources. This is done intentionally to resist brute-force attacks (or large dictionary attacks): an attacker needs to try hundreds of millions of combinations to get the correct password, and when strong KDF functions are used, the selection speed will be critically low and it will take too much time to crack the password.In addition, KDF functions can be configured and strengthened, which will completely put an end to the idea of cracking a password by brute-force. Naturally, we use the option with hardcore KDF settings for real paranoids!
In the article, I will use Argon2id as a KDF function. It is a finalist of the PHC competition. This function has several configurable parameters: the amount of memory used to generate the key, the number of iterations and the number of execution threads. Strictly speaking, the choice of parameters for different situations is regulated by RFC 9106, but for the article I use the following parameters: memory size - 2 GB, one iteration, four threads. Of course, you can change them as you like, but it is better to increase, not decrease.
So, the code for creating a key from a password will be as follows.
Code:
// Here is the structure for defining key parameters
struct KeyParam {
Botan::secure_vector<uint8_t> key;
Botan::secure_vector<uint8_t> salt;
Botan::secure_vector<uint8_t> iv;
}
...
size_t derive_key_from_pass(
// Pass the user password
const std::string& password,
// Structure of key parameters
KeyParam& keydata
)
{
try {
// Use PRNG built into Botan
Botan::AutoSeeded_RNG rng;
// Create a random salt 256 bits long
rng.randomize_with_ts_input(&keydata.salt[0], 32);
// Using Argon2id algorithm for KDF
Botan:asswordHashFamily::create_or_throw("Argon2id")->
// Set parameters - memory size in kilobytes, number of iterations and threads
from_params(2097120, 1, 4)->
// Passed salt, password and received key in keydata.key
hash(keydata.key, password, keydata.salt);
return ERROR_OK;
}
catch (const Botan::Exception& e) {
std::cerr << "Botan exception: " << e.what() << std::endl;
return ERROR_DERIVE_KEY;
}
catch (const std::runtime_error& e) {
std::cerr << "Runtime error: " << e.what() << std::endl;
return ERROR_DERIVE_KEY;
}
catch (...) {
std::cerr << "Unknown error occurred during key derivation." << std::endl;
return ERROR_DERIVE_KEY;
}
}
I've provided comments in the code so that each line is clear.
The Botan library has a derive_key function that allows you to immediately generate a key, but in this case we will lose the fine-tuning and strengthening of the KDF. Note: this is creating a key for encryption, not for decryption. The thing is that the salt is created on the fly at the beginning of the function, and during the decryption operation, the salt will need to be passed to the derive_key_from_pass function, slightly changing it for this.
So, the key has been generated, the next step is encryption.
info
You might think that the number of threads (the parallelism parameter when calling from_params) affects only the performance, but not the result. But this is not true: in the Argon2id algorithm, this parameter affects the structure of the calculations. The hashing process is divided into several threads that are executed in parallel. Because of this, the order of processing blocks changes depending on the number of threads, and different parallelism values will lead to different intermediate states during hashing and ultimately to a different hash.
Encrypting a file
Before we start encrypting, let's understand such a concept as encryption mode. A cipher mode is a way of using an encryption algorithm to process data in order to ensure confidentiality, integrity, and authenticity. Block ciphers encrypt data in blocks of a certain size (for example, 128 bits for AES). If you apply encryption at the level of these blocks without a mode, then the same blocks of plaintext will always be transformed into the same blocks of ciphertext. This makes the ciphertext predictable and vulnerable to attacks. Modern encryption modes also help control the integrity of encrypted data (Authenticated Encryption with Associated Data, AEAD). Of course, we will use one of these modes, called GCM (Galois/Counter Mode).info
The GCM mode has a weak point - too short nonce. This limits the size of encrypted data to 64 GB on one nonce. A fixed version is currently expected - GEM mode.
First, let's create and initialize the encryptor. For example, we will use the AES algorithm in GCM mode, but, of course, you can add several encryption algorithms to choose from.
Code:
std::unique_ptr<Botan::AEAD_Mode> create_cipher(
const Botan::SymmetricKey& key,
const Botan::InitializationVector& iv,
// Let's make a universal cipher for encryption and decryption
const std::string& mode
)
{
try {
// Encrypt or decrypt?
auto operation = (mode == "Encrypt") ?
Botan::Cipher_Dir::Encryption :
(mode == "Decrypt") ? Botan::Cipher_Dir:ecryption :
throw std::invalid_argument("Invalid mode: " + mode);
std::unique_ptr<Botan::AEAD_Mode> cipher_mode;
Botan::SymmetricKey key(keyparams.key.data(), keyparams.key.size());
// Set the operation, cipher and mode
cipher_mode = Botan::AEAD_Mode::create_or_throw("AES-GCM", operation);
// Passed the key and IV
cipher_mode->set_key(key);
cipher_mode->start(iv);
}
catch (...) {
throw std::runtime_error("Failed to create cipher mode");
}
return cipher_mode;
}
The encryptor has been created. Now everything is ready for the encryption process.
Code:
size_t encrypt_file(
const std::string& inputFile,
const std::string& outputFile,
const KeyParam& keyparams
)
{
try {
// in — file to be encrypted (read into output_data)
// out — output encrypted file
std::unique_ptr<Botan::AEAD_Mode> cryptor;
Botan::AutoSeeded_RNG rng;
// For GCM mode, IV must be 96 bits
Botan::InitializationVector iv(rng, 12);
// IV and salt required for decryption are written to the output file
out.write(reinterpret_cast<const char*>(iv.data()), iv.size());
out.write(reinterpret_cast<const char*>(keyparams.salt.data()), keyparams.salt.size());
cryptor = create_cipher(key, iv, "Encrypt");
// Encrypt the entire output_data buffer at once, into which the encrypted file was read
cryptor->finish(output_data, 0);
out.write(reinterpret_cast<const char*>(output_data.data()), output_data.size());
return ERROR_OK;
catch (...)
{
in.close();
out.close();
return ERROR_ENCRYPT;
}
}
After the function has run, we get an encrypted file at the output, which has as a header an IV (for decryption) and a salt (for KDF), i.e. random data that cannot be distinguished from the encrypted file itself. In essence, our encrypted file has zero metadata related to encryption.
info
Remember that all metadata can be used against you by an attacker - he will know in advance what parameters to use to crack your encrypted file. As for the salt and IV, according to the Kerckhoffs principle, only the key should be kept secret.
How to check the correctness of decryption and data integrity
We have a file encrypted by us, now we need to somehow decrypt it. The decryption program must, among other things, understand that the file is decrypted correctly - this is implemented by using the AEAD encryption mode.We will be able to reuse the main part of the already written code, making small changes.
First of all, we will correct the derive_key_from_pass function - for decryption, we do not need to generate a salt, instead, it will need to be passed to this function as a parameter. Of course, both the salt and the IV will need to be extracted from the encrypted file, simply by reading them at the very beginning.
After that, we proceed to the actual decryption:
Code:
size_t decrypt_file(
const std::string& inputFile,
const std::string& outputFile,
const Botan::InitializationVector& iv
)
{
try {
// in is the file to be decrypted
// out is the output decrypted file
// we just read the IV at the beginning of the file
// output_data is the buffer containing the encrypted file
std::unique_ptr<Botan::AEAD_Mode> cryptor;
cryptor = create_cipher(key, iv, "Decrypt");
// Decrypt the entire output_data buffer that contains the encrypted file
cryptor->finish(output_data, 0);
// Write the decrypted file to disk
out.write(reinterpret_cast<const char*>(output_data.data()), output_data.size());
return ERROR_OK;
// Check for integrity of decrypted data
catch (const Botan::Invalid_Authentication_Tag&)
{
in.close();
out.close();
return ERROR_DECRYPT;
}
}
Pay attention to the exception Botan::Invalid_Authentication_Tag — it signals that your data was decrypted incorrectly (for any reason, be it an incorrect password or a corrupted file). If you want to build into your application the ability to encrypt with several algorithms to choose from, then using this exception it is easy to organize a brute force search of possible decryption options on different algorithms: if all algorithms are tried, but the exception still occurs, then the password is incorrect.
info
In the article I didn't say anything about securely deleting keys from memory and other such things. In a situation where an attacker is able to access your computer at a level sufficient to dump RAM, any encryption most likely loses its meaning.
What about quantum computing?
Quantum computers promise to change the cryptography landscape by greatly weakening encryption. The impact of quantum computing on the security of symmetric and asymmetric encryption varies depending on the algorithms used.Asymmetric encryption: Quantum computers will almost completely destroy the security of RSA, elliptic curve cryptography systems, and other schemes based on discrete logarithms and factorization if powerful quantum computers are implemented. But the same Botan library has already implemented post-quantum asymmetry.
Symmetric encryption is better, but quantum computers can still weaken the security of symmetric algorithms. The main culprit is Grover's algorithm, which allows a quantum computer to speed up a brute-force attack.
Grover's algorithm can search through unordered data, such as password or key space, with quadratic speedup. This means that if a classical computer requires 2^n operations to try all possible keys, then on a quantum computer with Grover's algorithm this number will be reduced to 2^n/2.
In this case, symmetric algorithms will be weakened by quantum computers by half, but this will not lead to a complete compromise. To protect against quantum attacks, you can simply double the key length, which will significantly increase security. For example, you can use AES-256 instead of AES-128 or take Threefish-512 to ensure 256-bit security.