First proof of unbreakable steganography

Man

Professional
Messages
3,077
Reaction score
614
Points
113
In information theory, the concept of relative entropy is used to compare probability distributions. It is like measuring an abstract distance: if the relative entropy between two distributions is zero, then it becomes impossible to apply statistical analysis to reveal the secret.

In other words, no statistical observation will be able to detect it. Neither a person nor a machine will be able to detect the presence of a hidden message within a text, image, audio recording of speech or other communication channel. The transmission will be perfectly hidden.

Last year, scientists proved that creating such an algorithm is theoretically possible. This is mathematically ideal steganography.
Steganography is the practice of encoding secret information into harmless content in such a way that the attacker (third party) will not guess about the presence of a hidden message.

In this way, secret messages can be embedded in pictures, videos, even texts - and easily posted in the public domain, on YouTube, on TV or even published on Habr. And only the recipient who:
  1. knows about the presence of the message;
  2. knows the decryption key (the key generator is agreed upon in advance)

39snggvo8s5ejszbs0lamuaye98.png


Although the problem has been studied for a long time, recent advances in generative ML models have led security and machine learning researchers to become interested in developing scalable steganography methods.

The authors of the paper proved that a steganography procedure is perfectly secure within the information-theoretic model of steganography by Kachin (1998) if and only if it is induced by coupling.

The authors also showed that among the procedures, the procedure is maximally efficient if and only if it is induced by minimum entropy coupling.

These ideas made it possible to create the first steganography algorithms with a perfect security guarantee. In addition, they scale well.

The general form of the encoding and decoding algorithms is as follows.

Encryption:

oq5uoaezhiiqshnp1wvqodq0oyi.png


Decryption:

38dqmiiks6urqnsupi_c2qdz0ra.png


Both algorithms have a similar structure. The first one greedily encodes information about x into S using the induced posterior over X to maximize the coupling efficiency. Decoding simply reproduces this posterior.

To test this, the authors compared their algorithm with three modern baseline algorithms:
  • arithmetic coding;
  • Meteor;
  • Adaptive Dynamic Grouping (ADG).

The communication channels used were GPT-2, WaveRNN, and Image Transformer content, which represent three common communication types: text, speech, and images.

The experiments showed that the innovative minimum-entropy communication-based iMEC implementation achieves higher coding efficiency despite the stricter security constraints:

khqbjkzrq6hr_lolaplgbwjqmec.png

Coolback-Leibler divergences between the stegotext distribution and the latent text distribution for each method

e9eugajrbss5cauhvep2iku1hlm.png

Comparative coding efficiency of iMEC with baseline methods. Each method is evaluated with tuned hyperparameters

All experiments were performed on an AMD Ryzen Threadripper PRO 3955WX processor with 16 physical cores and 2x NVIDIA GeForce RTX 3090 GPUs. Apart from model queries, iMEC encoding and decoding only took up a single CPU core, while arithmetic coding, ADG, and Meteor used multiple CPUs, and Meteor:reorder also used the GPU during encoding and decoding.

r-sm_4k4ebcidgvz31vfoicskx0.png

Comparative stegotext encoding time for different algorithms. Horizontal lines show the inference time for GPT-2, WaveRNN, and Image Transformer.

The last graph shows that the encoding speed drops significantly as the iMEC block size increases.

All ciphertexts were randomly selected 80-bit bit strings. Encryption efficiency was determined by measuring the amount of text required to transmit these bit strings. The hyperparameters of each method were tuned to achieve the best performance for this task. Each algorithm was run 1000 times. When calculating the average, cases where Meteor got stuck in an infinite loop were ignored.

Thus, the existence of unbreakable steganography has now been proven, provided that the content is generated by a neural network.

Discoveries in the field of information theory are not always applicable to practical steganography, but this is exactly the case here. The authors are currently working on adapting the algorithms for use with popular messengers (WhatsApp, Signal).

The new algorithms are limited in terms of the size of the secret message: According to one of the authors of the scientific paper, at today's level of technology, their system can hide a 225 KB file in a voice message for about 30 seconds.

Finding the smallest entropy interaction is an NP-hard problem, meaning that the ideal solution requires lengthy computations. It is believed that there are no polynomial-time algorithms for NP-hard problems, but this has not been proven. However, some NP-hard problems can be polynomially approximated to some constant approximation coefficient. The authors managed to find just such an approximation procedure for practical application in steganography.

The article "Absolutely secure steganography using minimum entropy interaction" was published on October 22, 2022 (the latest version for April 11, 2023) on the preprint website arXiv.org (doi: 10.48550/arXiv.2210.14889) and presented at the ICLR 2023 conference.

Source
 
Top