An introduction to the automatic dump decoding process

Friend · Nov 16, 2020

Contents:
1. Brief structure of the WAV file
2. Processing of discrete information
3. Search for dumps
4. Decoding. Theory

WAV file structure (Windows PCM).
A sound file can be clearly divided into two parts. The first part is the file header, the second part is the audio data itself.
The header contains information about the file size, the number of channels (mono, stereo, etc.), the sampling rate, and so on. The audio data is stored as samples. Samples can be extracted with the WAV-TXT program, which is included in the well-known Excel package for decoding.
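The header fields listed above can also be read without WAV-TXT. Below is a minimal sketch using Python's standard `wave` module. The helper names `read_wav_header` and `make_demo_wav` are my own, and the tiny 8-bit mono file is built in memory only so that the example runs on its own. It is not how WAV-TXT works internally.

```python
import io
import wave


def read_wav_header(source):
    """Return basic header fields of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        frame_count = wav.getnframes()      # samples per channel
        return {
            "channels": channels,
            "sample_width_bytes": sample_width,
            "sample_rate_hz": wav.getframerate(),
            "frame_count": frame_count,
            "data_size_bytes": frame_count * channels * sample_width,
        }


def make_demo_wav(samples, sample_rate=8000):
    """Build a small 8-bit mono WAV in memory (hypothetical demo data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)            # 8-bit PCM, unsigned values 0..255
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(samples))
    buffer.seek(0)
    return buffer


header = read_wav_header(make_demo_wav([55, 58, 63, 65, 69, 73]))
print(header)
```

For a file on disk, pass the path instead of the in-memory buffer.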

The first screenshot shows an example of an audio file with a real dump, opened in Cool Edit Pro.

The second screenshot shows the same kind of audio file zoomed in. In this screenshot you can see the individual points. Each point is a discrete sample from which the graph is built.

Code:

55, 58, 63, 65, 69, 73, 76, 79, 82, 85, 88, 90, 92, 93, 93, 93, 91, 86, 80
72, 63, 52, 38, 27, 15, 6, 0, 1, 4, 14, 26, 44, 62, 81,101,120,136,151
165,176,187,196,205,213,220,226,231,234,235,234,233,229,225,221,216,211,206
202,196,193,190,186,182,179,177,174,172,170,167,165,165,164,166,168,172,177
185,195,205,216,227,238,246,252,255,254,249,240,227,211,192,172,151,131,110
94, 79, 66, 56, 50, 44, 42, 39, 37, 37, 36, 35, 35, 35, 35, 35, 36, 36, 38
39, 41, 45, 47, 53, 57, 63, 68, 74, 78, 82, 86, 88, 89, 91, 92, 93, 93, 92
90, 86, 80, 72, 62, 50, 37, 27, 18, 13, 12, 16, 25, 40, 58, 78, 99,121,139
158,173,187,198,209,218,228,235,242,246,249,250,249,249,248,245,244,246,247

These are sample values taken from a real audio file.

Processing of discrete information.

Before decoding, the data has to be prepared so that it is more convenient to process. The preparation consists of several stages:
1. Find out the size of the audio data from the header of the audio file. If the size is large, the data has to be loaded into the PC memory in stages, i.e. split into several parts. The required number of parts is calculated with the formula:

ChankSize - total size of the audio data
buf - size of the audio data that the PC can load for further processing
part - the number of parts to split into

part = ChankSize / buf

Discarding the fractional part gives the number of full parts. If the division leaves a remainder, one more, shorter part holds the rest of the data.
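A small sketch of this calculation in Python. The function name `split_into_parts` and the numbers are made up for illustration. It assumes that a final, shorter part is loaded whenever the division leaves a remainder.

```python
def split_into_parts(chunk_size, buf):
    """Return (full_parts, remainder) for loading chunk_size bytes in blocks of buf bytes."""
    if buf <= 0:
        raise ValueError("buf must be a positive number of bytes")
    # Integer division discards the fractional part; the remainder is what is left over.
    full_parts, remainder = divmod(chunk_size, buf)
    return full_parts, remainder


# Hypothetical sizes: 1,000,000 bytes of audio data, 300,000 bytes per load.
full, rest = split_into_parts(1_000_000, 300_000)
total_reads = full + (1 if rest else 0)   # one extra read for the leftover bytes
print(full, rest, total_reads)
```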

2. Determine the minimum and maximum thresholds of the background noise, so that the noise can be discarded during decoding and during the automatic search for dumps.

[PSEUDOCODE]
MIN = 255
MAX = 0

CYCLE i from 0 to buf - 1
    IF MIN > data[i] THEN MIN = data[i]
    IF MAX < data[i] THEN MAX = data[i]
[/PSEUDOCODE]
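The same loop as a runnable Python sketch, assuming 8-bit unsigned samples (0..255) as in the listing above. The function name `noise_thresholds` and the input values are hypothetical.

```python
def noise_thresholds(samples):
    """Return (MIN, MAX) over a run of 8-bit unsigned samples."""
    low, high = 255, 0          # same starting values as in the pseudocode
    for value in samples:
        if value < low:
            low = value
        if value > high:
            high = value
    return low, high


# Hypothetical stretch of samples that contains only background noise.
quiet = [126, 128, 127, 129, 128, 125]
min_level, max_level = noise_thresholds(quiet)
print(min_level, max_level)
```

The result is only meaningful as a noise band if the samples passed in come from a stretch without a signal.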

3. Perform staged loading into memory.

[PSEUDOCODE]
Variables:
ChankSize - total size of the audio data
buf - size of the audio data that the PC can load for further processing
PART - number of parts the data is split into

IF buf >= ChankSize THEN (load all data at once)
OTHERWISE (load the data in a cycle, PART times)
[/PSEUDOCODE]
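Staged loading can be sketched in Python with the standard `wave` module, which reads a fixed number of frames per call. The generator name `iter_parts` and the in-memory demo file are my own assumptions, added so the example runs on its own.

```python
import io
import wave


def iter_parts(source, frames_per_part):
    """Yield the audio data of a WAV file in blocks of at most frames_per_part frames."""
    with wave.open(source, "rb") as wav:
        while True:
            frames = wav.readframes(frames_per_part)
            if not frames:          # empty bytes object means end of the audio data
                break
            yield frames


# Hypothetical demo: a 10-sample, 8-bit mono file built in memory.
demo = io.BytesIO()
with wave.open(demo, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(1)
    out.setframerate(8000)
    out.writeframes(bytes(range(10)))
demo.seek(0)

parts = list(iter_parts(demo, frames_per_part=4))
print([len(p) for p in parts])
```

When the whole file fits into one block, the loop simply runs once, so the "load all data" case needs no separate branch.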

Search for dumps.
When searching for dumps, several important parameters must be taken into account: the minimum and maximum background noise thresholds, the minimum length of found data (a dump), the minimum length of empty data (background noise), the background noise counter, and the counter of found data.

[PSEUDOCODE]
Variables:
ChankSize - total size of the audio data
buf - size of the audio data that the PC can load for further processing
PART - number of parts the data is split into
MIN, MAX - background noise thresholds
data - samples of the current part
count - background noise counter
countdump - counter of data found
lengthdump - minimum dump length

CYCLE over the PART parts, and over every sample data[i] of the current part
    IF (data[i] > MIN) AND (data[i] < MAX) THEN count = count + 1 OTHERWISE count = 0
    IF count < 300 THEN
        countdump = countdump + 1
    OTHERWISE
        IF countdump > lengthdump THEN
            (copy the found data into a new variable for further decoding)
        countdump = 0
[/PSEUDOCODE]

Decoding. Theory.

The decoding process is described very well in the documentation of the Excel package (I cannot give download links to anyone), so it makes no sense to describe it in detail here. In short, decoding consists of three steps:
- determining the timings;
- determining the bits;
- decoding the bit (binary) information.

Timing is the number of samples in one half-period.

For a better understanding of timing, the picture shows a half-period, a full period, and zeros and ones. It follows that the half-period of a zero is twice the half-period of a one. If we add two half-periods of a one, we get approximately one half-period of a zero.
Determining bits from timings is simple. First we find the timing of a zero, which (as we found out above) is approximately twice the half-period of a one. If the timing that follows a zero is about half as long, then the timing after it should be about equally short (this is confusing; I would not understand it myself if I read it).
The bits can also be calculated differently. Measure the length of a zero and "move" forward along the graph in steps of approximately that length. This length is shown in the graph with blue dots. If there is one half-period between two of these points, the bit is a zero; if there is a full period, it is a one.
Also, at the beginning and at the end of the dump there are synchronizing zeros. They are not part of the dump data and serve only to calculate the length of a half-period or a full period (shown by the blue dots).
If you have any questions about the timing and bit calculation, ask and I will explain in more detail.
Once the bits are obtained, all that remains is to decode them according to the schedule.

This is how my automated decode software works.
That is all for now. If the audience is interested, I am ready to go into each of the stages of automatic decoding in more detail, with real examples of source code. I will answer all your questions in this thread.
