Finding Secrets in Source Code by Entropy



A new tool for finding private information in open-source code has recently been released. It is Entropy, a command-line utility that scans a code base for high-entropy strings. The assumption is that such strings are likely to contain secrets: tokens, passwords, and so on.

The approach makes sense. Generated passwords and tokens are high-entropy strings because they are produced by random or pseudo-random number generators, so ideally every character in the sequence is unpredictable.

Information entropy


In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent in the variable's possible outcomes; in essence, how unpredictable those outcomes are. In the absence of information loss, entropy is numerically equal to the average amount of information per symbol of the transmitted message.

If a discrete random variable X takes values in the alphabet 𝒳 and is distributed according to p: 𝒳 → [0, 1], its entropy is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$


where Σ denotes the sum over the possible values of the variable. The base of the logarithm depends on the application: base 2 gives the entropy in bits (shannons), base e in natural units (nats), and base 10 in dits (hartleys).
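
To make the formula concrete, here is a minimal Python sketch that evaluates H(X) for a few textbook distributions. The function name entropy and the sample distributions are illustrative choices, not part of the Entropy tool. The example shows that a fair coin carries exactly 1 bit, a fair die log2(6) bits, and a heavily biased coin much less, because its outcome is easier to predict.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum p(x) * log_base p(x).

    `probs` is an iterable of probabilities that sum to 1. Outcomes with
    p(x) = 0 are skipped, following the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
fair_die = [1 / 6] * 6
biased_coin = [0.9, 0.1]

print(f"fair coin:   {entropy(fair_coin):.3f} bits")          # 1.000 bits
print(f"fair die:    {entropy(fair_die):.3f} bits")           # 2.585 bits
print(f"biased coin: {entropy(biased_coin):.3f} bits")        # 0.469 bits
print(f"fair coin:   {entropy(fair_coin, math.e):.3f} nats")  # 0.693 nats
```

Changing the base argument only rescales the result, which is why bits, nats, and hartleys are interchangeable units.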

For passwords (or strings in code), entropy is usually expressed in bits. For example, the entropy of a particular password can be calculated with the Password Entropy calculator.
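
A common way such calculators estimate password entropy is E = L · log2(N), where L is the password length and N is the size of the character pool. This is an assumption about typical calculators, not necessarily the exact formula behind the one mentioned above, and it only holds when every character is chosen uniformly and independently at random; human-chosen passwords have far less entropy than this estimate suggests. The helper name password_entropy_bits is hypothetical.

```python
import math
import string

def password_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a password whose `length` characters are each chosen
    uniformly and independently from `alphabet_size` symbols:
    E = length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)

# Character pools commonly used by such estimates
lower = len(string.ascii_lowercase)                                         # 26
alnum = len(string.ascii_letters + string.digits)                           # 62
printable = len(string.ascii_letters + string.digits + string.punctuation)  # 94

print(f"8 lowercase letters:        {password_entropy_bits(8, lower):.1f} bits")       # 37.6
print(f"12 letters and digits:      {password_entropy_bits(12, alnum):.1f} bits")      # 71.5
print(f"12 printable ASCII symbols: {password_entropy_bits(12, printable):.1f} bits")  # 78.7
```

Each extra character adds log2(N) bits, so length usually matters more than pool size.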



Entropy Tool


The Entropy utility performs the same kind of calculation, but across every line of code in a code base.

Ordinary program code has a regular, predictable structure and therefore low entropy, close to that of natural language. Tokens and passwords, on the other hand, are usually sequences of random characters with high entropy. This makes it possible to scan an entire code base and flag likely tokens and passwords automatically.
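
The idea can be sketched in a few lines of Python: split each line into candidate words, estimate each word's entropy from its own character frequencies, and list the highest-scoring ones. This is a minimal sketch under stated assumptions, not the Entropy tool's actual implementation; the tokenizer regex, the minimum length of 8, and the function names are hypothetical choices for illustration, and the "API key" in the sample is a fake random-looking string.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of a string in bits per character,
    using observed character frequencies as the probabilities p(x)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Hypothetical tokenizer: runs of characters that commonly appear in tokens.
WORD_RE = re.compile(r"[A-Za-z0-9+/=_\-]+")

def top_entropy_words(lines, top=5, min_len=8):
    """Return (entropy, line number, word) for the `top` highest-entropy
    words that are at least `min_len` characters long."""
    scored = []
    for lineno, line in enumerate(lines, start=1):
        for word in WORD_RE.findall(line):
            if len(word) >= min_len:
                scored.append((shannon_entropy(word), lineno, word))
    scored.sort(reverse=True)  # highest entropy first
    return scored[:top]

if __name__ == "__main__":
    sample = [
        "for index in range(len(items)):",
        'api_key = "x9Vq2LmZ8rT4bW1nK7pD"',  # fake, random-looking value
        "return total_count",
    ]
    for h, lineno, word in top_entropy_words(sample, top=3):
        print(f"{h:5.2f}  line {lineno}: {word}")
```

The fake key ranks first because all 20 of its characters are distinct, while identifiers such as total_count repeat letters and score lower. Real scanners also need thresholds and ignore lists, since hashes, minified code, and binary blobs score high too.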

The recommended way to install it is with go install:
Code:
go install github.com/EwenQuim/entropy@latest
entropy

# Additional options
entropy -h
entropy -top 20 -ext go,py,js
entropy -top 5 -ignore-ext min.js,pdf,png,jpg,jpeg,zip,mp4,gif my-folder my-file1 my-file2

Alternatively, install it with Homebrew:
Code:
brew install ewenquim/repo/entropy
entropy

# Additional options
entropy -h
entropy -top 20 -ext go,py,js
entropy -top 5 -ignore-ext min.js,_test.go,pdf,png,jpg my-folder my-file1 my-file2

And via Docker:
Code:
docker run --rm -v $(pwd):/data ewenquim/entropy /data

# Additional options
docker run --rm -v $(pwd):/data ewenquim/entropy -h
docker run --rm -v $(pwd):/data ewenquim/entropy -top 20 -ext go,py,js /data
docker run --rm -v $(pwd):/data ewenquim/entropy -top 5 /data/my-folder /data/my-file

The -v parameter mounts the current directory into the container as /data, which is the default directory the program scans for code files. If you do not specify the directory, the program will search for files inside the container, not on your local file system.

The Docker image is published on Docker Hub.

In principle, the entropy of a text can be roughly estimated from how well it compresses with standard compressors: predictable text shrinks a lot, random text barely shrinks at all. So, as an alternative way to assess entropy, you can use a Perl one-liner such as this:
Code:
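# For each non-empty input line, print (gzip-compressed size / line length) followed by the line
perl -MIO::Compress::Gzip=gzip -lne 'next unless length; gzip(\$_ => \my $z); printf "%5.2f    %s\n", length($z)/length($_), $_'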
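
The same compression-ratio idea can be sketched in Python with the standard gzip module. This mirrors the Perl approach rather than the Entropy tool, and the sample strings and the name gzip_ratio are hypothetical. Note that gzip adds a fixed header and trailer of about 18 bytes, so the ratio is only meaningful for reasonably long inputs; for short lines it exceeds 1 regardless of content.

```python
import gzip
import random
import string

def gzip_ratio(text: str) -> float:
    """Size of the gzip-compressed UTF-8 text divided by its original size.
    Predictable text gives a small ratio; random text stays near (or above) 1."""
    data = text.encode("utf-8")
    return len(gzip.compress(data)) / len(data)

# Repetitive, code-like text (340 characters)
code_like = "for i in range(n):\n    total += i\n" * 10

# Random alphanumeric text of the same length; fixed seed for reproducibility
rng = random.Random(0)
token_like = "".join(rng.choice(string.ascii_letters + string.digits) for _ in range(340))

print(f"code-like text:  {gzip_ratio(code_like):.2f}")
print(f"token-like text: {gzip_ratio(token_like):.2f}")
```

The code-like sample compresses to a small fraction of its size because it repeats, while the random sample stays close to its original length.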

The open-source program is intended for auditing your own repositories. However, attackers can use the same technique to hunt for secrets in other people's code.

Credential leaks


Leaks of secrets through public repositories are a very common problem. Many have heard about the recent incident in which attackers stole source code from The New York Times using an exposed GitHub token.



It is not known where the attacker obtained the GitHub token, but it may well have leaked into public code by accident, as often happens.

There are many such cases. Attackers use specialized tools to monitor new GitHub repositories and commits for AWS secret keys and other confidential information, and a program like Entropy can serve them as one more code-analysis tool. Keep in mind that attackers have such tools: any private token or password published in a public repository becomes available to third parties almost immediately.

Other tools for searching for secrets (credentials, etc.) in source code and repositories:
