Disassembling 2021

President · Sep 4, 2021

In this article, we will talk about disassemblers: what they are and how they work.

What is a Disassembler?
A disassembler is a translator that converts machine code, an object file, or library modules into program text in assembly language.

Disassembling is the opposite of assembling, i.e. restoring the text of an assembly-language program from an executable program in machine code.

Running a little ahead: when you reassemble the recovered text, there is no guarantee that you will get the same code, which means the program may refuse to work.

Any attempt to modify the disassembled text can permanently ruin the program.

The point is that the assembler replaces all labels with constants. When making changes to the program, it is necessary to correct all references to labels.
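To make the label problem concrete, here is a minimal sketch in Python. The three-instruction fragment, its byte sizes and the inserted instruction are all assumptions chosen for illustration; they do not come from a real program. The sketch only shows that a constant jump target stops pointing at the intended instruction once the text is edited.

```python
def layout(instructions):
    """Map each instruction's start offset to its mnemonic."""
    offsets, offset = {}, 0
    for mnemonic, size in instructions:
        offsets[offset] = mnemonic
        offset += size
    return offsets

# Hypothetical 16-bit fragment; sizes are in bytes.
original = [("jmp 0006h", 3), ("mov ax, 1", 3), ("ret", 1)]
# The same text after an edit: one 3-byte instruction inserted before "ret".
patched = [("jmp 0006h", 3), ("mov ax, 1", 3), ("mov bx, 2", 3), ("ret", 1)]

JUMP_TARGET = 0x0006  # the constant the assembler substituted for the label

before = layout(original)[JUMP_TARGET]  # instruction the jump reached originally
after = layout(patched)[JUMP_TARGET]    # instruction the stale constant reaches now
print(before, "|", after)
```

In the original layout the constant 0006h lands on "ret"; after the edit it lands on the inserted instruction, so the reference has to be corrected by hand.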

Disassembler types
According to the mode of operation, disassemblers are divided into automatic and interactive.

A good example of an automatic disassembler is Sourcer.
Such disassemblers generate a finished listing, which can then be edited in a text editor.

An example of an interactive disassembler is IDA.
It allows you to change the rules of disassembly and is a very convenient tool for researching programs.

The main difficulty in the work of a disassembler is distinguishing data from machine code. For this reason, the first passes collect information about the boundaries of procedures and functions, automatically or interactively, and the last pass forms the final listing. Interactivity improves this process: while viewing a dump of the disassembled memory area, the user can immediately mark string constants, give meaningful names to known entry points, and comment on the program fragments they have already analyzed.

According to the number of passes over the object code, disassemblers are divided into one-, two- and multi-pass.

Single-pass disassemblers, also called unlabelled disassemblers, are used primarily in debuggers and program monitors, where pseudo-assembler text has to be produced quickly.

The second name reflects their essential feature: in the pseudo-assembler listing there are no labels in the jump instructions and subroutine calls (absolute hexadecimal addresses appear instead). Therefore, after assembling such a recovered text, we get, in the general case, an absolute program. In one pass, with a limited amount of RAM, it is impossible to collect all the information about the labels, since the backward references are lost.

Labels generated by disassemblers usually have the form Lxxxx, where L is the first letter of the English word "label" and xxxx is the address of a branch target or subroutine.

Data labels have the same structure but start with D. This type of label is the most convenient when parsing a pseudo-listing. A programmer can replace these labels with more informative ones in a text editor.

The most common are two-pass disassemblers, which produce labels in the disassembly listing but do not completely solve the problem of separating instructions from data. After they have run, the programmer corrects the doubtful places by hand.
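The two-pass idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decoder covers a tiny subset of 16-bit x86 (nop, ret, mov ax with a 16-bit immediate, short jmp, near call), the sample bytes are made up, and the input is assumed to be well formed (no truncated instruction at the end). Unknown bytes are simply emitted as data, which is exactly the place where a real tool would need the programmer's help.

```python
def decode(code, pc):
    """Decode one instruction of a tiny 16-bit x86 subset.

    Returns (size, text, branch_target_or_None).
    """
    op = code[pc]
    if op == 0x90:
        return 1, "nop", None
    if op == 0xC3:
        return 1, "ret", None
    if op == 0xB8:  # mov ax, imm16 (little-endian immediate)
        imm = code[pc + 1] | (code[pc + 2] << 8)
        return 3, f"mov ax, 0{imm:04X}h", None
    if op == 0xEB:  # jmp short rel8, relative to the next instruction
        rel = code[pc + 1] - 256 if code[pc + 1] > 127 else code[pc + 1]
        return 2, "jmp", (pc + 2 + rel) & 0xFFFF
    if op == 0xE8:  # call near rel16, relative to the next instruction
        rel = code[pc + 1] | (code[pc + 2] << 8)
        return 3, "call", (pc + 3 + rel) & 0xFFFF
    return 1, f"db 0{op:02X}h", None  # unknown byte: emit it as data


def disassemble(code):
    # Pass 1: collect every jump and call target.
    targets, pc = set(), 0
    while pc < len(code):
        size, _, target = decode(code, pc)
        if target is not None:
            targets.add(target)
        pc += size
    # Pass 2: emit the listing, with an Lxxxx label on every collected target.
    lines, pc = [], 0
    while pc < len(code):
        size, text, target = decode(code, pc)
        if target is not None:
            text += f" L{target:04X}"
        prefix = f"L{pc:04X}:" if pc in targets else ""
        lines.append(f"{prefix:<7}{text}")
        pc += size
    return lines


# Made-up sample: call 0006h / jmp 0007h / nop / ret / mov ax, 1234h / ret
sample = bytes.fromhex("E80300 EB02 90 C3 B83412 C3")
listing = disassemble(sample)
print("\n".join(listing))
```

A single-pass version would have to print "call 0006h" before it knew that offset 0006h needed a label, which is why the labels require the extra pass.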

Purpose of the disassembler
Most often, a disassembler is used to analyze a program (or part of it) whose source code is unavailable, for the purpose of modifying, copying or cracking it. Less often, it is used to search for bugs in programs and compilers, or to analyze and optimize the machine code generated by a compiler.

When working with executable code or bytecode produced from some high-level languages (for example, Java), it is possible to restore not only the assembly-language text but even the structure of the program's classes and, if debug information was not stripped when the executable file was compiled, the source code of the program. Obfuscation is used to rule out such possibilities.

Examples of work
After you have set the required options and selected the Go command, SOURCER loads the program into memory and determines the segment sizes. During the first pass, most of the subroutine and data area references are determined. SOURCER further assumes that the code and data areas are code until proven otherwise. At the beginning of each subsequent pass, SOURCER parses code and data references to more accurately identify areas of code and data. The final pass defines the required assembler directives, the format of each line, and comments.

An internal simulator monitors changes in the contents of all registers and maintains a separate stack for the program. The simulator also ensures that the correct segment is used when multiple data segments are used. It is the simulator's job to keep track of comments, I/O port calls, and resolve index calls and jumps. The simulator repeats the actions of the program. Most instructions that modify the contents of memory are not simulated, although instructions that read data from memory are supported. Dedicated support for the CS register provides full simulation of ROM and RAM operation.

The package includes several utilities, including the LST2ASM utility, which converts listings to assembler text, and the PATCHER utility, which makes changes to binaries.

BIOS Pre-Processor makes it possible (in conjunction with SOURCER) to obtain the annotated source code of the basic input/output system (BIOS) installed on the computer. Why is this needed? To study how the BIOS works and is organized, to make changes and additions to the BIOS, to fix errors, and in a number of other cases. It can take anywhere from 10 minutes to 2 hours to create a BIOS listing, but the result is worth it.

BIOS Pre-Processor works as follows. First, the interrupt vector table is analyzed and the entry points of the interrupt handlers are found. Then the key data areas and their sizes are determined. Finally, the BIOS size is determined and all the necessary information is written to the BIOS.DEF file, which is processed by SOURCER.
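The first step, reading handler entry points from the interrupt vector table, can be sketched as follows. This is not how BIOS Pre-Processor itself is implemented; it only relies on the real-mode layout, where the table starts at address 0 and each vector is 4 bytes (a 16-bit offset followed by a 16-bit segment, both little-endian). The `memory` buffer and the vector value written into it are made up for the example.

```python
import struct

def read_vector(memory, number):
    """Return (segment, offset) of the handler for interrupt `number`."""
    offset, segment = struct.unpack_from("<HH", memory, number * 4)
    return segment, offset

def linear(segment, offset):
    """Real-mode address translation: segment * 16 + offset, 20 bits wide."""
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical dump of the first 1 KB of memory (256 vectors of 4 bytes each).
memory = bytearray(1024)
# Illustrative value only: point INT 10h at F000:F065.
struct.pack_into("<HH", memory, 0x10 * 4, 0xF065, 0xF000)

segment, offset = read_vector(memory, 0x10)
entry = linear(segment, offset)
print(f"INT 10h -> {segment:04X}:{offset:04X} (linear {entry:05X}h)")
```

Each entry point found this way is a place where disassembly can safely start, because it is known to be code.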

You should be aware that the text produced by a disassembler often either cannot be reassembled at all, or the resulting program does not behave the way you want. We already talked about this at the beginning.

Protection of programs from disassembly
Various methods are used to protect against disassemblers. Most of them are based on the "von Neumann principle": programs and data are stored in the same form and look the same, so a program can modify itself. Such methods are most often enough to protect against disassemblers.

Below are some techniques that can be used to counter disassemblers.

1. Encryption of the critical program code, with the protection system itself decrypting it just before transferring control to it. The program is thus decrypted not all at once but in parts, and the protection from the disassembler is distributed over time. Never perform the decryption with a single subroutine, since it would be easy to find and disable. You should also overwrite those parts of the program that are no longer needed. Encrypting the executable code is the simplest protection against a disassembler, both to implement and to remove. It can therefore serve only as one part of the protection and does not have to be complicated.
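The idea of decrypting in parts can be modelled in Python, purely as an illustration. Everything here is an assumption: XOR is chosen only because it is the simplest reversible transformation, the `ProtectedImage` class and its method names are hypothetical, and "executing" a chunk is represented by calling a function on its bytes. For brevity the sketch uses one decryption routine, which the text above advises against in a real protection.

```python
def xor_bytes(data, key):
    """XOR every byte of `data` with a repeating `key`."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class ProtectedImage:
    """Keeps code chunks encrypted and decrypts one only just before use."""

    def __init__(self, chunks, keys):
        self.keys = keys
        self.store = [xor_bytes(chunk, key) for chunk, key in zip(chunks, keys)]

    def run_chunk(self, index, execute):
        # Decrypt only the chunk that is about to receive control.
        plain = bytearray(xor_bytes(self.store[index], self.keys[index]))
        try:
            return execute(bytes(plain))
        finally:
            # Overwrite the decrypted copy once it is no longer needed.
            for i in range(len(plain)):
                plain[i] = 0


# Made-up chunks and keys; a different key per chunk.
chunks = [bytes.fromhex("B805FE"), bytes.fromhex("C3")]
keys = [b"\x5A", b"\xA7\x13"]
image = ProtectedImage(chunks, keys)

# Stand-in for "transfer control": here we just return the decrypted bytes.
recovered = image.run_chunk(0, lambda code: code)
print(image.store[0].hex(), "->", recovered.hex())
```

A static disassembler sees only `image.store`, which decodes to meaningless instructions; the real bytes exist only briefly at run time.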

2. Hiding the control transfer instructions, so that the disassembler is unable to build the control flow graph.

2.1. Indirect control transfer.

2.2. Modification of the jump address in the program code (table 1).

2.3. Using non-standard methods of transferring control (jmp via ret, ret and call via jmp) (table 2).
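Why a jmp performed through ret hides the transfer can be shown with a deliberately naive sketch. The scanner below is an assumption for illustration only: it knows the lengths of four 16-bit x86 opcodes and records a control-flow edge only for a direct near jmp (E9). Both byte sequences are made up; each transfers control to offset 0010h.

```python
# Instruction lengths for the few opcodes used here (16-bit mode):
# 68 = push imm16, 90 = nop, C3 = ret, E9 = jmp near rel16.
LENGTHS = {0x68: 3, 0x90: 1, 0xC3: 1, 0xE9: 3}

def direct_edges(code):
    """Return (from, to) pairs for every direct near jmp found by a linear scan."""
    edges, pc = [], 0
    while pc < len(code):
        op = code[pc]
        size = LENGTHS.get(op, 1)
        if op == 0xE9:
            rel = int.from_bytes(code[pc + 1:pc + 3], "little", signed=True)
            edges.append((pc, (pc + size + rel) & 0xFFFF))
        pc += size
    return edges

direct = bytes.fromhex("E90D00")    # jmp 0010h
hidden = bytes.fromhex("681000C3")  # push 0010h / ret: same destination

print(direct_edges(direct))  # one edge, to offset 0x10
print(direct_edges(hidden))  # no edges: the transfer is invisible to the scan
```

To recover the hidden edge, a tool has to track the value on the stack at the moment ret executes, which requires simulating the code rather than just decoding it.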

3. Overlapping code. Consider the following example:

The first instruction puts an "insignificant" value in AX. The second jumps into the operand of the MOV AX instruction. The operand '02EB' translates to 'jmp $ + 2'. This branch jumps over the first JMP and execution continues with the code that follows. Here's another example:

This code is more useful. Let's simulate a trace, showing a hex dump at each step to clarify the situation.

Code:

B8 05 FE EB FC 80 C4 3B    mov ax, 0FE05h   ; ax = FE05h
^^ ^^ ^^
B8 05 FE EB FC 80 C4 3B    jmp $-2          ; jmp into '05 FE'
         ^^ ^^
B8 05 FE EB FC 80 C4 3B    add ax, 0EBFEh   ; 05 is 'add ax'
   ^^ ^^ ^^
B8 05 FE EB FC 80 C4 3B    cld              ; a dummy instruction
            ^^
B8 05 FE EB FC 80 C4 3B    add ah, 3Bh      ; ax = 2503h
               ^^ ^^ ^^

The ADD AH, 03Bh instruction completes the sequence, whose net effect is simply to load 2503h into AX. By adding 5 bytes (compared with a plain 'mov ax, 2503h'), this code makes the disassembler's job much harder. Even if the instructions are disassembled correctly, the value of AX is not visible in the listing; it becomes known only once the code has actually run. You can hide a value from the disassembler in this way using 'ADD AX' or 'SUB AX' wherever possible. If you check this carefully, you will see that any value can be loaded into AX by changing two bytes: the 0FEh on the first line and the 03Bh on the last.
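The trace above can be checked mechanically. The following Python sketch is a minimal emulator written for this one example: it implements only the five opcodes that occur in the eight bytes (mov ax imm16, add ax imm16, short jmp, cld, add ah imm8) and ignores flags entirely. The function name `run` and the step limit are assumptions of the sketch.

```python
def run(code, max_steps=16):
    """Execute the few 16-bit x86 opcodes used in the trace.

    Returns the final AX and the offsets of the instructions actually executed.
    """
    ax, ip, visited = 0, 0, []
    for _ in range(max_steps):
        if ip >= len(code):
            break
        visited.append(ip)
        op = code[ip]
        if op == 0xB8:    # mov ax, imm16
            ax = code[ip + 1] | (code[ip + 2] << 8)
            ip += 3
        elif op == 0x05:  # add ax, imm16
            imm = code[ip + 1] | (code[ip + 2] << 8)
            ax = (ax + imm) & 0xFFFF
            ip += 3
        elif op == 0xEB:  # jmp short rel8, relative to the next instruction
            rel = int.from_bytes(code[ip + 1:ip + 2], "little", signed=True)
            ip = ip + 2 + rel
        elif op == 0xFC:  # cld (direction flag is not modelled)
            ip += 1
        elif op == 0x80 and code[ip + 1] == 0xC4:  # add ah, imm8
            ah = ((ax >> 8) + code[ip + 2]) & 0xFF
            ax = (ah << 8) | (ax & 0xFF)
            ip += 3
        else:
            raise ValueError(f"opcode {op:02X}h at offset {ip} is not modelled")
    return ax, visited

code = bytes.fromhex("B805FEEBFC80C43B")
ax, visited = run(code)
print(f"ax = {ax:04X}h, executed offsets: {visited}")
```

The executed offsets are 0, 3, 1, 4, 5: offset 1 lies inside the first instruction, so a listing decoded linearly from offset 0 never shows the 'add ax, 0EBFEh' that actually runs.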

4. Placing a segment register prefix before certain instructions (pushf, pushfd, cld, etc.). The disassembler is then unable to recognize the program correctly (db 3Eh, 2Eh, 90h = ds: cs: nop).
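What a decoder has to do to survive such byte sequences can be sketched as follows. The prefix values (26h, 2Eh, 36h, 3Eh for ES, CS, SS, DS) are the standard x86 segment override prefixes; the function name and the three-entry opcode table are assumptions made to keep the example short.

```python
SEGMENT_PREFIXES = {0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds"}
# Only the one-byte opcodes needed for the illustration.
ONE_BYTE_OPCODES = {0x90: "nop", 0x9C: "pushf", 0xFC: "cld"}

def decode_with_prefixes(code, pc=0):
    """Consume any run of segment prefixes, then decode a one-byte opcode.

    Returns (text, size_in_bytes).
    """
    start, prefixes = pc, []
    while pc < len(code) and code[pc] in SEGMENT_PREFIXES:
        prefixes.append(SEGMENT_PREFIXES[code[pc]])
        pc += 1
    if pc >= len(code) or code[pc] not in ONE_BYTE_OPCODES:
        raise ValueError("opcode after the prefixes is not in the table")
    text = "".join(name + ": " for name in prefixes) + ONE_BYTE_OPCODES[code[pc]]
    return text, pc + 1 - start

text, size = decode_with_prefixes(bytes([0x3E, 0x2E, 0x90]))
print(text, size)
```

A decoder that expects at most one prefix, or that expects a prefix only before memory-accessing instructions, would instead emit the first bytes as data and lose synchronization with the instruction stream.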

5. Using a non-standard format of the loadable module, on which the disassembler crashes (for example, overlapping the entire code segment of a DOS EXE file with the stack).

Outcomes
The above list of methods for countering disassemblers is incomplete, but quite sufficient to counter attempts to get a disassembled program listing.

Examples of disassembler programs
There are actually many disassemblers, but we will briefly review the most popular ones.

IDA Pro
IDA Pro is both an interactive disassembler and a debugger.
IDA allows you to turn the binary code of a program into assembler text that can be used to analyze the operation of a program.
However, it should be said that the built-in ring-3 debugger is rather primitive.
It works through the MS Debugging API (on NT) and through the ptrace interface (on UNIX), which makes it an easy target for protection mechanisms.

There are a huge number of useful plugins for IDA, including ones that add support for other scripting languages in addition to the built-in IDC.

W32DASM
An excellent disassembler, easy to use and easy to understand. From a professional's point of view its feature set is rather limited, and it could be classified as a tool from the last century. Still, W32DASM produces a good listing, and for beginners it is an excellent way to figure out what's what. In addition, numerous tutorials for beginners rely on it.

PE Explorer
A program for viewing and editing PE files, from EXE, DLL and ActiveX controls to screensavers, CPL control panel applets, SYS files and binaries for the Windows Mobile platform.
In fact, this is not one utility but a whole set of tools for seeing from the inside how a program or library works. It includes a header viewer, an export viewer for API calls, a resource editor, and of course a disassembler.

Conclusion
Thanks to all those who read it.
The article turned out to be long, but if you are interested in this topic, you can read more about disassembling in the book "The Art of Disassembling" by Kris Kaspersky and Eva Rocco.
The book is devoted to the issues and methods of disassembly, knowledge of which will allow you to protect your programs effectively and write better-optimized code. It explains how to identify constructs of high-level languages such as C/C++ and Pascal and shows various approaches to reconstructing algorithms. It provides an overview of popular hacking tools for Windows, UNIX and Linux: debuggers, disassemblers, hex editors, API and RPC spies, and emulators. The book also covers the study of memory dumps, protection mechanisms, and malicious code such as viruses and exploits. Attention is paid to countering anti-debugging techniques.
