Dive into assembler. Introductory article of the course.

Dive into assembler. A complete course on ASM programming from Hacker.
This is an idea we have been carrying around for a long time. We must have approached it from every side over several years, and every time something got in the way. On the one hand, assembler is cool, as cool as being able to talk to a computer in its own language can be for our hacker reader (cracker, reverser). On the other hand, there are enough up-to-date guides to ASM, including editions from this century, and these are liberal times: web hackers and JS lovers may not understand or approve of us. What settled the argument was the success of our series of articles on reversing malware. It turned out that even now, in the 21st century, the true crackers have not given up their positions, and our readers are interested in this!

INFO
This is the first (introductory) article of the course. The course is designed for those who are generally familiar with high-level programming and are just starting to learn assembly language.
But what is programming in itself, in its essence, regardless of any language? The variety of answers is amazing. The most common definition you hear is this: programming is composing instructions or commands for a machine to execute in sequence in order to solve a particular problem. That answer is fair enough, but in my opinion it does not capture the whole picture, just as it would not if we called literature the composing of sentences out of words for the reader to read in sequence. I tend to believe that programming is closer to creativity, to art.
But no matter what type of programming we do, success depends on practical skills, along with knowledge of the fundamentals and theory. Theory and practice, study and work — these are the cornerstones on which success is based.
Recently, assembler has been undeservedly overshadowed by other languages. This is due to global commercialization, which aims to extract as much profit from a product as possible in the shortest possible time. In other words, mass appeal has prevailed over elitism. And assembler, in my opinion, is closer to the latter.
As for choosing an operating environment for learning assembler, if we talk about a 32-bit instruction set, the choice is relatively small. These are either Windows operating systems or members of the UNIX family.

A few words are also in order about which assembler to choose for a particular operating environment. As you know, two assembly language syntaxes are used with x86 processors: AT&T syntax and Intel syntax. They write the same commands in noticeably different ways. For example, in Intel syntax a command that copies the value of ebx into eax looks like this:
Code:
mov eax,ebx

In AT&T syntax the same command looks different (note the reversed operand order: source first, destination second):
Code:
movl %ebx,%eax

AT&T syntax is more popular in the UNIX environment, but there are few tutorials on it; it is described mostly in reference and technical literature. It is therefore logical to choose an assembler based on Intel syntax. For UNIX systems there are two main ones: NASM (Netwide Assembler) and FASM (Flat Assembler). On Windows, FASM and Microsoft's MASM (Macro Assembler) are popular; there was also TASM (Turbo Assembler) from Borland, which abandoned support for its brainchild long ago.

In this series of articles we will work in a Windows environment with the MASM assembler (simply because I like it better). Many authors, at the initial stage of teaching assembly, embed it in C code, reasoning that moving straight to practical examples in an operating environment is supposedly quite difficult: you need to know both the basics of programming in that environment and the processor commands. However, this approach also requires at least rudimentary knowledge of C.

It should be noted that when learning the basics of programming, and this applies not only to programming in assembly language, it is extremely useful to have an understanding of the culture of console applications.

What is assembler?
The word "assembler" translates literally as "one who assembles". The mnemonics of assembly language are relatively easy to remember, since they are abbreviations of English words.

At the dawn of the computer age, the first computers occupied entire rooms and weighed tons, with a memory capacity the size of a sparrow's brain, or even less. Later came compilers (smarter code generators that work from a more human-readable language) and interpreters (which execute a human-written program on the fly).

Thus, assembly language is a machine-oriented programming language that lets you work with the computer directly, one-on-one. Hence its full description: a low-level, second-generation programming language (the first generation being machine code). Assembly commands correspond one-to-one to processor commands, but since different processor models have their own instruction sets, there are also varieties, or dialects, of assembly language. The term "assembly language" can therefore give the false impression that a single low-level language exists, or at least a standard for such languages.
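To make the correspondence between a mnemonic and machine code concrete, here is a minimal sketch in C of what an assembler does for a single x86 instruction form, a 32-bit register-to-register mov. The function name encode_mov_r32_r32 and the enum are invented for this illustration; the encoding itself (opcode 0x89 followed by a ModRM byte) follows the x86 instruction format for 32-bit code.

```c
#include <stdint.h>

/* Register numbers as they appear in the x86 ModRM byte. */
enum Reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Encode "mov dst, src" (Intel operand order) for two 32-bit registers.
   Hypothetical helper: uses opcode 0x89 (MOV r/m32, r32), writes two
   bytes into out and returns the instruction length. */
int encode_mov_r32_r32(enum Reg32 dst, enum Reg32 src, uint8_t out[2])
{
    out[0] = 0x89;                                /* opcode */
    out[1] = (uint8_t)(0xC0 | (src << 3) | dst);  /* ModRM: mod=11, reg=src, rm=dst */
    return 2;
}
```

With this sketch, mov eax,ebx becomes the two bytes 89 D8. Note that x86 also accepts the alternative encoding 8B C3 for the same instruction, so the correspondence is one-to-one at the level of commands rather than always at the level of bytes.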

Since an assembler is just a program written by a human, nothing stops another programmer from writing their own assembler, and this happens often. In fact, it does not really matter which assembly language you learn. The main thing is to understand how things work at the level of processor commands; after that it will not be difficult to master not only another assembler, but also any other processor with its own instruction set.

Syntax
There is no generally accepted standard for the syntax of assembly languages. However, most assembler developers follow common traditional approaches. The two main conventions are Intel syntax and AT&T syntax.

The general format of an instruction line is the same for both syntaxes:
Code:
[label:] opcode [operands] [; comment]

The opcode is the assembly instruction proper, the mnemonic of a processor instruction. Prefixes can be added to it (for example, repetition or address-size prefixes). Operands can be constants, register names, memory addresses, and so on. The differences between the Intel and AT&T conventions concern mainly the order of the operands and how they are written for the different addressing modes.
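The line format shown above (optional label, opcode, optional operands, optional comment) can be illustrated with a small splitter in C. This is only a sketch under simplifying assumptions: the AsmLine structure and parse_asm_line function are invented for the example, any colon before the comment is treated as the end of a label (so segment overrides such as es:[edi] are not handled), and a semicolon inside a string literal is not recognized.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical record for one source line:
   [label:] opcode [operands] [; comment] */
typedef struct {
    char label[32];
    char opcode[16];
    char operands[64];
    char comment[64];
} AsmLine;

/* Copy [begin, end) into dst without surrounding whitespace,
   truncating to cap - 1 characters. */
static void copy_trimmed(char *dst, size_t cap, const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin)) begin++;
    while (end > begin && isspace((unsigned char)end[-1])) end--;
    size_t n = (size_t)(end - begin);
    if (n >= cap) n = cap - 1;
    memcpy(dst, begin, n);
    dst[n] = '\0';
}

void parse_asm_line(const char *line, AsmLine *out)
{
    const char *end = line + strlen(line);
    memset(out, 0, sizeof *out);

    /* 1. Everything after the first ';' is the comment. */
    const char *semi = strchr(line, ';');
    if (semi) {
        copy_trimmed(out->comment, sizeof out->comment, semi + 1, end);
        end = semi;
    }

    /* 2. A ':' before the comment terminates the optional label. */
    const char *colon = memchr(line, ':', (size_t)(end - line));
    if (colon) {
        copy_trimmed(out->label, sizeof out->label, line, colon);
        line = colon + 1;
    }

    /* 3. The first word is the opcode; the remainder is the operand list. */
    while (line < end && isspace((unsigned char)*line)) line++;
    const char *p = line;
    while (p < end && !isspace((unsigned char)*p)) p++;
    copy_trimmed(out->opcode, sizeof out->opcode, line, p);
    copy_trimmed(out->operands, sizeof out->operands, p, end);
}
```

For the line "start: mov eax, ebx ; copy ebx into eax" this yields the label start, the opcode mov, the operands "eax, ebx" and the comment "copy ebx into eax". A real assembler does considerably more at this stage (prefixes, directives, macros), so treat this purely as an illustration of the four fields.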

The commands used are usually the same for all processors of the same architecture or family of architectures (among the well-known ones are the instruction sets of Motorola, ARM, and x86 processors and controllers). They are described in the processor specification.

For example, the Zilog Z80 processor inherited the Intel i8080 instruction set, extended it, and renamed some commands (and registers) in its own way. For instance, it replaced Intel's mov command with ld. Motorola Fireball processors inherited the Z80 instruction set, cutting it down somewhat. At the same time, Motorola officially returned to the Intel commands, and at the moment half of the assemblers for Fireball work with Intel commands and half with Zilog commands.

Directives
In addition to assembly commands, a program can contain directives: commands that are not translated directly into machine instructions but control the operation of the assembler. Their set and syntax vary considerably and depend not on the hardware platform but on the assembler used. Typical groups of directives include:
  • defining data (constants and variables);
  • managing program organization in memory and output file parameters;
  • setting the assembler's operating mode;
  • all kinds of abstractions (i.e. elements of high-level languages), from declaring procedures and functions (to simplify parameter passing) to conditional constructs and loops;
  • macros.

Advantages and disadvantages
The advantages include the following:
  • minimal amount of redundant code (using fewer instructions and memory accesses). As a result, the program is faster and smaller in size;
  • direct access to hardware: I/O ports, special processor registers;
  • the ability to write self-modifying code (that is, the ability for an application to create or modify part of its code at runtime, and without the need for a software interpreter);
  • maximum "fit" for the desired platform (using special instructions, hardware specifications).

The disadvantages include:
  • large amounts of code and a large number of additional small tasks;
  • fewer available libraries and their low compatibility;
  • poor code readability, difficulty maintaining (debugging, adding features);
  • non-portability to other platforms (except binary compatible ones).

Why should I learn assembly language?
In modern industrial programming practice, assembly languages are rarely used. Low-level programs are in most cases written in C, which achieves the same goals with far less effort and with the same, sometimes even greater, efficiency of the resulting executable code (the latter thanks to optimizers). Only very specific parts of operating system kernels and system libraries are currently implemented in assembler.

So why waste time learning it? For a number of good reasons, and here is one of them: the assembler is the cornerstone on which the entire infinite programming space, starting from the birth of the first processor, rests. Every physicist wants to solve the mystery of the structure of the universe, to find these mysterious primary indivisible (low-level) elements of which it consists, without being satisfied with only a vague idea of quantum theory. The assembler is the primary matter that makes up the processor's universe.

In general, a professional computer user, whether a system administrator or a programmer, can afford not to know something, but in no case can afford not to understand the essence of what is happening, how a computer system works at all its levels, from electronic logic circuits to bulky application programs. And not understanding something leads to a feeling in the depths of the subconscious of a certain mystery, an incomprehensible mystery that occurs at the wave of someone's magic wand. Such a feeling is absolutely unacceptable for a professional.

In other words, as long as processors exist, assembler will be necessary.

In this respect, it does not matter which specific architecture or assembly language you learn. If you know one assembly language, you can easily start writing in any other after a little time spent with the reference material. But the most important thing is that, being able to think in the language of the processor, you will always know what is happening, how, and why. And this is no longer programming with the mouse, but a path to creating software that bears the stamp of great skill.

Assembly language - programming or art?
Let's just say it all depends on whose hands it is in. Assembler is the primary element of the processor's world, and the combination of these elements makes up its soul, its self-consciousness. Just as all the music written in the history of mankind consists of combinations of seven notes, so combinations of assembler commands fill the computer world with digital life. Whoever knows only three chords plays "pop"; whoever commands the whole palette plays the classics.

So why is science so eager to delve into the quantum depths and get its hands on the elusive primordial brick of matter? To gain power over it, to change it at will, to rise to the level of the Creator of the Universe. Into whose hands such power will fall is still an open question. Unlike science, the world of programming holds no secrets: we know the building blocks it is made of, and that is why knowledge of assembler gives us power over the processor.

In order for assembly language programming to rise to the level of art, you need to understand its beauty, hidden behind the flow of ones and zeros. As in any branch of human activity, in programming you can be mediocre, or you can become a Master. Both are distinguished by the degree of culture, education, work and, most importantly, how much soul the author puts into his creation.

Assembler and Terminator
Not so long ago, James Cameron released a 3D version of the second "Terminator", and one curious moment in the life of the cyborg killer is worth noting as a historical fact...
Here we see the terminator's "vision", with an assembler listing displayed on its left side. Judging by the listing, the famous Terminator ran on a MOS Technology 6502 or MOS Technology 6510 processor. The 6502 was introduced in 1975 and was used in Apple computers and, among other things, in the famous video game consoles of that time, the Atari 2600 and the Nintendo Entertainment System (known in Russia as Dendy). It had only three 8-bit registers: the accumulator A and two index registers, X and Y.

LDA - load the accumulator
LDY - load the Y register
LDX - load the X register
STA - store the accumulator
STX - store the X register
STY - store the Y register

Reads and writes to I/O ports were performed with these same commands, so the terminator's program looks entirely meaningful and is not some silly fantasy of the screenwriter.
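The three-register model and the six load/store commands listed above are small enough to sketch as a toy interpreter in C. The Cpu6502 structure and the run function are invented for this illustration. Only the immediate forms of the loads (opcodes A9, A2, A0) and the absolute forms of the stores (8D, 8E, 8C) are handled, and the status flags that a real 6502 updates on every load are deliberately left out. Because the 6502 has no separate port instructions, a store to a device address is just a store to memory.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified model of the 6502. */
typedef struct {
    uint8_t  a, x, y;      /* accumulator and the two index registers */
    uint16_t pc;           /* program counter */
    uint8_t  mem[65536];   /* flat 64 KiB address space, I/O included */
} Cpu6502;

/* Execute from cpu->pc until an opcode outside the supported subset
   is reached. Returns the number of instructions executed. */
int run(Cpu6502 *cpu)
{
    int executed = 0;
    for (;;) {
        uint8_t  op   = cpu->mem[cpu->pc];
        uint8_t  lo   = cpu->mem[(uint16_t)(cpu->pc + 1)];
        uint8_t  hi   = cpu->mem[(uint16_t)(cpu->pc + 2)];
        uint16_t addr = (uint16_t)(lo | (hi << 8));   /* little-endian operand */

        switch (op) {
        case 0xA9: cpu->a = lo;            cpu->pc += 2; break; /* LDA #imm */
        case 0xA2: cpu->x = lo;            cpu->pc += 2; break; /* LDX #imm */
        case 0xA0: cpu->y = lo;            cpu->pc += 2; break; /* LDY #imm */
        case 0x8D: cpu->mem[addr] = cpu->a; cpu->pc += 3; break; /* STA abs */
        case 0x8E: cpu->mem[addr] = cpu->x; cpu->pc += 3; break; /* STX abs */
        case 0x8C: cpu->mem[addr] = cpu->y; cpu->pc += 3; break; /* STY abs */
        default:   return executed;        /* anything else stops the sketch */
        }
        executed++;
    }
}
```

Loading the bytes A9 42 8D 00 02 (LDA #$42, then STA $0200) followed by an unsupported opcode and calling run leaves the value 42 (hex) at address 0200. The addresses here are arbitrary and do not correspond to any particular machine's memory map.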
MOS Technology 6502 / Command System.

Areas of practical application
It was mentioned earlier that assembly language has now been almost entirely displaced by high-level languages. However, it is still in use to this day. Here are some examples.
  • Embedded software development. These are small programs that do not require a significant amount of memory on devices such as phones, car ignition systems, security systems, video and sound cards, modems, and printers. Assembler is the perfect tool for this.
  • Software for game consoles, to optimize the code, reduce its size, and increase speed.
  • To use new commands available on new processors. Although a high-level compiler optimizes the code during compilation, it is almost never able to generate instructions from extended instruction sets such as AVX, CVT16, or XOP, because instructions are added to processors faster than the logic for generating them appears in compilers.
  • A large proportion of GPU programs are written in assembly language, along with the high-level languages HLSL or GLSL.
  • For writing code that is impossible or difficult to create in high-level languages, such as obtaining a memory or stack dump. Even when a high-level equivalent is possible, the advantage of assembly language can be significant. For example, computing the arithmetic mean of two numbers with overflow taken into account takes only two instructions on x86 (an addition that sets the carry flag and a rotate right through the carry flag). The high-level equivalent, ((long) x + y) >> 1, may either not work at all, because sizeof(long) == sizeof(int), or be compiled into a large number of processor instructions.
  • Writing viruses and antivirus programs. Assembly is the only programming language for creating decent infectors such as CIH, Sality, and Sinowal.
  • And of course, we cannot fail to mention the other side of the coin: hacking, cracking, and the more legal option, reverse engineering. Knowledge of assembler is a powerful tool in the hands of a reverser. Neither disassembling nor debugging programs is possible without it.
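The overflow-aware average mentioned in the list above can be sketched in C as follows. The function names are made up for this illustration, and the add/rcr pair in the comment is one possible x86 sequence for the two-instruction idiom, not a claim about what any particular compiler emits.

```c
#include <stdint.h>

/* Naive version: the 32-bit sum wraps around, so the result is wrong
   whenever x + y does not fit in 32 bits. */
uint32_t avg_naive(uint32_t x, uint32_t y)
{
    return (x + y) >> 1;
}

/* Overflow-safe version: bits set in both inputs (x & y) count in full,
   bits set in only one input (x ^ y) count as half.
   It computes what the two-instruction x86 idiom computes:
       add eax, ebx    ; CF receives the 33rd bit of the sum
       rcr eax, 1      ; rotate right through CF, pulling that bit back in */
uint32_t avg_safe(uint32_t x, uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}
```

For the inputs 0xFFFFFFFF and 1, avg_safe returns 0x80000000, while avg_naive returns 0 because the sum wraps to zero before the shift.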

In place of a conclusion
We will continue to dive into assembler in the next articles of the series. We have outlined the topics of the series in general terms, but if you have any ideas or suggestions, feel free to write in the comments; we will take everything into account. :)
