Reverse Engineering Malware, Part 2: Assembler Language Basics


Most of the work we will be doing in reverse engineering will be with assembler language. This simple and sometimes tedious language can reveal a great deal about the original source code. When we can’t see or recover the source code of the malware or other software, we can use tools such as disassemblers and debuggers to recover the underlying assembler of the software. From there, we can decipher what the software was attempting to do.

In this tutorial, I will simply list the most basic and fundamental assembler instructions. I suspect most of you will use it as a reference as we progress through this study, so make certain to bookmark this page so that you can easily come back to it.

Pieces

Let’s begin with some very basic concepts. Hopefully, this is all review for you, but if not, you need to understand these basic concepts before proceeding in this course of study.

Bit – a bit is the smallest piece of data. It can be a 0 or a 1 (off or on).

Byte – a byte is 8 bits. It has a range of equivalent decimal values of 0 to 255.

Word – a word is two bytes together or 16 bits

Double Word – a double word is two words or 32 bits

Kilobyte – a kilobyte is 1024 (32 * 32) bytes

Megabyte – a megabyte is 1,048,576 bytes (1024 x 1024).
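The sizes above can be checked with a few lines of arithmetic. The sketch below uses Python purely as a calculator; the names BITS and unsigned_max are illustrative and not part of any tool.

```python
# Number of bits in each unit described above.
BITS = {"byte": 8, "word": 16, "double word": 32}


def unsigned_max(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in BITS.items():
    top = unsigned_max(bits)
    print(f"{name}: {bits} bits, 0 to {top} (hex {top:X})")

KILOBYTE = 1024              # bytes
MEGABYTE = 1024 * KILOBYTE   # bytes
print("kilobyte:", KILOBYTE, "megabyte:", MEGABYTE)
```

The hexadecimal value printed for a double word, FFFFFFFF, is the largest value a 32-bit register can hold.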

Registers

Registers are small, very fast storage locations inside the CPU itself, separate from main memory. When working in the assembler, we are usually using these registers to move and manipulate information, so you should be familiar with them.

These registers are:

EAX – Extended Accumulator Register

EBX – Extended Base Register

ECX – Extended Counter Register

EDX – Extended Data Register

ESI – Extended Source Index

EDI – Extended Destination Index

EBP – Extended Base Pointer

ESP – Extended Stack Pointer

EIP – Extended Instruction Pointer

Flags

A flag is a single bit that indicates the status of the processor, usually the outcome of the last operation. The flag register on 32-bit CPUs (EFLAGS) is 32 bits long, although not all of those bits are defined flags. In our studies here, we will only need three of them: the Z flag, the O flag and the C flag.

A flag can only be SET or NOT SET.

Z-Flag

The Z-flag (zero flag) is the most useful flag for cracking. It is used in about 90% of all cases. Several opcodes set it when the result of the last operation was zero and clear it otherwise.

O-Flag

The O-flag (overflow flag) is used in about 4% of all cracking attempts. It is set when the last operation produced a signed overflow, meaning the result did not fit in the destination as a signed value and its highest (sign) bit ended up wrong.

C-Flag

The C-flag (carry flag) is used in about 1% of all cracking attempts. It is set if you add a value to a register so that the result becomes bigger than FFFFFFFF, or if you subtract a value so that the result becomes less than zero.
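To make the three flags concrete, here is a minimal sketch that models a 32-bit ADD and SUB in Python and reports the Z, C and O flags. The functions add32 and sub32 are hypothetical helpers written for this tutorial; they imitate the flag rules described above and are not a full CPU emulation.

```python
MASK32 = 0xFFFFFFFF      # largest 32-bit value
SIGN_BIT = 0x80000000    # highest bit of a 32-bit register


def add32(a, b):
    """Model a 32-bit ADD; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "Z": result == 0,       # result is zero
        "C": full > MASK32,     # unsigned result exceeded FFFFFFFF
        # signed overflow: operands share a sign, result has the other sign
        "O": bool(~(a ^ b) & (a ^ result) & SIGN_BIT),
    }
    return result, flags


def sub32(a, b):
    """Model a 32-bit SUB; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "Z": result == 0,
        "C": a < b,             # unsigned borrow: result went below zero
        # signed overflow: operands differ in sign, result sign differs from a
        "O": bool((a ^ b) & (a ^ result) & SIGN_BIT),
    }
    return result, flags


print(add32(0xFFFFFFFF, 1))   # wraps to 0: Z and C set
print(add32(0x7FFFFFFF, 1))   # largest positive + 1: O set
print(sub32(0, 1))            # below zero: C set
```

Note that C and O are independent: the first call carries without a signed overflow, while the second overflows without a carry.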

Stack

The stack is a part of memory where you can store different things for later use. Think of it as a stack of books on a desk, where the last book placed on top (last in) is the first to leave (first out). This is known as LIFO.

The command PUSH saves a value, usually the contents of a register, on the stack. The command POP takes the most recently saved value from the stack and places it into a specific register.
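The LIFO behavior of PUSH and POP can be sketched with a toy model. The Stack class below is hypothetical, and the starting ESP value of 0x1000 is an arbitrary assumption chosen for illustration; the only points it tries to show are that the stack grows toward lower addresses and that values come back in reverse order.

```python
class Stack:
    """Toy model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp      # stack pointer (assumed starting address)
        self.memory = {}    # address -> 32-bit value

    def push(self, value):
        """PUSH: decrease ESP by 4, then store the value at the new ESP."""
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        """POP: read the value at ESP, then increase ESP by 4."""
        # The toy model discards the slot; real memory would keep the old bytes.
        value = self.memory.pop(self.esp)
        self.esp += 4
        return value


stack = Stack()
stack.push(0x11111111)    # e.g. PUSH EAX
stack.push(0x22222222)    # e.g. PUSH EBX
first = stack.pop()       # e.g. POP EBX
second = stack.pop()      # e.g. POP EAX
print(hex(first), hex(second), hex(stack.esp))
```

Because the last value pushed is the first one popped, registers saved with a series of PUSH instructions are normally restored with POP instructions in the opposite order.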

Instructions

Assembler language has a small number of fundamental commands. These include:

ADD – The ADD instruction adds a value to a register or memory address.

Syntax:

ADD destination, source

AND – the AND instruction performs a logical AND on two values and stores the result in the destination

Syntax:

AND destination, source

CALL – the CALL instruction pushes the address of the instruction that follows (the return address) onto the stack and then jumps to a subprogram or sub-procedure

Syntax:

CALL something

CDQ – Convert DWORD to QWORD (Convert D to Q). It sign-extends EAX into the register pair EDX:EAX, typically just before an IDIV

Syntax:

CDQ
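As a rough illustration of what CDQ does, the sketch below models the sign extension in Python. The function cdq is a hypothetical helper: it copies the highest bit of EAX into every bit of EDX, so that EDX:EAX holds the same signed value as EAX did, only in 64 bits.

```python
def cdq(eax):
    """Return (edx, eax) after sign-extending the 32-bit EAX into EDX:EAX."""
    eax &= 0xFFFFFFFF
    # EDX becomes all ones if EAX is negative (sign bit set), otherwise zero.
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0
    return edx, eax


print([hex(v) for v in cdq(5)])            # positive value: EDX is zero
print([hex(v) for v in cdq(0xFFFFFFFB)])   # -5 as a DWORD: EDX is all ones
```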

CMP – Compare

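the CMP instruction compares two operands by subtracting the source from the destination without storing the result. It sets the C/O/Z flags according to that result, for example the Z flag when the two operands are equal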

Syntax:

CMP destination, source

DEC – Decrement

the DEC instruction decreases a value by one (value = value - 1)

Syntax:

DEC something

DIV – Division

the DIV command performs unsigned division. With a 32-bit divisor, the dividend is always EDX:EAX (EDX holds the upper 32 bits), the quotient is stored in EAX and the remainder is stored in EDX.

Syntax:

DIV divisor
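The register usage of DIV is easy to get wrong, so here is a minimal sketch in Python. For a 32-bit divisor the processor treats EDX:EAX as one 64-bit dividend; the function div32 is a hypothetical helper that mimics this, and the Python exceptions stand in for the divide error the CPU would raise.

```python
MASK32 = 0xFFFFFFFF


def div32(edx, eax, divisor):
    """Unsigned divide of EDX:EAX by a 32-bit divisor.

    Returns (quotient, remainder), i.e. the new (EAX, EDX).
    """
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide by zero (the CPU raises a divide error)")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX (divide error)")
    return quotient, remainder


print(div32(0, 17, 5))                      # 17 / 5
print([hex(v) for v in div32(1, 0, 2)])     # 0x100000000 / 2
```

This is why compiled code usually clears EDX (or uses CDQ before a signed IDIV) ahead of a division: stale data in EDX would otherwise become part of the dividend.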

IDIV – Integer division. It works like DIV but treats the operands as signed. The C/O/Z flags are left undefined afterwards, so do not rely on them

Syntax:

IDIV divisor

IMUL – integer multiplication (signed)

Syntax:

IMUL value

IMUL dest, value, value

IMUL dest, value

INC – increment, opposite of DEC instruction (value = value +1)

Syntax:

INC register

INT – the INT command generates a call to an interrupt handler

JUMPS – there are a variety of jumps, but the most common and important jumps are:

JE – jump if equal

JG – jump if greater

JGE – jump if greater or equal

JL – jump if less

JLE – jump if less or equal

JMP – jump always

JNE – jump if not equal

JNZ – jump if not zero

JZ – jump if zero
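Each conditional jump simply tests the flags left behind by an earlier instruction, most often a CMP. The sketch below models that in Python. Two things in it are additions to the list above: the signed jumps (JG, JGE, JL, JLE) also consult the S flag (sign flag, a copy of the result's highest bit), and the helpers cmp32 and taken are hypothetical names used only for illustration.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def cmp32(dest, src):
    """Model CMP dest, src: subtract, keep only the flags."""
    dest &= MASK32
    src &= MASK32
    result = (dest - src) & MASK32
    return {
        "Z": result == 0,
        "C": dest < src,
        "S": bool(result & SIGN_BIT),
        "O": bool((dest ^ src) & (dest ^ result) & SIGN_BIT),
    }


# Condition each jump tests, expressed on the flags dictionary.
JUMP_CONDITIONS = {
    "JMP": lambda f: True,
    "JE": lambda f: f["Z"],
    "JZ": lambda f: f["Z"],
    "JNE": lambda f: not f["Z"],
    "JNZ": lambda f: not f["Z"],
    "JG": lambda f: not f["Z"] and f["S"] == f["O"],
    "JGE": lambda f: f["S"] == f["O"],
    "JL": lambda f: f["S"] != f["O"],
    "JLE": lambda f: f["Z"] or f["S"] != f["O"],
}


def taken(mnemonic, dest, src):
    """Would this jump be taken right after CMP dest, src?"""
    return JUMP_CONDITIONS[mnemonic](cmp32(dest, src))


print(taken("JE", 5, 5))             # equal operands
print(taken("JG", 7, 3))             # 7 > 3
print(taken("JL", 0xFFFFFFFF, 1))    # FFFFFFFF is -1 when read as signed
```

The table also shows that JE and JZ test the same condition, as do JNE and JNZ; they are two names for the same jump, chosen by the disassembler.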

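LEA – Load Effective Address; computes the address of the source operand and stores that address, not the data located at it, in the destination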

Syntax:

LEA destination, source

MOV – move; copies the value from the source to the destination and leaves the source unchanged

Syntax:

MOV destination, source

MUL – multiply; the same as IMUL, but it treats the operands as unsigned

Syntax:

MUL value

NOP – no operation; does nothing

Syntax:

NOP

OR – logical inclusive OR

Syntax:

OR destination, source

POP – the POP instruction loads the word/dword at the top of the stack (pointed to by ESP) into the destination and then increases ESP by the size of the operand.

Syntax:

POP destination

PUSH – the PUSH instruction decreases ESP by the size of the operand and then stores the operand at that address, so that ESP points to the value that was PUSHed.

Syntax:

PUSH operand

REP – repeat the following string instruction until ECX reaches zero. Common variants are REPE (repeat while equal), REPZ (repeat while zero), REPNE (repeat while not equal), and REPNZ (repeat while not zero)

Syntax:

REP ins

Where ins is a string operation

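RET – return; pops the return address that CALL pushed onto the stack and jumps to it. The optional operand is the number of additional bytes to remove from the stack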

Syntax:

RET digit

SUB – subtraction; the opposite of the ADD command. It subtracts the value of the source from the value of the destination and stores the result in the destination

Syntax:

SUB destination, source

TEST – performs a logical AND but does not store the result; only the flags are updated

Syntax:

TEST operand1 , operand2

XOR – the XOR instruction combines two values using a logical exclusive OR and stores the result in the destination

Syntax:

XOR destination, source

Logical Operations

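The table below summarizes the logical operations, displaying the results of AND, OR, XOR and NOT when the source or destination is a 1 or 0. NOT takes a single operand, so its column shows the inverse of the source.

Source  Destination  AND  OR  XOR  NOT source
0       0            0    0   0    1
0       1            0    1   1    1
1       0            0    1   1    0
1       1            1    1   0    0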
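On a real register, these logical operations are applied to all 32 bits at once, one bit position at a time. The sketch below shows a few typical uses in Python; the helper names and the sample value 0x12345678 are arbitrary choices for illustration.

```python
MASK32 = 0xFFFFFFFF


def and32(dest, src):
    return dest & src & MASK32


def or32(dest, src):
    return (dest | src) & MASK32


def xor32(dest, src):
    return (dest ^ src) & MASK32


def not32(value):
    return ~value & MASK32


eax = 0x12345678
print(hex(and32(eax, 0x000000FF)))   # AND with a mask keeps only the low byte
print(hex(or32(eax, 0x80000000)))    # OR sets the highest bit
print(hex(xor32(eax, eax)))          # XOR of a value with itself is always 0
print(hex(not32(0)))                 # NOT flips every bit
```

The third line is worth remembering: disassembled code is full of XOR EAX, EAX, which is simply a compact way of setting a register to zero.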