In my introductory article in this new series, I attempted to explain why you should study Reverse Engineering Malware. I’m hoping that you found that argument compelling enough to come back, ready to dedicate yourself to this exciting discipline. I’m sure your hard work and dedication will pay off by advancing your cybersecurity career.
Let’s get started!
Reverse engineering malware is a deep and sophisticated subject, which is why few people actually master it. This is the primary reason salaries in this field are so high. Before we proceed, we need to develop a conceptual framework and elaborate on some strategies and issues relating to reverse engineering malware. So, let’s do that first.
What is Reverse Engineering?
Although definitions of reverse engineering vary a bit, in this series we will be trying to determine what a piece of software (malware) does even when we don’t have access to the source code (which is usually the case). After determining what the software does, we will attempt to either (1) tweak it to do something slightly different or (2) reconstruct it in another piece of software (malware).
Reverse Engineering Applied to Malware
Reverse engineering is used on both termini of malware development and delivery. At the developer terminus, reverse engineering is used to find vulnerabilities in operating systems and applications that the malware can exploit. In addition, the developers can use reverse engineering to find and use a module from someone else’s malware. Like all software developers, malware developers re-use useful code from others’ software. No sense in re-inventing the wheel even when doing malware development.
At the other terminus, forensic investigators and incident handlers can use reverse engineering to trace what a piece of malware does and what harm it might bring. Furthermore, reverse engineering can often give the forensic investigator a clue to the origin and attribution of the malware.
Low Level Software
When reverse engineering software, we are usually working with low-level representations of it. The source code is most often not available to us, but the low-level code always is.
Assembly Code
Assembly is the lowest human-readable level in the software chain, and although we usually don’t have access to the source code, various tools can translate the compiled binary back into assembly. Every statement in a higher-level language must ultimately be expressed in assembly. There is no magic here: each statement is reduced to one or more assembly instructions. In most cases, we will be working with this assembly code when reverse engineering.
Obviously, to be successful at reversing, we must be familiar with assembly language. Unfortunately, there is not a single assembly language, but rather one for each type of processor (x86, x64, ARM, PPC, etc.). To master reversing, we must master the assembly language of our chosen platform. In this series, we will be examining x86, x64 and ARM assembly.
Machine Code
Machine code or binary code is the code read by the CPU. Machine code and assembly are two different representations of the same thing. Machine code is simply a sequence of bits that contain instructions for the CPU.
Assembly language is simply a textual representation of machine code that makes it more human readable (but not much more). Each assembly language instruction corresponds to a number called the opcode, short for operation code.
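To make the relationship between opcodes and assembly concrete, here is a minimal sketch of a toy decoder. It is not a real disassembler: it only recognizes a small, hand-picked subset of single-byte 32-bit x86 opcodes (nop, ret, push/pop of a register, and mov of a 32-bit immediate into a register), and the function name `disassemble` is my own, chosen for illustration.

```python
# Toy decoder for a tiny, hand-picked subset of 32-bit x86 opcodes.
# Real disassemblers handle prefixes, ModR/M bytes and hundreds of opcodes;
# this sketch only shows that each mnemonic maps to a number (the opcode).

# Register numbering used by the push/pop/mov encodings below.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code: bytes) -> list[str]:
    """Translate raw bytes into assembly text for the opcodes we know."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x90:                       # 0x90 = nop
            out.append("nop")
            i += 1
        elif op == 0xC3:                     # 0xC3 = near return
            out.append("ret")
            i += 1
        elif 0x50 <= op <= 0x57:             # 0x50+r = push r32
            out.append(f"push {REGS[op - 0x50]}")
            i += 1
        elif 0x58 <= op <= 0x5F:             # 0x58+r = pop r32
            out.append(f"pop {REGS[op - 0x58]}")
            i += 1
        elif 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            # 0xB8+r = mov r32, imm32 (immediate stored little-endian)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REGS[op - 0xB8]}, {imm:#x}")
            i += 5
        else:                                # anything else: emit the raw byte
            out.append(f"db {op:#04x}")
            i += 1
    return out


if __name__ == "__main__":
    machine_code = bytes.fromhex("55b82a0000005dc3")
    for line in disassemble(machine_code):
        print(line)
```

The eight bytes in the usage example and the four lines of assembly they decode to are the same program in two notations, which is the point of the paragraph above.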
Compilers
Compilers convert source code into machine code. One of the biggest challenges in the reversing process is that compilers optimize the code to make it more efficient. As a result, the same source code compiled by two different compilers (or by one compiler with different optimization settings) will generate different machine code, making our job of reversing more difficult.
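As a small analogy only, we can watch a compiler rewrite code using Python's own bytecode compiler. This is Python bytecode, not native machine code, and the variable name `seconds_per_day` is just an illustrative choice, but it shows the same effect: what comes out of the compiler does not mirror the source line for line.

```python
import dis

# The source contains two multiplications...
source = "seconds_per_day = 60 * 60 * 24"

# ...but the CPython compiler folds constant expressions at compile time,
# so the compiled code object carries the precomputed result instead.
code = compile(source, "<example>", "exec")

print("constants stored in the compiled code:", code.co_consts)

# Print the bytecode listing so we can compare it against the source.
dis.dis(code)
```

A reverser looking only at the compiled form would see the final value and would have to infer that it was once written as a product of three numbers. Native compilers apply far more aggressive transformations of this kind.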
The Reversing Process
The reversing process can usually be broken down into at least two types: (1) code level and (2) system level.
Code Level
When we do code level reversing, we are attempting to extract the software’s code concepts and algorithms from the machine code. This requires a solid understanding of such things as how the CPU works, how the operating system works and the process of software development. We will be using such tools as IDA Pro, SoftICE, OllyDbg, Ghidra and some others in this process.
System Level
System level reversing involves running tools to obtain information about the software, inspect the program, inspect the executables, and track the program’s input and output. Most of this information will come from the operating system. We will be using such tools as SysInternals Suite, Tripwire, lsof, Wireshark, and others.
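As a first taste of inspecting an executable without reading its code, here is a minimal sketch of two common triage steps: fingerprinting the sample with a hash and pulling out its printable strings. The helper names `sha256_hex` and `printable_strings`, and the bytes in `sample`, are made up for illustration and do not come from any real malware.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a sample as a hex string."""
    return hashlib.sha256(data).hexdigest()


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII characters."""
    # Bytes 0x20 through 0x7e are the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for a file read from disk.
    sample = (b"\x00\x01MZ\x90\x00http://example.test/payload\x00"
              b"\xffcmd.exe /c whoami\x00ab\x00")
    print("sha256:", sha256_hex(sample))
    for text in printable_strings(sample):
        print("string:", text)
```

Embedded URLs, commands and file names often survive compilation as plain text, so even this simple pass can hint at what a sample does before we open a disassembler.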
Reversing Tools
Reverse engineering tools can be broken down into several categories. These include:
(1) System-level Tools
These tools sniff, monitor and explore the software we are examining. In most cases, they use the operating system to gather info on the malware.
(2) Disassemblers
Disassemblers take the software and generate the assembly code for the program. In this way, we can examine the inner workings of the malware without seeing the source code.
(3) Debuggers
A debugger enables us to observe a program while it is running. It enables us to set breakpoints and trace through the code.
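The idea of a breakpoint can be sketched with Python's built-in tracing hook, `sys.settrace`. This is only an analogy to a native debugger, which works on machine instructions rather than source lines, and the names `target` and `run_with_breakpoint` are hypothetical helpers written for this example.

```python
import sys


def target(n):
    total = 0
    for i in range(n):
        total += i
    return total


def run_with_breakpoint(func, arg, line_offset):
    """Run func(arg) and record its local variables each time execution
    reaches the line that is line_offset lines below the 'def' line."""
    hits = []
    break_line = func.__code__.co_firstlineno + line_offset

    def tracer(frame, event, _arg):
        # Only trace the function we care about.
        if frame.f_code is not func.__code__:
            return None
        # A "line" event fires just before that line executes.
        if event == "line" and frame.f_lineno == break_line:
            hits.append(dict(frame.f_locals))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(arg)
    finally:
        sys.settrace(None)  # always remove the hook again
    return result, hits


if __name__ == "__main__":
    # Offset 3 is the "total += i" line inside target().
    result, hits = run_with_breakpoint(target, 3, 3)
    print("result:", result)
    for snapshot in hits:
        print("breakpoint hit, locals:", snapshot)
```

Each snapshot is what we would see if we paused the program at that line and inspected its variables, which is exactly how we will use a debugger on malware: stop at an interesting point and look at the state.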
(4) Decompilers
A decompiler attempts to take an executable and re-create the source code in a high-level language. The result is imperfect, because compilers vary, optimize steps away for efficiency, and discard information such as variable names and comments, but this can still be a productive process in the reversing discipline.
Legality
The legality of reverse engineering has always been controversial. The question of legality revolves around the issue of the social and economic impact of reverse engineering. For instance, if you were to reverse engineer Microsoft’s Excel and then re-sell it, that would very likely be deemed illegal. If you are reverse engineering malware to decipher its capabilities and origins, that will likely be deemed legal.
Copyright law and the Digital Millennium Copyright Act (DMCA) are key pieces of legislation pertinent to reverse engineering. Some have claimed that creating an intermediate copy of a software program during the reverse engineering process is in itself a violation of copyright law. Fortunately, the courts have disagreed.
On the other hand, the DMCA protects copyright-protected systems from being copied. In almost every case, circumvention of these protections involves reverse engineering. We will look at a few of those ways in this course of study.
Copyright protections usually involve Digital Rights Management (DRM) technology, and circumvention of these systems is generally illegal, even for personal use. It is also illegal to develop or make available the means to circumvent DRM.
There is an exception, however. You may reverse engineer and circumvent copyright protection on software for the purpose of evaluating or improving the security of a computer system. It is this exception that our work falls within.
Conclusion
I hope that this introduction has given you a framework for understanding the process of reverse engineering malware and has whetted your appetite for what is to come. Keep coming back as I step you through the exciting process of reverse engineering malware!