Welcome back to my Reverse Engineering Malware series.
In general, reverse engineering of malware is done on Windows systems. That’s because, despite recent inroads by Linux and macOS, Windows still runs the large majority of desktop computers, and so the large majority of desktop malware is designed to compromise Windows systems. For this reason, it makes sense to focus our attention on Windows operating systems.
When reversing malware, the operating system plays a key role. All applications interact with the operating system and are tightly integrated with the OS. We can gather a significant amount of information on the malware by probing the interface between the OS and the application (malware).
To understand how malware can use and manipulate Windows, then, we need to better understand the inner workings of the Windows operating system. In this article, we will examine the inner workings of Windows 32-bit systems so that we can better understand how malware can use the operating system for its malicious purposes.
Windows internals could fill several textbooks (and have), so I will attempt to cover just the most important topics, and only in a cursory way. I hope to leave you with enough information, though, that you can effectively reverse the malware in the following articles.
Virtual Memory
Virtual memory is the idea that instead of software directly accessing the physical memory, the CPU and the operating system create an invisible layer between the software and the physical memory.
The OS creates a table, called the page table, that the CPU consults to translate each virtual address a process uses into the location in physical memory that backs it.
Processors divide memory into pages
Pages are fixed-size chunks of memory. Each entry in the page table references one page of memory. In general, 32-bit processors use 4 KB pages, with some exceptions.
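To make the page idea concrete, here is a minimal sketch of how a 32-bit virtual address can be split up. It assumes the classic x86 layout with 4 KB pages and no PAE (a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset). The names va_parts, split_virtual_address and physical_address are my own, not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 32-bit x86, 4 KB pages, no PAE.
   | 10-bit directory index | 10-bit table index | 12-bit offset | */
#define PAGE_SHIFT 12u
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4096 bytes */

typedef struct {
    uint32_t dir_index;    /* bits 31..22: selects a page table     */
    uint32_t table_index;  /* bits 21..12: selects a page table entry */
    uint32_t offset;       /* bits 11..0 : byte within the page     */
} va_parts;

/* Split a virtual address into the pieces the CPU uses for the lookup. */
static va_parts split_virtual_address(uint32_t va)
{
    va_parts p;
    p.dir_index   = va >> 22;
    p.table_index = (va >> PAGE_SHIFT) & 0x3FFu;
    p.offset      = va & (PAGE_SIZE - 1u);
    return p;
}

/* Combine the physical page base found in the page table with the offset. */
static uint32_t physical_address(uint32_t frame_base, uint32_t offset)
{
    assert((frame_base & (PAGE_SIZE - 1u)) == 0); /* frames are page aligned */
    assert(offset < PAGE_SIZE);
    return frame_base | offset;
}
```

Note that the offset within the page is never translated. Only the upper 20 bits of the address go through the page table.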
Kernel vs. User Mode
Having a page table enables the processor to enforce rules on how memory will be accessed. For instance, page table entries often have flags that determine whether the page can be accessed from a non-privileged mode (user mode).
In this way, the operating system’s code can reside inside the process’s address space without concern that it will be accessed by non-privileged code. This protects the operating system’s sensitive data.
These two levels are known as kernel mode (privileged) and user mode (non-privileged).
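As a rough illustration of how page table flags enforce this split, the sketch below models an access check. The three flag bits follow the x86 page table entry layout (present, writable, user). Everything else, including the names make_pte and access_allowed, is my own simplification, and it ignores details such as the CR0.WP bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86 page table entry. */
#define PTE_PRESENT    0x001u  /* bit 0: page is in physical memory   */
#define PTE_WRITABLE   0x002u  /* bit 1: writes are allowed           */
#define PTE_USER       0x004u  /* bit 2: user mode may access the page */
#define PTE_FLAGS_MASK 0xFFFu  /* low 12 bits hold flags              */

typedef enum { MODE_KERNEL, MODE_USER } cpu_mode;

/* Build an entry from a page-aligned physical address and flag bits. */
static uint32_t make_pte(uint32_t frame_base, uint32_t flags)
{
    assert((frame_base & PTE_FLAGS_MASK) == 0);
    assert((flags & ~PTE_FLAGS_MASK) == 0);
    return frame_base | flags;
}

/* Simplified check: would this access succeed or fault? */
static bool access_allowed(uint32_t pte, cpu_mode mode, bool is_write)
{
    if (!(pte & PTE_PRESENT))
        return false;                       /* not resident: page fault  */
    if (mode == MODE_USER && !(pte & PTE_USER))
        return false;                       /* kernel-only page          */
    if (is_write && !(pte & PTE_WRITABLE))
        return false;                       /* read-only page            */
    return true;
}
```

A kernel page mapped without the user bit is visible in every process’s address space, yet any user-mode access to it fails the check.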
Kernel Memory Space
On 32-bit Windows, the kernel by default reserves the upper 2 GB of the 4 GB virtual address space for itself. This address space contains all the kernel code, including the kernel itself and any other kernel components such as device drivers.
Paging
Paging is the process where memory regions are temporarily flushed to the hard drive when they have not been used recently. Physical memory is faster, but also scarcer and more expensive, than space on the hard drive, so the pages that have gone unused the longest are flushed first.
The Windows operating system tracks when a page was last accessed and then uses that information to locate pages that haven’t been accessed in a while. Windows then flushes their content to a file. The contents of the flushed pages can then be discarded and the space used for other information. When a program needs to access one of these flushed pages, a page fault is generated and the system determines that the information has been “paged out” to a file. Then, the operating system accesses the page file and pulls the information back into memory to be used.
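The "flush the oldest page" idea can be sketched as a tiny least-recently-used simulation. This is only a toy model under my own assumptions (three physical frames, a logical clock, hypothetical names such as memory and access_page). The real Windows memory manager is considerably more sophisticated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_FRAMES 3   /* pretend physical memory holds only 3 pages */

typedef struct {
    int      in_use;     /* does this frame hold a page?          */
    uint32_t page;       /* virtual page number held in the frame */
    uint64_t last_used;  /* logical time of the last access       */
} frame;

typedef struct {
    frame    frames[NUM_FRAMES];
    uint64_t clock;        /* incremented on every access   */
    unsigned page_faults;  /* how many accesses missed      */
} memory;

static void memory_init(memory *m)
{
    memset(m, 0, sizeof *m);
}

/* Touch a page. Returns 1 if the access caused a page fault, 0 if resident. */
static int access_page(memory *m, uint32_t page)
{
    int i, victim = 0;

    assert(m != NULL);
    m->clock++;

    /* Resident page: just refresh its timestamp. */
    for (i = 0; i < NUM_FRAMES; i++) {
        if (m->frames[i].in_use && m->frames[i].page == page) {
            m->frames[i].last_used = m->clock;
            return 0;
        }
    }

    /* Page fault: take a free frame, otherwise evict the oldest page. */
    m->page_faults++;
    for (i = 0; i < NUM_FRAMES; i++) {
        if (!m->frames[i].in_use) {
            victim = i;
            break;
        }
        if (m->frames[i].last_used < m->frames[victim].last_used)
            victim = i;
    }
    m->frames[victim].in_use = 1;   /* old contents would go to the page file */
    m->frames[victim].page = page;
    m->frames[victim].last_used = m->clock;
    return 1;
}
```

Touching pages 1, 2, 3, 1 and then 4 evicts page 2, because it is the page that has gone unused the longest.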
Objects and Handles
The Windows kernel manages objects using a centralized object manager component. This object manager is responsible for all kernel objects, such as sections, files, device objects, synchronization objects, processes, and threads. It ONLY manages kernel objects.
GUI-related objects are managed by separate object managers that are implemented inside WIN32K.SYS.
Kernel code typically accesses objects using direct pointers to the object data structures. Applications use handles for accessing individual objects.
Handles
A handle is a process-specific numeric identifier that serves as an index into the process’s private handle table. Each entry in the handle table contains a pointer to the underlying object, which is how the system associates handles with objects. Each handle entry also contains an access mask that determines which types of operations can be performed on the object using this specific handle.
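Here is a minimal sketch of that idea: a per-process table whose entries pair an object pointer with an access mask. All of the names (handle_table, handle_create, handle_reference, ACCESS_READ, ACCESS_WRITE) are hypothetical, and the real kernel structures and handle values are laid out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HANDLES  16u
#define ACCESS_READ  0x1u   /* hypothetical access rights */
#define ACCESS_WRITE 0x2u

typedef struct {
    void    *object;          /* pointer to the underlying object      */
    uint32_t granted_access;  /* operations allowed through the handle */
} handle_entry;

typedef struct {
    handle_entry entries[MAX_HANDLES];  /* one table per process */
} handle_table;

static void handle_table_init(handle_table *t)
{
    memset(t, 0, sizeof *t);
}

/* Store the object in a free slot. Returns the handle, or 0 if the table is full. */
static uint32_t handle_create(handle_table *t, void *object, uint32_t access)
{
    uint32_t i;

    assert(object != NULL);
    for (i = 0; i < MAX_HANDLES; i++) {
        if (t->entries[i].object == NULL) {
            t->entries[i].object = object;
            t->entries[i].granted_access = access;
            return i + 1;   /* 0 is reserved to mean "invalid handle" */
        }
    }
    return 0;
}

/* Translate a handle back to its object, enforcing the access mask. */
static void *handle_reference(const handle_table *t, uint32_t handle,
                              uint32_t desired_access)
{
    const handle_entry *e;

    if (handle == 0 || handle > MAX_HANDLES)
        return NULL;
    e = &t->entries[handle - 1];
    if (e->object == NULL)
        return NULL;
    if ((e->granted_access & desired_access) != desired_access)
        return NULL;        /* handle was not opened with these rights */
    return e->object;
}
```

Because the table is private to a process, the same numeric handle value means something different (or nothing at all) in another process.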
Processes
A process is really just an isolated memory address space that is used to run a program. An address space is created for every program to make sure that each program runs in its own address space without colliding with other processes. Inside a process’s address space the system can load code modules, but to actually run any of that code the process must have at least one thread.
Process Initialization
The creation of the process object and the new address space is the first step. When a program calls the Win32 API CreateProcess, the API creates a process object and allocates a new memory address space for the new process.
CreateProcess maps NTDLL.DLL and the program executable (the .exe file) into the newly created address space. CreateProcess then creates the process’s first thread and allocates stack space for it. The process’s first thread is resumed and starts running in the LdrpInitialize function inside NTDLL.DLL.
LdrpInitialize recursively traverses the primary executable’s import tables and maps into memory every executable that is required.
At this point, control passes into LdrpRunInitializeRoutines, which is an internal NTDLL routine responsible for initializing all statically linked DLLs currently loaded into the address space. The initialization process consists of calling each DLL’s entry point with the DLL_PROCESS_ATTACH constant. Once all the DLLs are initialized, LdrpInitialize calls the thread’s real initialization routine, which is the BaseProcessStart function from KERNEL32.DLL. This function in turn calls the executable’s WinMain entry point, at which point the process has completed its initialization sequence.
Threads
At any given moment, each processor in the system is running one thread. Instead of continuing to run a single piece of code until it completes, Windows can decide to interrupt a running thread at any given time and switch to execution of another thread.
A thread is represented by a data structure that includes a CONTEXT data structure. Each thread has:
(1) a CONTEXT that holds the state of the processor when the thread last ran
(2) one or two memory blocks that are used for stack space; the stack is used to save off the current state of the thread when it is context switched
The components that manage threads in Windows are the scheduler and the dispatcher. Together they decide which thread gets to run and for how long, and they perform the context switch.
Context Switch
A context switch is the interruption of a running thread so that another thread can run. In some cases, threads just give up the CPU on their own and the kernel doesn’t have to interrupt them. Every thread is assigned a quantum, which specifies how long the thread can run without interruption. Once the quantum expires, the thread is interrupted and other threads are allowed to run. This entire process is transparent to the thread: the kernel stores the state of the CPU registers before suspending the thread and then restores that register state when the thread is resumed.
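The save-and-restore mechanics can be sketched with a toy round-robin scheduler. Be aware of the assumptions: the real Windows scheduler is priority based, the real CONTEXT structure holds far more than three registers, and names such as scheduler, scheduler_start and on_timer_tick are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A tiny stand-in for the CONTEXT structure (real one is much larger). */
typedef struct {
    uint32_t eip, esp, eax;
} cpu_context;

typedef struct {
    cpu_context saved;    /* register state from when the thread last ran */
    unsigned    quantum;  /* timer ticks the thread may run uninterrupted */
} thread;

typedef struct {
    thread     *threads;
    size_t      count;
    size_t      current;     /* index of the running thread            */
    unsigned    ticks_left;  /* remaining quantum of the running thread */
    cpu_context regs;        /* the "live" CPU registers               */
} scheduler;

static void scheduler_start(scheduler *s, thread *threads, size_t count)
{
    assert(count > 0 && threads[0].quantum > 0);
    s->threads = threads;
    s->count = count;
    s->current = 0;
    s->ticks_left = threads[0].quantum;
    s->regs = threads[0].saved;
}

/* Called on every timer interrupt. */
static void on_timer_tick(scheduler *s)
{
    size_t next;

    assert(s->ticks_left > 0);
    s->regs.eip++;                 /* pretend the thread made progress */
    if (--s->ticks_left > 0)
        return;                    /* quantum not yet used up */

    /* Quantum expired: save the old thread's registers, restore the next one's. */
    next = (s->current + 1) % s->count;
    s->threads[s->current].saved = s->regs;
    s->regs = s->threads[next].saved;
    s->current = next;
    s->ticks_left = s->threads[next].quantum;
}
```

When the first thread is scheduled again, it resumes with exactly the register values it had when it was interrupted, which is why the switch is invisible to it.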
Win32 API
An API is a set of functions that the operating system makes available to application programs for communicating with the OS. The Win32 API is a large set of functions that make up the official low-level programming interface for Windows applications. MFC (Microsoft Foundation Classes) is a common C++ wrapper around the Win32 API.
The three main components of the Win32 API are:
(1) Kernel or Base APIs: these are the non-GUI-related services such as I/O, memory, object, process, and thread management.
(2) GDI APIs: these include low-level graphics services such as those for drawing a line, displaying a bitmap, etc.
(3) USER APIs: these are the higher-level GUI-related services such as window management, menus, dialog boxes, and user-interface controls.
System Calls
A system call occurs when user-mode code needs to call a kernel-mode function. This usually happens when an application calls an operating system API. User-mode code invokes a special CPU instruction that tells the processor to switch to its privileged mode and call a dispatch routine. This dispatch routine then calls the specific system function requested from user mode.
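The dispatch step can be pictured as a lookup in a table of function pointers indexed by a service number. The sketch below is a user-mode model only. The two services, the table contents and the names dispatch_system_call and ERR_BAD_SERVICE are all made up for illustration. On 32-bit x86 Windows the transition itself is made with an instruction such as int 0x2e or sysenter, which plain C cannot express.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_BAD_SERVICE (-1L)   /* hypothetical error code */

typedef long (*syscall_fn)(long arg);

/* Two made-up "kernel" services. */
static long sys_get_answer(long arg)
{
    (void)arg;
    return 42;
}

static long sys_double(long arg)
{
    return arg * 2;
}

/* The service table: the service number is simply an index into it. */
static const syscall_fn service_table[] = {
    sys_get_answer,   /* service 0 */
    sys_double        /* service 1 */
};

#define SERVICE_COUNT (sizeof service_table / sizeof service_table[0])

/* Stand-in for the kernel's dispatch routine. */
static long dispatch_system_call(unsigned long service_number, long arg)
{
    if (service_number >= SERVICE_COUNT)
        return ERR_BAD_SERVICE;           /* reject unknown service numbers */
    assert(service_table[service_number] != NULL);
    return service_table[service_number](arg);
}
```

User-mode code never gets a pointer to the kernel function. It only supplies a number and arguments, and the dispatcher decides what to call.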
PE Format
The Windows executable format is PE (Portable Executable). The term “portable” refers to the format’s versatility in numerous environments and architectures.
Executable files are relocatable. This means that they can be loaded at a different virtual address each time they are loaded. This matters because an executable must coexist with other executables that are loaded into the same address space. Other than the main executable, every program has a certain number of additional executables loaded into its address space, regardless of whether it has DLLs of its own or not.
Relocation Issues
If two executables attempt to load at the same virtual address, one must be relocated to another virtual address. Each executable module is assigned a base address, and if something is already loaded there, the module must be relocated.
Executable headers never contain absolute memory addresses; those exist only in the code. To make this work, whenever there is a pointer inside the executable header, it is always a relative virtual address (RVA). Think of this as simply an offset from the start of the module. When the file is loaded, it is assigned a base virtual address, and the loader calculates real virtual addresses out of RVAs by adding the module’s base address to each RVA.
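The arithmetic is simple enough to show directly. In this sketch the function names rva_to_va, va_to_rva and rebase_pointer are my own, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address = module base address + RVA. */
static uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* The inverse: how far into the module does this address point? */
static uint32_t va_to_rva(uint32_t image_base, uint32_t va)
{
    assert(va >= image_base);
    return va - image_base;
}

/* An absolute pointer embedded in the code must be adjusted by the
   difference between the actual and the preferred base address. */
static uint32_t rebase_pointer(uint32_t pointer,
                               uint32_t preferred_base,
                               uint32_t actual_base)
{
    return pointer + (actual_base - preferred_base);
}
```

For example, an RVA of 0x1000 in a module loaded at 0x00400000 refers to virtual address 0x00401000. The same RVA in a module loaded at 0x10000000 refers to 0x10001000.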
Image Sections
An executable image is divided into individual sections in which the file’s contents are stored. Sections are needed because different areas in the file are treated differently by the memory manager when a module is loaded. A common division is a code section (also called text) containing the executable’s code and a data section containing the executable’s data.
When a module is loaded, the memory manager sets the access rights on the memory pages of each section based on the settings in that section’s header.
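As a sketch of that mapping, the function below turns a section’s characteristics flags into a page protection value. The constant values mirror the ones in winnt.h, but they are redefined here so the example compiles anywhere. The function section_protection is hypothetical, and it simplifies what the real loader does (for instance, it ignores copy-on-write protections).

```c
#include <assert.h>
#include <stdint.h>

/* Section characteristics flags (values as in winnt.h). */
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u

/* Page protection constants (values as in winnt.h). */
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u

/* Simplified: pick a page protection from the section header flags. */
static uint32_t section_protection(uint32_t characteristics)
{
    int can_exec  = (characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    int can_read  = (characteristics & IMAGE_SCN_MEM_READ) != 0;
    int can_write = (characteristics & IMAGE_SCN_MEM_WRITE) != 0;

    if (can_exec) {
        if (can_write)
            return PAGE_EXECUTE_READWRITE;
        return can_read ? PAGE_EXECUTE_READ : PAGE_EXECUTE;
    }
    if (can_write)
        return PAGE_READWRITE;
    return can_read ? PAGE_READONLY : PAGE_NOACCESS;
}
```

A typical code section has characteristics 0x60000020 (code, execute, read) and so ends up execute/read. A section that is both writable and executable is a classic hint of packed or self-modifying malware.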
Section Alignment
Individual sections often have different access settings defined in the executable header, and the memory manager must apply these access settings when an executable image is loaded. Because access settings are applied per page, sections must typically be page aligned when an executable is loaded into memory. Page aligning the sections in the file as well would waste space on disk. Therefore, the PE header has two different alignment fields: section alignment, which applies to the image in memory, and file alignment, which applies to the file on disk.
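The effect of the two alignment fields is easy to see with a small rounding helper. The helper name align_up is my own. The values 0x1000 and 0x200 used in the note afterwards are common settings for section alignment and file alignment, not values taken from any particular file.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return (value + alignment - 1u) & ~(alignment - 1u);
}
```

A section holding 0x1234 bytes of data would occupy align_up(0x1234, 0x200) = 0x1400 bytes in the file, but align_up(0x1234, 0x1000) = 0x2000 bytes of address space once loaded.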
DLLs
DLLs allow a program to be broken into more than one executable file. This has several benefits. Overall memory consumption is reduced, because executables are not loaded until the features they implement are required. Individual components can also be replaced or upgraded to modify or improve a certain aspect of the program.
DLLs can dramatically reduce overall system memory consumption because the system can detect that a certain executable has been loaded into more than one address space, then map the same copy into each address space instead of reloading it into a new memory location. DLLs are different from static libraries (.lib), which are linked into the executable at build time.
Loading DLLs
Static linking is implemented by having each module list the modules it uses and the functions it calls within each module. This list is known as an import table (see IDA Pro tutorial). Run-time linking refers to a different process whereby an executable can decide to load another executable at run time and call a function from that executable.
PE Headers
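A Portable Executable (PE) file starts with a DOS header. It is followed by a small DOS stub program whose only job is to print a message such as:
“This program cannot be run in DOS mode”
The e_lfanew field of the DOS header holds the file offset of the NT headers, which are defined as: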
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
This data structure contains two other data structures, IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER32, that hold the actual PE header fields.
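To show how these headers chain together, here is a minimal sketch that walks from the DOS header to the NT headers in a raw byte buffer. It relies on well-known layout facts: the "MZ" magic at offset 0, the e_lfanew field at offset 0x3C, the "PE\0\0" signature, and the Machine and NumberOfSections fields at the start of IMAGE_FILE_HEADER. The function parse_pe_headers and its helpers are hypothetical, and the code reads bytes by hand so that it does not need windows.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC        0x5A4Du      /* "MZ", little endian            */
#define PE_SIGNATURE     0x00004550u  /* "PE\0\0", little endian        */
#define DOS_HEADER_SIZE  0x40u        /* size of the DOS header         */
#define E_LFANEW_OFFSET  0x3Cu        /* offset of e_lfanew             */
#define SIGNATURE_SIZE   4u           /* DWORD Signature                */
#define FILE_HEADER_SIZE 20u          /* sizeof(IMAGE_FILE_HEADER)      */

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills the outputs on success, -1 if this is not a PE image. */
static int parse_pe_headers(const uint8_t *buf, size_t len,
                            uint16_t *machine, uint16_t *section_count)
{
    uint32_t nt_offset;

    assert(buf != NULL && machine != NULL && section_count != NULL);

    if (len < DOS_HEADER_SIZE || read_u16(buf) != DOS_MAGIC)
        return -1;                              /* no DOS header */

    nt_offset = read_u32(buf + E_LFANEW_OFFSET);
    if (nt_offset > len || len - nt_offset < SIGNATURE_SIZE + FILE_HEADER_SIZE)
        return -1;                              /* headers would be truncated */

    if (read_u32(buf + nt_offset) != PE_SIGNATURE)
        return -1;                              /* Signature mismatch */

    /* IMAGE_FILE_HEADER starts right after the 4-byte Signature. */
    *machine       = read_u16(buf + nt_offset + SIGNATURE_SIZE);
    *section_count = read_u16(buf + nt_offset + SIGNATURE_SIZE + 2u);
    return 0;
}
```

For a 32-bit x86 executable the Machine field is 0x014C. Malware analysts often start with exactly this kind of check before trusting anything else in a suspicious file.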
Imports and Exports
Imports and exports are the mechanisms that enable the dynamic linking process of executables. The compiler has no idea of the actual addresses of the imported functions; only at run time will these addresses be known. To solve this issue, the linker creates an import table that lists all the functions imported by the current module by their names.
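What the loader then does with that table can be sketched as a name lookup. The structures below are deliberately simplified stand-ins, not the real PE import and export directory formats. The names export_entry, import_entry and resolve_imports are hypothetical, and the RVAs and base address used in the note afterwards are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What a DLL offers: a function name and where it lives in the module. */
typedef struct {
    const char *name;
    uint32_t    rva;
} export_entry;

/* What an executable needs: a name, plus a slot the loader fills in. */
typedef struct {
    const char *name;
    uint32_t    address;   /* 0 until resolved */
} import_entry;

/* Resolve each import by name against one module's exports.
   Returns the number of imports that could not be resolved. */
static size_t resolve_imports(import_entry *imports, size_t import_count,
                              const export_entry *exports, size_t export_count,
                              uint32_t module_base)
{
    size_t i, j, unresolved = 0;

    assert(imports != NULL && exports != NULL);
    for (i = 0; i < import_count; i++) {
        imports[i].address = 0;
        for (j = 0; j < export_count; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                /* Export RVA + load address of the DLL = real address. */
                imports[i].address = module_base + exports[j].rva;
                break;
            }
        }
        if (imports[i].address == 0)
            unresolved++;
    }
    return unresolved;
}
```

If a DLL loaded at a (made-up) base of 0x7C800000 exports ReadFile at RVA 0x2B40, the import slot for ReadFile is filled with 0x7C802B40. For the reverser, the list of imported names is often the quickest clue to what a piece of malware is capable of.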