In this training, we learn the fundamentals of reverse engineering from scratch: from reconstructing high-level code, through recovering complex data structures and C++ class hierarchies, to analyzing complex malware samples. Along the way, we become proficient with state-of-the-art tools such as IDA, Ghidra, and GDB. The training thus accompanies students through their first reverse engineering steps and paves the way for a long journey.
First, we discuss the layers between machine code and high-level languages, introduce binary file formats, and get to know important tools such as hex editors, disassemblers, decompilers, and debuggers. Afterward, we familiarize ourselves with the x86-64 instruction set architecture, the most common architecture on desktop computers and servers. In doing so, we learn how to write assembly code by hand, inspect registers and flags in a debugger, and reconstruct arithmetic calculations and loops in a disassembler.
In the second part, we cover the reconstruction of high-level code constructs from machine code. To this end, we compile C code to machine code and compare the two side by side. By using different compilers and optimization levels, we can study the many ways in which a high-level construct can be represented. Afterward, we focus on manually recovering high-level functions from compiler-generated code. Finally, we dive into software cracking and deepen our skills by reverse engineering and patching serial validation schemes.
Before we reconstruct complex data structures and C++ classes with Ghidra, we first learn how to identify them manually. Next, we look at how to recover class inheritance relationships, analyze constructors and virtual functions, and resolve virtual function calls.
Finally, we put what we have learned into practice by analyzing nation-state malware samples. After discussing the challenges of and strategies for dealing with complex binaries, we identify malware functionality based on API functions and reconstruct the class hierarchies of malware modules. To reveal hidden strings in the binary, we script Ghidra to decrypt them automatically.
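To give a rough idea of what automated string decryption involves, here is a minimal standalone Python sketch. It assumes a single-byte XOR scheme, which is only an illustrative guess: the encryption used by the samples in class is not described here. The names `xor_decrypt`, `extract_strings`, `KEY`, and `blob` are hypothetical, and the sketch deliberately does not use the Ghidra scripting API.

```python
from typing import List


def xor_decrypt(data: bytes, key: int) -> bytes:
    """Undo a single-byte XOR (an assumed scheme, for illustration only)."""
    return bytes(b ^ key for b in data)


def extract_strings(blob: bytes, key: int, min_len: int = 4) -> List[str]:
    """Decrypt a blob and return printable ASCII runs of at least min_len characters."""
    decrypted = xor_decrypt(blob, key)
    strings: List[str] = []
    current: List[str] = []
    for byte in decrypted:
        if 0x20 <= byte <= 0x7E:
            # Printable ASCII: extend the current run.
            current.append(chr(byte))
        else:
            # Non-printable byte ends the run; keep it if it is long enough.
            if len(current) >= min_len:
                strings.append("".join(current))
            current = []
    if len(current) >= min_len:
        strings.append("".join(current))
    return strings


# Hypothetical encrypted blob: two NUL-terminated strings XORed with key 0x5A.
KEY = 0x5A
blob = xor_decrypt(b"cmd.exe\x00http://example.invalid\x00", KEY)
print(extract_strings(blob, KEY))
```

In a Ghidra script, the same routine would be applied to bytes read from the analyzed program, and the recovered strings could be attached as comments at the corresponding addresses. Those details depend on the sample and are left out here.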
**Note that the training focuses on hands-on sessions.** While some lecture parts provide an understanding of how high-level code can be represented in machine code, various hands-on sessions teach how to interact with reverse engineering tools and reconstruct high-level code from binary programs. The trainer actively helps students solve the given exercises. After a task is completed, we discuss different solutions in class. Furthermore, students receive detailed reference solutions that can be used during and after the course.
While this class mostly focuses on the x86-64 architecture, we can optionally take a look at the ARM32 architecture and discuss the differences and similarities between the two. Since the course teaches reverse engineering in a general way, students will notice that the techniques and tools can also be applied to other architectures.
The training follows this outline:
- Motivation
- Application scenarios
- From machine code to high-level languages
- Compilers
- Executable file formats (ELF & PE)
- Static and dynamic program analysis
- Editing ELF files with a hex editor
- Disassembling with IDA
- Decompilation with Ghidra
- Debugging with GDB

- Architecture overview
- Register and data types
- Arithmetic operations and control-flow instructions
- Stack operations and function invocations
- Inspection of registers and flags with GDB
- Implementation of arithmetic operations in assembly code
- Reconstruction of simple calculations
- Loop reconstruction with IDA

- Inspection of empty functions on the binary level
- Stack frame analysis with GDB
- Prologue and epilogue identification with IDA and GDB
- Calling conventions
- Basic blocks and control-flow graphs
- Reconstruction of function signatures and arguments
- Reconstruction of recursive functions
- Reconstruction of (nested) conditionals and switch-case statements
- Reconstruction of (nested) loops
- Impact of compiler optimizations

- Software license checks and keygenning
- Analysis of serial validation schemes with IDA/Ghidra and GDB
- Patching to manipulate control flow

- Local and global data structures
- Variables, arrays, strings and structs
- Reconstruction of arrays with IDA/Ghidra
- Reconstruction of structs with IDA/Ghidra

- Function overloading and name mangling
- Class objects and object life cycles
- Identification and reconstruction of class objects
- Reconstruction of class relationships/inheritance
- Static/dynamic dispatching
- Virtual functions and class inheritance
- Identification and analysis of virtual function tables
- Resolving virtual function calls

- Malware types and behavior
- Analysis challenges and strategies
- Identification of malware functionality based on API functions
- Class reconstruction of C++ malware with Ghidra
- Ghidra scripting for automated string decryption

- Architecture overview
- Differences from x86-64
- Register and data types
- Stack operations
- Arithmetic operations and control-flow instructions
- Subroutines and calling convention
Tim Blazytko (@mr_phrazer) is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing, and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. Moreover, he gives trainings on reverse engineering and code deobfuscation, analyzes malware, and performs security audits.
What students say about this training:
From Tim’s past HITB training
Would you recommend this class, or attend other classes by this trainer?
“I would absolutely recommend others take this class or any other classes taught by Tim”
“Yes, I would definitely recommend this class to any reverse engineers wanting to advance their skills, and I would attend other classes by this trainer.”
“Absolutely recommend this class. It has met and exceeded all my expectations!”
What part of this course did you find most useful and interesting?
“I found all of it very interesting. The most useful parts to me were the coding/reversing exercises. That really helps to cement my understanding of the topics discussed”
“The latter part, dealing with the automation of analysis, [where we were] applying the theory of techniques covered earlier on”
“It is very difficult to fault any component of this course; it appears as a very mature and well refined project. Tim is clearly very passionate about the subjects, and that is portrayed through the material and delivery.”