Software Deobfuscation Techniques

This class is intended for students who have basic experience in reverse engineering and need to deal with obfuscated code. The course is also of interest to experienced reverse engineers who want to deepen their understanding of program analysis techniques and code (de)obfuscation.

$4,299.00

Duration

4 days

Delivery Method

in-person

Level

advanced

Seats Available

20

REGISTRATION CLOSED

DATE: 9-12 May 2022

TIME: 09:00 to 17:00 CEST/GMT+2

Date      Day         Time                      Duration
9 May     Monday      09:00-17:00 CEST/GMT+2    8 hours
10 May    Tuesday     09:00-17:00 CEST/GMT+2    8 hours
11 May    Wednesday   09:00-17:00 CEST/GMT+2    8 hours
12 May    Thursday    09:00-17:00 CEST/GMT+2    8 hours

Code obfuscation has become a vital tool to protect, for example, intellectual property against competitors. In general, it attempts to impede program understanding by making the to-be-protected program more complex. As a consequence, a human analyst reasoning about the obfuscated code has to overcome this barrier by transforming it into a representation that is easier to understand.

In this training, we get to know state-of-the-art code obfuscation techniques and look at how these complicate reverse engineering. Afterwards, we gradually familiarize ourselves with different deobfuscation techniques and use them to break obfuscation schemes in hands-on sessions. Thereby, participants will deepen their knowledge of program analysis and learn when and how (not) to use different techniques.

First, we have a look at important code obfuscation techniques and discuss how to attack them. Afterwards, we analyze a virtual machine-based (VM-based) obfuscation scheme, learn about VM hardening techniques and how to tackle them.

In the second part, we cover SMT-based program analysis. Specifically, students learn how to solve program analysis problems with SMT solvers, how to prove properties of code, how to deobfuscate mixed Boolean-Arithmetic expressions and how to break weak cryptography.
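As a rough illustration of what "proving semantic equivalence" means for mixed Boolean-Arithmetic, the sketch below checks a well-known identity, x + y = (x ^ y) + 2*(x & y), over all 8-bit inputs. This is an assumption-laden stand-in: in practice an SMT solver would be asked the same question for full-width bitvectors, and brute force is used here only to keep the example free of dependencies. The function names are hypothetical and not taken from the course material.

```python
from itertools import product

BITS = 8                      # small width so exhaustive checking is feasible
MASK = (1 << BITS) - 1

def obfuscated(x, y):
    # Mixed Boolean-Arithmetic encoding of an addition.
    return ((x ^ y) + 2 * (x & y)) & MASK

def simplified(x, y):
    # The plain expression we believe the MBA expression computes.
    return (x + y) & MASK

def find_counterexample(f, g, bits=BITS):
    """Return an input pair on which f and g differ, or None if none exists."""
    for x, y in product(range(1 << bits), repeat=2):
        if f(x, y) != g(x, y):
            return (x, y)
    return None

# None means the two expressions agree on every 8-bit input pair.
print(find_counterexample(obfuscated, simplified))
```

A solver-based check follows the same pattern: ask for an input on which the two expressions differ, and treat "no such input exists" as the proof of equivalence.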

Before we use symbolic execution to automate large parts of code deobfuscation, we first introduce intermediate languages and compiler optimizations to simplify industrial-grade obfuscation schemes. Next, we use symbolic execution to automate SMT-based program analysis and break opaque predicates. Finally, we learn how to write disassemblers for virtualization-based obfuscators and how to reconstruct the original code.
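To give a feel for the last step, here is a minimal sketch of a disassembler for a VM-based obfuscator. The bytecode format is entirely invented for illustration (one opcode byte, optionally followed by one operand byte); real obfuscators use their own encodings, which first have to be recovered by analyzing the VM's instruction handlers. All opcode values and mnemonics below are hypothetical.

```python
# Hypothetical instruction set: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0x01: ("vpush", 1),   # push an 8-bit immediate
    0x02: ("vadd", 0),    # pop two values, push their sum
    0x03: ("vxor", 0),    # pop two values, push their xor
    0x04: ("vjmp", 1),    # jump to a bytecode offset
    0xFF: ("vexit", 0),   # leave the VM
}

def disassemble(bytecode):
    """Linear-sweep disassembly; returns a list of (offset, text) pairs."""
    listing = []
    pc = 0
    while pc < len(bytecode):
        opcode = bytecode[pc]
        if opcode not in OPCODES:
            # Unknown byte: emit it as raw data and keep going.
            listing.append((pc, f"db 0x{opcode:02x}"))
            pc += 1
            continue
        name, n_operands = OPCODES[opcode]
        operands = bytecode[pc + 1 : pc + 1 + n_operands]
        if len(operands) < n_operands:
            listing.append((pc, f"{name} <truncated>"))
            break
        if operands:
            text = name + " " + ", ".join(f"0x{b:02x}" for b in operands)
        else:
            text = name
        listing.append((pc, text))
        pc += 1 + n_operands
    return listing

program = bytes([0x01, 0x2A, 0x01, 0x05, 0x02, 0xFF])
for offset, text in disassemble(program):
    print(f"{offset:04x}: {text}")
```

A linear sweep is the simplest design choice; a disassembler that follows `vjmp` targets would be needed once the bytecode interleaves data and instructions.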

The last part covers program synthesis, an approach to simplify code based on its semantic behavior. After collecting input-output pairs from binary code, we not only learn how to simplify large expression trees, but also how we can verify the correctness of simplifications. Then, we use program synthesis to deobfuscate mixed Boolean-Arithmetic and learn the semantics of VM instruction handlers.
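The idea can be sketched in a few lines: treat the obfuscated code as a black box, record input-output pairs, and search for a simpler expression that reproduces them. The example below is a toy under stated assumptions: `black_box` stands in for an obfuscated handler, the candidate set is a tiny hand-written list rather than a real synthesis grammar, and the final check is exhaustive over 8-bit inputs instead of a solver-backed proof. All names are hypothetical.

```python
import random
from itertools import product

MASK = 0xFF  # 8-bit arithmetic keeps the exhaustive check cheap

def black_box(x, y):
    # Stand-in for an obfuscated handler: an MBA encoding of x ^ y.
    return ((x | y) - (x & y)) & MASK

# Tiny candidate set; a real synthesizer would enumerate or search a grammar.
CANDIDATES = {
    "x + y": lambda x, y: (x + y) & MASK,
    "x - y": lambda x, y: (x - y) & MASK,
    "x ^ y": lambda x, y: (x ^ y) & MASK,
    "x & y": lambda x, y: (x & y) & MASK,
    "x | y": lambda x, y: (x | y) & MASK,
}

def sample_io(f, n, seed=0):
    """Collect input-output pairs: fixed corner cases plus n random inputs."""
    rng = random.Random(seed)
    inputs = [(0, 0), (0, 1), (1, 1), (255, 255)]
    inputs += [(rng.randrange(256), rng.randrange(256)) for _ in range(n)]
    return [((x, y), f(x, y)) for x, y in inputs]

def synthesize(samples):
    """Return the names of all candidates consistent with every sample."""
    return [
        name
        for name, g in CANDIDATES.items()
        if all(g(*inp) == out for inp, out in samples)
    ]

def verify(f, g):
    """Exhaustively confirm that f and g agree on all 8-bit input pairs."""
    return all(f(x, y) == g(x, y) for x, y in product(range(256), repeat=2))

matches = synthesize(sample_io(black_box, 32))
print(matches)                                    # ['x ^ y']
print(verify(black_box, CANDIDATES[matches[0]]))  # True
```

Matching the samples only shows that a candidate is plausible, which is why the separate verification step matters: sampling alone can accept an expression that differs on inputs that were never observed.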

Note that the training focuses on hands-on sessions. While some lecture parts provide an understanding of when to use which method, various hands-on sessions teach how to use them to build custom-purpose tools for one-off problems. The trainer actively supports the students in solving the given tasks. After a task is completed, we discuss different solutions in class. Furthermore, students receive detailed reference solutions that can be used during and after the course.

While the hands-on sessions use x86 assembly, all tools and techniques can also be applied to other architectures such as MIPS, PPC or ARM.

 

Topics Covered
  • Introduction to Code (De)obfuscation
    • Motivation
    • Application Scenarios
    • Program Analysis Techniques
  • Code Obfuscation Techniques
    • Opaque Predicates
    • Control-flow Flattening
    • Mixed Boolean-Arithmetic
    • Virtual Machines
    • Virtual Machine Hardening
  • Code Deobfuscation Techniques
    • Compiler Optimizations
    • Reconstructing Control Flow
    • SMT-based Program Analysis
    • Taint Analysis
    • Symbolic Execution
    • Program Synthesis
  • Compiler Optimizations
    • Dead Code Elimination
    • Constant Propagation/Folding
    • Static Single Assignment (SSA)
    • Optimizing Obfuscated Code
  • SMT-based Program Analysis
    • SAT and SMT Solvers
    • Encoding Program Analysis Problems for SMT Solvers
    • Proving Semantic Equivalence
    • Proving Properties of a Piece of Code
    • Solving Complex Program Constraints
    • Deobfuscating Mixed Boolean-Arithmetic
    • Breaking Weak Cryptography
  • Symbolic Execution
    • Intermediate Languages for Reverse Engineering
    • Symbolic and Semantic Simplification of Obfuscated Code
    • Automation in Reverse Engineering
    • Identifying Virtual Machine Components
    • Interaction With SMT Solvers
    • Breaking Opaque Predicates
    • Writing Disassemblers for Virtualization-based Obfuscators
  • Program Synthesis
    • Concept of Program Synthesis
    • Learning Code Semantics Based on its Input/Output Behavior
    • Obtaining Input/Output Pairs from Code
    • Methods to Simplify Large Expression Trees
    • Proving the Correctness of Simplifications
    • Deobfuscating Mixed Boolean-Arithmetic
    • Learning Semantics of VM Instruction Handlers

Why You Should Take This Course

Get to know state-of-the-art code obfuscation techniques and see how they complicate reverse engineering. Then gradually familiarize yourself with different deobfuscation techniques and use them to break obfuscation schemes in hands-on sessions. Along the way, you will deepen your knowledge of program analysis and learn when and how (not) to use different techniques.

Who Should Attend

This class is intended for students who have basic experience in reverse engineering and need to deal with obfuscated code. The course is also of interest to experienced reverse engineers who want to deepen their understanding of program analysis techniques and code (de)obfuscation.

Key Learning Objectives

  • Get to know the state-of-the-art of code obfuscation and deobfuscation techniques

  • Learn compiler optimizations, SMT-based program analysis, symbolic execution and program synthesis

  • Apply all techniques to break obfuscation schemes in various hands-on sessions

  • Write disassemblers for VM-based obfuscators and simplify complex arithmetic expressions

Prerequisite Knowledge

  • Basic reverse engineering skills
  • Familiarity with x86 assembly and Python

Hardware / Software Requirements

Students should have access to a computer with at least 4 GB of RAM and 20 GB of free disk space. They should also install a disassembler of their choice (e.g., IDA or Ghidra) and virtualization software such as VirtualBox or VMware. Students will be provided with a Linux VM containing all necessary tools and setups.

Your Instructor

Tim Blazytko (@mr_phrazer) is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. Moreover, he gives trainings on reverse engineering and code deobfuscation, analyzes malware and performs security audits.

What Students Say About This Training

From Tim’s past HITB training:

  • Trainer’s Overall Score: 96%
  • Trainees’ Overall Feedback:

Would you recommend this class, or attend other classes by this trainer?

“I would absolutely recommend others take this class or any other classes taught by Tim”

“Yes, I would definitely recommend this class to any reverse engineers wanting to advance their skills, and I would attend other classes by this trainer.”

“Absolutely recommend this class. It has met and exceed all my expectations!”

What part of this course did you find most useful and interesting?

“I found all of it very interesting. The most useful parts to me were the coding/reversing exercises. That really helps to cement my understanding of the topics discussed”

“The latter part, dealing with the automation of analysis, [where we were] applying the theory of techniques covered earlier on”

“It is very difficult to fault any component of this course, its appears as a very mature and well refined project. Tim is clearly very passionate on the subjects and that is portrayed through the material and delivery.”