Code Obfuscation

Layout Obfuscation

Aims to hide how the source code is structured

  • Source code (or symbols) can carry enough information to help reverse engineer a program

Applied to the source code, and focused on situations where the source can be obtained

  • JavaScript, HTML, CSS, Java

Methods:

  • Deleting comments

  • Removing debugging information

  • Renaming classes, methods and variables

  • Removing spaces

  • Stripping a binary
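
As a minimal sketch of the source-level methods above, the same C function is shown before and after comments are deleted, identifiers renamed and whitespace removed. The function names (checksum_bytes, a) and the hash itself are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Readable version: names and comments reveal the intent. */
unsigned int checksum_bytes(const unsigned char *buffer, size_t length)
{
    unsigned int checksum = 0;
    for (size_t index = 0; index < length; index++) {
        checksum = checksum * 31u + buffer[index]; /* simple rolling hash */
    }
    return checksum;
}

/* Same logic after layout obfuscation: no comments, meaningless names,
   no whitespace. The behaviour is unchanged. */
unsigned int a(const unsigned char*b,size_t c){unsigned int d=0;for(size_t e=0;e<c;e++){d=d*31u+b[e];}return d;}
```

Both functions return the same value for the same input. Only the information available to a human reader differs.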

Design Obfuscation

Aims at making the design nonobvious, and more difficult to recover

  • Usually done by a tool before compilation or during compilation

  • GCC can do this automatically by inlining functions and unrolling loops (-O3 -finline -funroll-loops)

Methods:

  • Merging and splitting methods

  • Merging and splitting classes

  • Splitting binary code, while inserting dummy instructions

  • Splitting loops and conditions, maybe interleaved with dummy code

  • Inlining functions

  • Inserting dead code
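
Merging methods can be sketched as follows: two small functions are folded into one function with a selector argument, so the original design (two separate operations) is no longer visible in the interface. The names area, perimeter and f are hypothetical.

```c
/* Original design: two small, self-describing functions. */
int area(int w, int h) { return w * h; }
int perimeter(int w, int h) { return 2 * (w + h); }

/* After merging: a single function with a selector argument.
   A caller now passes sel = 0 for the area and sel != 0 for the perimeter. */
int f(int sel, int w, int h)
{
    int t = w + h;      /* intermediate value used only by the perimeter path */
    if (sel)
        return t + t;   /* perimeter */
    return w * h;       /* area */
}
```

Splitting is the reverse operation: one cohesive function is cut into several fragments that only make sense together.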

Breaking Code

Dummy code is inserted, but never executed. A JMP placed before the dummy code skips over it, so the insertion effectively only splits the real code

What about the output binary? Compile with gcc -O0 -o factorial-split factorial-split.c

Does it affect static or dynamic analysis? Check with objdump -d and Ghidra

What if, instead of jmp, you use jz or jnz?

GCC may also inline functions (the opposite of splitting) when using -O3 or -finline-functions
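
The factorial-split.c file itself is not reproduced here. A hypothetical sketch of what such a file might contain is given below, assuming GCC or Clang inline assembly on x86 or x86-64. The macro SPLIT_CODE and the function factorial_split are illustrative names, and the dummy byte 0xe8 (the first byte of a call instruction) is one possible choice among many.

```c
/* Hypothetical sketch: jump over a dummy byte that is never executed.
   On targets other than x86/x86-64 the macro expands to a no-op. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SPLIT_CODE() __asm__ volatile("jmp 1f\n\t.byte 0xe8\n1:")
#else
#define SPLIT_CODE() ((void)0)
#endif

unsigned long factorial_split(unsigned int n)
{
    unsigned long result = 1;
    SPLIT_CODE();                 /* splits the function prologue from the loop */
    for (unsigned int i = 2; i <= n; i++) {
        result *= i;
        SPLIT_CODE();             /* splits the loop body */
    }
    return result;
}
```

The function still computes the factorial, since the jmp always skips the dummy byte. A linear-sweep disassembler such as objdump may decode 0xe8 as the start of a call and misinterpret the bytes that follow, while a tool that follows the jmp target is typically less affected.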

Dead Code

Aims at inserting dummy code to confuse the analysis

  • Code may follow some pattern (previous example), or be random

  • Code may lock the analysis tool if recursive disassembly is used

  • Decompilation to pseudo-C will very likely be affected

Dead code can be added after compilation

  • May carry fingerprinting information, by making each binary unique

Data Obfuscation

Encrypts, or otherwise encodes, data contents

  • Contents are decrypted at runtime, as the program is executed

  • Static analysis or fingerprint matching may fail to recover useful information

  • Frequent tactic to evade filters

Why?

  • Strings frequently carry semantic information that may help analysis

  • E.g. Str="Please input your AES key": the analyst learns that a key is expected, and which algorithm is used

How?

Split the string into parts

  • May be combined with two conditions or loops to validate both parts individually

Erase strings right after use

Simple XOR encoding is frequently found, as it requires no dependencies and is fast

  • More recent malware will use RC4 or even AES for this purpose

  • The decryption key can also be encrypted, and some keys may be obtained dynamically

    • E.g. from a hardware token as a form of licensing enforcement

Create a custom encoding based on a complex state machine

  • May use flow information, voiding the decoding of strings if the execution order is changed
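
The XOR approach, combined with erasing the string right after use, can be sketched as follows. This is a minimal illustration under stated assumptions: the single-byte key 0x5A, the encoded string ("AES key") and all function names are hypothetical, and a single-byte XOR is trivial to break.

```c
#include <stddef.h>
#include <string.h>

#define XOR_KEY 0x5A  /* hypothetical single-byte key */

/* "AES key" XORed with 0x5A; the plaintext never appears in the binary. */
static const unsigned char enc_msg[] = { 0x1B, 0x1F, 0x09, 0x7A, 0x31, 0x3F, 0x23 };

/* Decode enc into out, which must hold enc_len + 1 bytes. */
void xor_decode(const unsigned char *enc, size_t enc_len, char *out)
{
    for (size_t i = 0; i < enc_len; i++)
        out[i] = (char)(enc[i] ^ XOR_KEY);
    out[enc_len] = '\0';
}

/* Overwrite the plaintext after use. The volatile pointer discourages
   the compiler from optimising the wipe away. */
void wipe(char *buf, size_t len)
{
    volatile char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Decode, use, and erase: the plaintext exists only briefly on the stack. */
int matches_hidden_string(const char *candidate)
{
    char plain[sizeof enc_msg + 1];
    xor_decode(enc_msg, sizeof enc_msg, plain);
    int same = strcmp(plain, candidate) == 0;
    wipe(plain, sizeof plain);
    return same;
}
```

Running strings on a binary built from this sketch should not show the plaintext, although a debugger breakpoint after xor_decode would still reveal it.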

Control Obfuscation

Opaque Predicates

Introduces dummy control structures, with little impact on execution

  • The only impact is on performance (an additional branch)

  • However, analysis tools will interpret the control structures and create complex CFGs

Makes use of Opaque Predicates: predicates for which the programmer already knows the result.

  • E.g. if (1 > 0), or v = r; if (v == r)
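
A slightly less obvious predicate than the ones above can be sketched with an arithmetic identity: the product of two consecutive integers is always even. The function names and the dead branch are hypothetical, and an optimising compiler may or may not simplify the predicate away.

```c
/* x * (x + 1) is a product of two consecutive integers, so it is always even.
   Unsigned arithmetic is used so that wraparound is well defined; the
   lowest bit of the product is preserved under wraparound. */
int opaque_true(unsigned int x)
{
    return (x * (x + 1u)) % 2u == 0u;
}

int add_obfuscated(int a, int b, unsigned int runtime_value)
{
    if (opaque_true(runtime_value))
        return a + b;       /* always taken */
    return a * b - 7;       /* dead branch, only there to mislead analysis */
}
```

Whatever value is passed as runtime_value, add_obfuscated returns a + b, but the control flow graph shows two paths.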

Opaque predicates can be more complex

  • They can manipulate pointers or linked lists, or depend on the result of a computation
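
A pointer-based predicate can be sketched with two disjoint circular lists: a pointer walking one list can never equal a pointer walking the other, but proving this statically requires alias analysis. The structure, list sizes and step counts below are hypothetical.

```c
struct node { struct node *next; };

/* Two disjoint circular lists of three nodes each. */
static struct node ring_a[3], ring_b[3];

static void init_rings(void)
{
    for (int i = 0; i < 3; i++) {
        ring_a[i].next = &ring_a[(i + 1) % 3];
        ring_b[i].next = &ring_b[(i + 1) % 3];
    }
}

/* Always returns 1: p stays inside ring_a and q stays inside ring_b,
   whatever number of steps each pointer is advanced. */
int opaque_pointer_predicate(unsigned int steps)
{
    init_rings();
    struct node *p = &ring_a[0];
    struct node *q = &ring_b[0];
    for (unsigned int i = 0; i < steps % 7; i++)
        p = p->next;
    for (unsigned int i = 0; i < steps % 5; i++)
        q = q->next;
    return p != q;
}
```

An obfuscator would guard real code with if (opaque_pointer_predicate(x)) and place dummy code in the else branch.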

The result of a predicate can be dynamic, and related to the execution state

  • Dynamic analysis may change the execution sequence, which changes the predicate result and invalidates the execution

  • Similar to TPMs, where keys are only released when the system is in a valid state

  • Predicate can use dynamic data, received from external services

Concurrency can be used to create predicates

  • If two threads execute with a known relation between them, one can update data that the other uses to construct a predicate

  • Timing information can also be used, to further increase the complexity (information not available statically)

Control Flow Flattening

Removes control flow structures from the program.

  • Converts the program into one large switch, where each basic block becomes a case

  • The program runs in a loop around the switch, with a state variable selecting the next case

The program may become roughly 4 times slower and 2 times larger
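
Flattening can be sketched on a small function. Below, a structured gcd is shown next to a hand-flattened equivalent; the state numbering and function names are hypothetical, and real tools produce far larger dispatchers.

```c
/* Original, structured version. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Flattened version: every basic block is a case, and the state
   variable decides which block runs next. */
unsigned int gcd_flat(unsigned int a, unsigned int b)
{
    unsigned int t = 0;
    int state = 0;                    /* dispatcher variable */
    for (;;) {
        switch (state) {
        case 0:                       /* loop condition */
            state = (b != 0) ? 1 : 2;
            break;
        case 1:                       /* loop body */
            t = a % b;
            a = b;
            b = t;
            state = 0;
            break;
        case 2:                       /* exit block */
            return a;
        }
    }
}
```

Both functions return the same result, but the while loop is no longer visible in the flattened control flow graph: every block is a sibling under the dispatcher.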

Self Decompressing Binaries

Binaries can be compressed into a blob (and even encrypted)

  • A stub unpacks the blob at runtime and jumps into it

Static analysis will only be able to analyze the stub, which can itself be obfuscated

  • The stub gives scanners a signature to match, but variations can exist

The actual program is never available for analysis by static scanners

  • It is available at runtime, as it must be unpacked in order to execute

  • Generic packers (UPX) will pack the entire ELF, which is unpacked and mapped at runtime

    • Easier to extract, as the original file is recreated and mapped

  • Crafted packers will require more effort

The generic approach to unpacking uses a debugger or an emulation framework such as Qiling to dump the uncompressed file

  • For an overview, check: https://kernemporium.github.io/posts/unpacking/
