Code Obfuscation

Layout Obfuscation

Aims to hide how the source code is structured

Source code (or symbols) can present enough information to help reverse engineer a program

Applied to the source code, and focused on situations where the source can be obtained

JavaScript, HTML, CSS, Java

Methods:

Deleting comments
Removing debugging information
Renaming classes, methods and variables
Removing whitespace
Stripping symbols from a binary

Design Obfuscation

Aims to make the design non-obvious and more difficult to recover

Usually done by a tool before or during compilation
GCC can do part of this automatically, e.g. by inlining functions and unrolling loops (-O3 -finline-functions -funroll-loops)

Methods:

Merging and splitting methods
Merging and splitting classes
Splitting binary code, while inserting dummy instructions
Splitting loops and conditions, maybe interleaved with dummy code
Inlining functions
Dead Code

Breaking Code

Code is inserted but never executed: a JMP placed before the dummy code means the dummy code only splits the real code

What about the output binary? Compile with gcc -O0 -o factorial-split factorial-split.c

Does it affect static or dynamic analysis? Check with objdump -d and Ghidra

What if jz or jnz is used instead of jmp? Conversely, gcc may also inline functions (the opposite of splitting) when using -O3 or -finline-functions

Dead Code

Aims to insert dummy code to confuse the analysis

Code may follow a pattern (as in the previous example) or be random
Code may lock up the analysis tool if recursive disassembly is used
Decompilation to pseudo-C will almost certainly be affected

Dead code can be added after compilation

Dead code may also carry fingerprinting information, making each binary unique

Data Obfuscation

Encrypts or otherwise encodes data contents

Contents are decrypted at runtime, as the program executes
Static analysis or fingerprint matching may fail to recover useful information
A frequent tactic to evade filters

Why?

Strings frequently carry semantic information that helps analysis
E.g. str = “Please input your AES key”: we learn that a key is being read, and which algorithm uses it

How?

Split the string into parts

May be combined with two conditions or loops to validate both parts individually

Erase strings right after use

Simple XOR encoding is frequently found, as it requires no dependencies and is fast

More recent malware uses RC4 or even AES for this purpose
The decryption key can also be encrypted, and some keys may be obtained dynamically
- E.g. from a hardware token as a form of licensing enforcement

Create a custom encoding based on a complex state machine

May use control flow information, so that strings decode incorrectly if the execution order is changed

Control Obfuscation

Opaque Predicates

Introduces dummy control structures, with little impact on execution

The only impact is on performance (an additional branch)
However, analysis tools will interpret these control structures and build complex CFGs

Makes use of opaque predicates: predicates whose result the programmer already knows

E.g. if (1 > 0), or v = r; if (v == r)

Opaque predicates can be more complex

Manipulate pointers or linked lists, or rely on computations whose result is hard to derive statically

The result of a predicate can be dynamic and depend on the execution state

Dynamic analysis may change the execution sequence, and therefore the predicate result, invalidating the execution
Similar to TPMs, which only release keys when the platform is in a valid state
Predicates can use dynamic data received from external services

Concurrency can be used to create predicates

If two related threads execute concurrently, one can update data that the other uses to construct a predicate
Timing information can also be used to further increase the complexity (this information is not available statically)

Control Flow Flattening

Removes visible control flow structures from the program

Converts the program into a giant switch, where each basic block becomes a case
The program runs in an infinite loop around the switch, with a state variable selecting the next case

The program becomes ~4 times slower, and 2 times larger

Self Decompressing Binaries

Binaries can be compressed into a blob (and even encrypted)

A stub processes the blob (decompressing and, if needed, decrypting it) and jumps into it

Static analysis will only be able to analyze the stub, which can itself be obfuscated

The stub gives scanners a valid signature to match, but variations can exist

The actual file is never available for analysis by static scanners

It is available at runtime, as the code must be unpacked to be executed
Generic packers (e.g. UPX) pack the entire ELF, which is unpacked and mapped at runtime
- Easier to extract, as the original file is recreated and mapped
Custom-crafted packers require more effort

The generic approach uses a debugger or Qiling to dump the uncompressed file

For an overview, check: https://kernemporium.github.io/posts/unpacking/
