Code Obfuscation
Aims to hide how the source code is structured
Source code (or symbols) can reveal enough information to help reverse engineer a program
Applied to the source code, and focused on situations where the source can be obtained
E.g. JavaScript, HTML, CSS, Java
Methods:
Deleting comments
Remove debugging information
Renaming classes, methods and variables
Removing spaces
Stripping a binary
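As an illustration of the methods above, the sketch below shows the same function before and after comments, whitespace and meaningful names are removed. The names compute_factorial and a are hypothetical and chosen only for this example; a real tool would process a whole code base.

```c
#include <stdint.h>

/* Original: descriptive names and comments reveal the intent. */
uint64_t compute_factorial(unsigned int number)
{
    uint64_t result = 1;

    /* Multiply every integer from 2 up to number. */
    for (unsigned int factor = 2; factor <= number; factor++) {
        result *= factor;
    }
    return result;
}

/* Same logic after renaming, deleting comments and removing spaces. */
uint64_t a(unsigned int b){uint64_t c=1;for(unsigned int d=2;d<=b;d++)c*=d;return c;}
```

Both functions return the same values. Only the information a human reader relies on has been removed.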
Aims at making the design nonobvious, and more difficult to recover
Usually done by a tool before compilation or during compilation
GCC can do this automatically by inlining functions and unrolling loops (-O3 -finline -funroll-loops)
Methods:
Merging and splitting methods
Merging and splitting classes
Splitting binary code, while inserting dummy instructions
Splitting loops and conditions, maybe interleaved with dummy code
Inlining functions
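A minimal sketch of merging and splitting methods, assuming a hypothetical program with two small functions (area and perimeter). The merged and split variants, and all names in them, are made up for this example. They are not the output of any particular tool.

```c
/* Original design: two small functions with an obvious purpose each. */
int area(int w, int h)      { return w * h; }
int perimeter(int w, int h) { return 2 * (w + h); }

/* Merged: one function computes both results and a selector picks one. */
int f1(int k, int w, int h)
{
    int s = w + h;
    int p = w * h;
    return k ? s + s : p;   /* k != 0: perimeter, k == 0: area */
}

/* Split: the perimeter is computed by two helpers that reveal little
 * about the original design when looked at individually. */
static int g1(int w, int h) { return w + h; }
static int g2(int s)        { return s + s; }

int perimeter_split(int w, int h) { return g2(g1(w, h)); }
```

The call graph recovered from the binary no longer matches the original design, although the results are identical.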
Dead Code
Code that is inserted but never executed. An unconditional JMP placed before the dummy code effectively only splits the real code
What about the output binary? Compile with gcc -O0 -o factorial-split factorial-split.c
Does it affect static or dynamic analysis? Check with objdump -d and Ghidra
What if you use jz or jnz instead of jmp?
gcc may also inline functions (the opposite of splitting) when using -O3 or -finline-functions
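The factorial-split.c file mentioned above is not reproduced in these notes. The sketch below is a hypothetical stand-in for the conditional variant (jz/jnz instead of jmp), written in plain C. It assumes a volatile guard variable that is always 0 at run time. Because it is volatile, the compiler has to read it and should keep both the branch and the dummy block in the binary.

```c
#include <stdint.h>

/* Always 0 at run time. Declared volatile so the compiler cannot
 * prove that the guarded block is unreachable. (Hypothetical name.) */
static volatile int guard = 0;

uint64_t factorial_dead(unsigned int n)
{
    uint64_t result = 1;

    for (unsigned int i = 2; i <= n; i++) {
        result *= i;

        if (guard != 0) {
            /* Dummy code: present in the binary, never executed
             * as long as guard stays 0. */
            result ^= 0x5eedULL;
            result = (result << 3) | (result >> 61);
        }
    }
    return result;
}
```

Compiling with gcc -O0 and inspecting the result with objdump -d should show the extra conditional jump and the dummy instructions inside the loop.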
Aims at inserting dummy code to confuse the analysis
Code may follow some pattern (previous example), or be random
Code may lock the analysis tool if recursive disassembly is used
Decompilation to pseudo-C will be affected as well
Dead code can be added after compilation
May contain fingerprinting information by making binaries unique
Encrypts, or otherwise encodes data contents
Contents are decrypted in real-time, as the program is executed
Static analysis, or fingerprint matching may fail to correctly recover useful information
Frequent tactic to evade filters
Why?
Strings frequently carry semantic information, that may help analysis
E.g. Str=“Please input your AES key”: we will know that this is a key, and know the algorithm
Split the string into parts
May be combined with two conditions or loops to validate both parts individually
Erase strings right after use
Plain XOR is frequently found, as it requires no dependencies and is fast
More recent malware will use RC4 or even AES for this purpose
The decryption key can also be encrypted, and some keys may be obtained dynamically
E.g. from a hardware token as a form of licensing enforcement
Create a custom encoding based on a complex state machine
May use flow information, voiding the decoding of strings if the execution order is changed
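A minimal sketch combining three of the ideas above: the string is split into two parts, each part is stored XOR-encoded, and the decoded bytes are erased right after use. The secret "open-sesame", the single-byte key 0x5A and all function names are assumptions made for this example. Real samples typically use longer keys or a cipher such as RC4 or AES.

```c
#include <stddef.h>
#include <string.h>

#define XOR_KEY 0x5A   /* hypothetical single-byte key */

/* "open-" and "sesame", each byte XORed with XOR_KEY. */
static const unsigned char enc_a[] = { 0x35, 0x2A, 0x3F, 0x34, 0x77 };
static const unsigned char enc_b[] = { 0x29, 0x3F, 0x29, 0x3B, 0x37, 0x3F };

/* Erase through a volatile pointer so the stores are not optimized away. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *v = p;
    while (n--)
        *v++ = 0;
}

/* Decode one part into a temporary buffer, compare, then erase it. */
static int part_matches(const char *input, const unsigned char *enc, size_t n)
{
    unsigned char plain[16];
    int ok;

    if (n > sizeof plain)
        return 0;
    for (size_t i = 0; i < n; i++)
        plain[i] = enc[i] ^ XOR_KEY;
    ok = (memcmp(input, plain, n) == 0);
    wipe(plain, n);
    return ok;
}

/* Validate both parts individually, in two separate conditions. */
int check_input(const char *input)
{
    if (strlen(input) != sizeof enc_a + sizeof enc_b)
        return 0;
    if (!part_matches(input, enc_a, sizeof enc_a))
        return 0;
    return part_matches(input + sizeof enc_a, enc_b, sizeof enc_b);
}
```

Running strings on the resulting binary should not show the secret, and the plaintext only exists on the stack for the duration of each comparison.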
Introduces dummy control structures, with little impact on execution
The impact is only from a performance point of view (additional branch)
However, analysis tools will interpret the control structures and create complex CFGs
Makes use of Opaque Predicates: predicates for which the programmer already knows the result.
E.g. if (1 > 0), or v = r; if (v == r)
Opaque predicates can be more complex
Manipulate pointers, linked lists, use computation processes
The result of a predicate can be dynamic, and related to the execution state
Dynamic analysis may change the execution sequence, and therefore the predicate result, which invalidates the execution
Similar to TPMs, where keys are only released when the system is in a valid state
Predicate can use dynamic data, received from external services
Concurrency can be used to create predicates
If two threads execute with some known relation, one can update data that the other uses to construct a predicate
Timing information can also be used, to further increase the complexity (information not available statically)
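A minimal sketch of an arithmetic opaque predicate, slightly less obvious than if (1 > 0). It relies on the fact that the product of two consecutive integers is always even, which also holds under unsigned wraparound. The functions real_work and decoy_work are hypothetical placeholders.

```c
/* Hypothetical placeholders for the real and the dummy code paths. */
int real_work(int v)  { return v * 2 + 1; }
int decoy_work(int v) { return v ^ 0x1234; }

int obfuscated(int v, unsigned int x)
{
    /* x * (x + 1) is always even, so this branch is always taken.
     * The programmer knows this; an analysis tool has to prove it. */
    if ((x * (x + 1)) % 2 == 0)
        return real_work(v);

    /* Never reached, but it adds a node and an edge to the CFG. */
    return decoy_work(v);
}
```

An optimizing compiler may be able to fold a predicate this simple, so practical obfuscators tend to use harder ones based on pointers, external data, threads or timing, as listed above.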
Removes control flow structures from the program.
Converts the program to a gigantic Switch, where each condition is a case
The program runs in an infinite loop around the switch
The program becomes ~4 times slower, and 2 times larger
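A minimal sketch of the transformation applied by hand to a factorial loop. The state numbering and the name factorial_flat are assumptions made for this example. Tools that automate flattening usually also obscure how the next state is computed.

```c
#include <stdint.h>

/* Original: the loop structure is directly visible. */
uint64_t factorial(unsigned int n)
{
    uint64_t r = 1;
    for (unsigned int i = 2; i <= n; i++)
        r *= i;
    return r;
}

/* Flattened: every basic block becomes a case, and a state variable
 * decides which case runs on the next iteration of the outer loop. */
uint64_t factorial_flat(unsigned int n)
{
    uint64_t r = 0;
    unsigned int i = 0;
    int state = 0;

    for (;;) {
        switch (state) {
        case 0:  r = 1; i = 2; state = 1;       break; /* initialization */
        case 1:  state = (i <= n) ? 2 : 3;      break; /* loop condition */
        case 2:  r *= i; i++; state = 1;        break; /* loop body      */
        case 3:  return r;                             /* exit           */
        default: return 0;                             /* not reachable  */
        }
    }
}
```

In the CFG, all cases hang off the same dispatcher, so the original loop is no longer visible as a loop.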
Binaries can be compressed into a blob (and even encrypted)
Stub will process the blob and jump into it
Static analysis will be able to analyze the stub, which can be obfuscated
Stub provides a valid signature for scanners, but variations can exist
The actual file is never available for analysis by static scanners
It is available at runtime, as the code must be unpacked in order to be executed
Generic packers (e.g. UPX) pack the entire ELF, which is unpacked and mapped at runtime
Easier to extract, as the original file is recreated and mapped
Crafted packers will require more effort
The generic approach uses a debugger or Qiling to dump the uncompressed file
For an overview, check: https://kernemporium.github.io/posts/unpacking/
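As a small illustration of the point that a generic stub gives scanners something recognizable, the sketch below searches a buffer for the ASCII marker "UPX!", which files packed with an unmodified UPX normally contain. The function name is hypothetical and the check is only a heuristic: a crafted or modified packer can remove or alter the marker.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the buffer contains the "UPX!" marker, 0 otherwise.
 * Heuristic only: a missing marker does not prove a file is unpacked. */
int has_upx_magic(const unsigned char *buf, size_t len)
{
    static const char magic[] = "UPX!";
    size_t mlen = sizeof magic - 1;   /* exclude the trailing NUL */

    if (len < mlen)
        return 0;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, magic, mlen) == 0)
            return 1;
    }
    return 0;
}
```

In practice the buffer would hold the contents of the file under analysis, read from disk before calling the function.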