Low-level languages
Last updated
Each CPU architecture has a specific instruction set.
It comes with rules regarding instruction structure and execution flow.
When a program is compiled to “binary”, its high-level logic is converted into a sequence of machine instructions.
This sequence may be executed by a family of CPUs or a single model.
Running this sequence on a different CPU may require binary translation (converting the instructions to another instruction set).
Humans typically cannot read binary instructions directly, but the instructions can always be translated into Assembly, a human-readable textual form.
Good: we can read binary code (through its Assembly form).
Bad: each CPU architecture has its own Assembly variant, and Assembly is not simple to read.
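To make the byte-to-Assembly mapping concrete, here is a toy disassembler sketch. The lookup tables `X86_64` and `AARCH64` and the `disassemble` helper are hypothetical and cover only a handful of real encodings; real tools decode the full instruction sets. It also shows why each architecture needs its own decoder: x86-64 uses variable-length instructions, while AArch64 uses fixed 4-byte little-endian words.

```python
# Hypothetical toy tables: a few real encodings per architecture, nothing more.
X86_64 = {
    bytes.fromhex("55"): "push rbp",
    bytes.fromhex("4889e5"): "mov rbp, rsp",   # REX.W + 89 /r
    bytes.fromhex("31c0"): "xor eax, eax",
    bytes.fromhex("5d"): "pop rbp",
    bytes.fromhex("c3"): "ret",
}

# AArch64 instructions are fixed 32-bit words, stored little-endian.
AARCH64 = {
    bytes.fromhex("1f2003d5"): "nop",   # 0xD503201F
    bytes.fromhex("c0035fd6"): "ret",   # 0xD65F03C0
}


def disassemble(code: bytes, table: dict[bytes, str]) -> list[str]:
    """Greedy longest-match decoding of `code` using a byte-pattern table."""
    max_len = max(len(k) for k in table)
    out: list[str] = []
    i = 0
    while i < len(code):
        # Try the longest possible pattern first, then shorter ones.
        for n in range(min(max_len, len(code) - i), 0, -1):
            chunk = code[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # Unknown byte: emit it as raw data, as disassemblers often do.
            out.append(f".byte 0x{code[i]:02x}")
            i += 1
    return out


print(disassemble(bytes.fromhex("554889e531c05dc3"), X86_64))
# ['push rbp', 'mov rbp, rsp', 'xor eax, eax', 'pop rbp', 'ret']
print(disassemble(bytes.fromhex("1f2003d5c0035fd6"), AARCH64))
# ['nop', 'ret']
```

The same idea (a return instruction) is the single byte `c3` on x86-64 but the 4-byte word `c0 03 5f d6` on AArch64, so the bytes only make sense once the target architecture is known.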
For compiled programs, RE tasks involve extracting information from the sequence of Assembly instructions.
Disassembly is automatic; the rest frequently isn’t.
Reconstruction is never perfect!
Different levels of abstraction: e.g., it is not trivial to recover C++ class structure and OOP relations from Assembly code.
Different compilers generate different Assembly for the same source code.
Even the same compiler may generate different Assembly for the same source code,
depending on optimization flags, target CPU model, protection mechanisms, target object type…
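Native compilers are hard to demonstrate in a self-contained snippet, so the sketch below uses CPython's own compiler as an analogy: the same source compiled with `optimize=0` and `optimize=1` yields different bytecode, because level 1 drops `assert` statements. The `SOURCE` text and the `function_ops` helper are hypothetical, and exact opcode names differ between Python versions.

```python
import dis
import types

SOURCE = """
def check(x):
    assert x > 0, "x must be positive"
    return x * 2
"""


def function_ops(optimize: int) -> list[str]:
    """Compile SOURCE at an optimization level and return check()'s opcode names."""
    module_code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    # The compiled function body is stored among the module's constants.
    func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
    return [ins.opname for ins in dis.get_instructions(func_code)]


unoptimized = function_ops(0)
optimized = function_ops(1)
print(len(unoptimized), len(optimized))
print(unoptimized == optimized)  # False: the assert check is gone at optimize=1
```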
Some languages are compiled into bytecode (!= machine code):
an intermediate language that is processed by a VM or framework.
.NET, Java, Python, JS, Lisp, Lua, OCaml, Tcl, FoxPro, WebAssembly.
Bytecode contains a compact (optimized) representation of the higher layer structures.
The framework/VM executes the bytecode on the target CPU.
The same bytecode can usually be executed on multiple CPUs, provided there is a native VM implementation for each.
The Java motto: Write Once, Run Anywhere.
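As a sketch using CPython (chosen here only as an example of a bytecode VM), the standard `dis` module shows the bytecode stored in a function and translates it into readable instructions. The `add` function is a hypothetical example. The raw bytes do not depend on the host CPU, but CPython bytecode is tied to the interpreter version, so the listed opcodes vary between Python releases.

```python
import dis
import sys


def add(a, b):
    return a + b


# The bytecode is stored as raw bytes in the function's code object...
raw = add.__code__.co_code
print(type(raw).__name__, len(raw))

# ...and dis decodes it into the instructions the CPython VM executes.
for ins in dis.get_instructions(add):
    print(ins.offset, ins.opname, ins.argrepr)

print("Interpreter version:", sys.version_info[:2])
```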
Bytecode usually allows easier extraction of information, provided a suitable tool (e.g., a decompiler) exists for it.
May recover classes, function names, and even comments (but not always).
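A minimal sketch of what survives in Python bytecode: the body of a `.pyc` file is a `marshal`-serialized code object, so the snippet below serializes compiled code, loads it back, and walks it for names and string constants. `SOURCE` and `collect_strings` are hypothetical examples. Class, method, attribute, and parameter names plus the docstring are recovered, while the `#` comment is not, which illustrates the "not always" caveat.

```python
import marshal
import types

SOURCE = '''
class Account:
    """Simple bank account."""
    def deposit(self, amount):
        # Comments are discarded by the compiler.
        self.balance = self.balance + amount
'''


def collect_strings(code: types.CodeType) -> set[str]:
    """Recursively gather names and string constants stored in a code object."""
    found = {code.co_name, *code.co_names, *code.co_varnames}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found |= collect_strings(const)   # nested class/function bodies
        elif isinstance(const, str):
            found.add(const)                  # e.g. docstrings, literals
    return found


# Serialize the compiled module as a .pyc body would store it, then load it back.
payload = marshal.dumps(compile(SOURCE, "<bank>", "exec"))
recovered = collect_strings(marshal.loads(payload))
print(sorted(recovered))
```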
Traditional (native-code) decompilers will not process bytecode (at least not easily).