# Considerations

## (Need for) Stability

Reversing is significantly more **difficult if execution is unstable**.

* Observations are affected by "random" factors, such as multithreaded execution, hardware behaviour, user interactions with graphical interfaces, etc.
* Applications being reversed should be isolated from external effects as much as possible.

Stable execution makes a program run deterministic: the same inputs produce the same observable behaviour.

* This facilitates debugging and reversing.
* State may also be altered deterministically, for the entire program or for a specific function (as in fuzzing).

Logs can be obtained from executions using monitor applications.
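
One rough way to check whether a target behaves stably is to run it several times with identical inputs and compare what can be observed. The sketch below does only that: the helper names `run_digest` and `is_stable` are made up for illustration, and comparing exit code, stdout and stderr is an assumption about what counts as "observable".

```python
import hashlib
import subprocess
import sys


def run_digest(cmd, stdin_data=b"", env=None, timeout=10):
    """Run cmd once and return a SHA-256 digest of its exit code and output."""
    proc = subprocess.run(
        cmd,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
        timeout=timeout,
    )
    h = hashlib.sha256()
    h.update(str(proc.returncode).encode())
    h.update(proc.stdout)
    h.update(proc.stderr)
    return h.hexdigest()


def is_stable(cmd, runs=3, **kwargs):
    """Return True if every run produced the same observable behaviour."""
    digests = {run_digest(cmd, **kwargs) for _ in range(runs)}
    return len(digests) == 1


if __name__ == "__main__":
    # Two toy targets: one deterministic, one that prints random bytes.
    stable = [sys.executable, "-c", "print(sum(range(10)))"]
    unstable = [sys.executable, "-c", "import os; print(os.urandom(8).hex())"]
    print("stable target:  ", is_stable(stable))
    print("unstable target:", is_stable(unstable))
```

Identical digests do not prove determinism; they only show that no difference was observed in these runs. This must only be done inside an environment where running the target is safe.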

## (Need for) Saving and Replaying

Reversing may require **tracing** back from the current state to the code where a change was produced.

* This implies moving "back in time".
* To restore a past program state, one must **re-run the program** and try to find the failure source.
* This operation may be performed multiple times, **moving backwards step-by-step, and then forward**.

Deterministic replay reconstructs program execution using previously recorded input data.

* The first program run is used to record these inputs into a log.
* All following runs reconstruct the same behaviour, because the program uses only the recorded inputs.
* The log should include all inputs (disk, network).
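
The record-then-replay idea can be sketched in a few lines. This is a toy model, not how any particular replay tool works: `InputLog`, its methods and the `program` function are hypothetical names, and the "inputs" are plain Python values rather than real disk or network data.

```python
import json


class InputLog:
    """Record external inputs on the first run, replay them afterwards."""

    def __init__(self, entries=None):
        self.replaying = entries is not None
        self.entries = list(entries) if entries is not None else []
        self._cursor = 0

    def read(self, channel, live_source):
        """Return the next input for `channel`.

        When recording, call `live_source()` and log the result.
        When replaying, ignore `live_source` and return the logged value.
        """
        if self.replaying:
            logged_channel, value = self.entries[self._cursor]
            if logged_channel != channel:
                raise RuntimeError("replay diverged from the recorded run")
            self._cursor += 1
            return value
        value = live_source()
        self.entries.append((channel, value))
        return value

    def dumps(self):
        """Serialize the log so a later run can replay it."""
        return json.dumps(self.entries)

    @classmethod
    def loads(cls, text):
        """Build a replaying log from text produced by dumps()."""
        return cls(entries=[tuple(e) for e in json.loads(text)])


def program(log, disk, network):
    """Toy program whose result depends only on its two inputs."""
    a = log.read("disk", disk)
    b = log.read("network", network)
    return a * 2 + b
```

A first call such as `program(InputLog(), read_disk, read_net)` records; later calls with `InputLog.loads(saved_text)` never touch the live sources. The channel check is a cheap way to notice that a replay has diverged from the recorded run.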

## (Need for) Safety

The target binary **may be malicious (... it is always malicious until proven safe)**.

Malware analysis is an important aspect of reversing binaries.

* Malware is often too complex to be analyzed by static means alone.
* However, executing the malware may be dangerous.
  * Most importantly: **dangerous in ways unknown to the reverse engineer.**

Solutions must create adequate isolation boundaries between environments.

* If stability is required, there should be no interactions with the software under analysis.
* Sometimes, isolation must be broken to trigger specific behaviour.
  * A network connection allows the sample to contact a C\&C address or download a payload.
  * Disk or file presence.
  * Whenever possible, such resources should be virtualized.

## (Need for) Support of Heterogeneous Architectures

Dynamic analysis requires the execution of the program under analysis.

An analyst will mostly work on a 64-bit x86 computer (a COTS laptop/server).

* Most embedded devices use ARM, which has several variants.
* Microcontrollers frequently use 8051, AVR or PIC architectures (PIC32 parts use MIPS cores).
* Several speciality SoCs use custom architectures (the list is long...).
* Several executable formats are popular: ELF and PE, with DWARF commonly used for debug information, and then many others from IoT.

Frameworks must be extensible to support a wide range of architectures.

* This includes the related interfaces and customizations.
* At the same time, the need for new tools should be minimized.
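
A first step in handling an unknown binary is often to find out which architecture it targets. For ELF files this is in the header; the sketch below decodes word size, byte order and the `e_machine` field from the first 20 bytes. The function name `describe_elf` is made up for illustration, and the `E_MACHINE` table holds only a handful of values from the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification; far from complete.
E_MACHINE = {
    3: "x86",
    8: "MIPS",
    40: "ARM",
    62: "x86-64",
    83: "AVR",
    183: "AArch64",
    243: "RISC-V",
}


def describe_elf(header: bytes) -> dict:
    """Decode class, byte order and machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}.get(header[4])               # EI_CLASS
    endian = {1: "little", 2: "big"}.get(header[5])    # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")
    fmt = "<HH" if endian == "little" else ">HH"
    # e_type and e_machine follow the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(fmt, header, 16)
    return {
        "bits": bits,
        "endian": endian,
        "machine": E_MACHINE.get(e_machine, f"unknown ({e_machine})"),
    }
```

Typical use would be `describe_elf(open(path, "rb").read(20))`. Other container formats (PE, raw firmware images) need their own parsers, which is part of why extensible frameworks matter.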

## (Need for) Support of Peripherals and External Entities

Reversing an application with external interactions may require the existence of the related entities.

* Web sites and servers at fixed or dynamic IP addresses.
* Common physical devices for user input, storage, ...
* Exotic external devices that communicate through known or unknown buses.
* Hardware dongles.

The analyst needs to recreate the set of devices/entities required to trigger a specific path.

* This frequently resorts to device emulation with mock software constructs.
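
As an illustration of a mock software construct, the sketch below stands in for a hardware dongle. Everything in it is hypothetical: the `MockDongle` class, the challenge and response bytes, and the `target_check` function that plays the role of the code under analysis.

```python
class MockDongle:
    """Software stand-in for a hypothetical licence dongle."""

    def __init__(self, responses, default=b"\x00"):
        self.responses = dict(responses)  # challenge -> canned response
        self.default = default            # reply for unknown challenges
        self.seen = []                    # every challenge the target sent

    def transfer(self, challenge: bytes) -> bytes:
        """Log the challenge and answer with the canned response, if any."""
        self.seen.append(challenge)
        return self.responses.get(challenge, self.default)


def target_check(dongle) -> str:
    """Toy stand-in for the code under analysis."""
    if dongle.transfer(b"\x01HELLO") != b"\x02OK":
        return "no dongle"
    return "licensed path"
```

Recording every challenge in `seen` is useful on its own: even before the right responses are known, the mock reveals what the target asks of the device.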

## (Need for) Content manipulation (instrumentation)

The main limitation of a dynamic approach is **coverage**.

* Any path that is not covered by the instrumented executions cannot be analyzed.
* This limitation can be slightly reduced by performing active instrumentation, and in particular by **forcing conditional branching**.

<figure><img src="/files/v4SoMnUVdJ2OyrYvpf8L" alt=""><figcaption></figcaption></figure>
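
On x86, one simple way to force a conditional branch is to rewrite the opcode of a short conditional jump. The sketch below works on a plain byte string and assumes the analyst already knows the offset of the jump; the function names are made up for illustration, and near (`0F 8x`) jumps are deliberately not handled.

```python
def invert_short_jcc(code: bytes, offset: int) -> bytes:
    """Return a copy of `code` with the short conditional jump at `offset` inverted.

    x86 short conditional jumps use opcodes 0x70..0x7F, arranged in pairs whose
    conditions are opposites (for example 0x74 JE and 0x75 JNE), so flipping the
    lowest opcode bit inverts the condition.
    """
    opcode = code[offset]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError(f"byte 0x{opcode:02x} at {offset} is not a short Jcc")
    patched = bytearray(code)
    patched[offset] = opcode ^ 0x01
    return bytes(patched)


def force_short_jcc(code: bytes, offset: int) -> bytes:
    """Return a copy where the short Jcc at `offset` is always taken."""
    if not 0x70 <= code[offset] <= 0x7F:
        raise ValueError("not a short Jcc")
    patched = bytearray(code)
    patched[offset] = 0xEB  # JMP rel8 reuses the same one-byte displacement
    return bytes(patched)
```

For the bytes `85 C0 75 05` (`test eax, eax; jne +5`), inverting at offset 2 yields `85 C0 74 05` and forcing yields `85 C0 EB 05`. The opcode check only guards against obvious mistakes; it cannot tell whether the byte at that offset really is the start of an instruction.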

## (Need for) Context manipulation (instrumentation)

A reversing task will need to observe **structure** and **behaviour**.

* The analysis should have **enough coverage to recover an adequate level of detail**.
* But while static analysis aims for wide coverage, dynamic analysis aims for focus.
* What if a specific course of execution is not triggered?
* **Results of dynamic analysis are dependent on the context of the execution.**

Context manipulation allows setting the adequate state to trigger a specific flow of execution, **increasing the reversing coverage**.

* Achieved by careful manipulation of execution state, registers and memory content.
* Problems:
  * This may lead to the recovery of an incorrect design, as the flow that was found may be a decoy!
  * It may also lead to the recovery of artificial vulnerabilities that do not exist.
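
A small, purely illustrative case of context manipulation: a toy target hides a path behind a date check, and the analysis forces the clock instead of waiting. The `target` function and the `TRIGGER` value are hypothetical, and `unittest.mock` is used here only as a convenient in-process stand-in for changing state from a debugger or emulator.

```python
import time
from unittest import mock

TRIGGER = 1_900_000_000  # hypothetical activation time (Unix seconds)


def target():
    """Toy target with a path that only runs after a given date."""
    if time.time() >= TRIGGER:
        return "hidden path"
    return "normal path"


def run_with_clock(fake_now):
    """Run the target with its clock forced to `fake_now`."""
    with mock.patch("time.time", return_value=fake_now):
        return target()
```

Calling `run_with_clock(TRIGGER + 1)` reaches the hidden path, while `run_with_clock(0)` stays on the normal one. The same caveat as above applies: a state forced this way may be one the real program can never reach.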

## Context manipulation (instrumentation)

* **Live patching:** modifying RAM in a debugger/controlled environment.
* **File patching:** alter binary files to replace their content.
* **Binary instrumentation:** real-time, automated modification.
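
File patching can be sketched as a guarded byte overwrite on a copy of the binary. The `patch_file` helper below is hypothetical; it assumes the offset and the expected original bytes are already known from earlier analysis, and it refuses to change the file length so that the rest of the layout stays intact.

```python
import shutil
from pathlib import Path


def patch_file(src, dst, offset, expected, replacement):
    """Copy `src` to `dst` and overwrite bytes at `offset` in the copy.

    The patch is applied only if the bytes currently at `offset` equal
    `expected`, and only if the replacement has the same length, so the
    rest of the file keeps its layout.
    """
    if len(expected) != len(replacement):
        raise ValueError("replacement must keep the same length")
    shutil.copyfile(src, dst)  # never modify the original sample
    with open(dst, "r+b") as fh:
        fh.seek(offset)
        if fh.read(len(expected)) != expected:
            raise ValueError("unexpected bytes at offset; wrong file or version?")
        fh.seek(offset)
        fh.write(replacement)
    return Path(dst)
```

Working on a copy keeps the original available for comparison and for hashing. Binaries that verify their own integrity may notice such a change.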

## Design Fidelity

The program under analysis may **detect that it is being analyzed and try to defend itself actively**.

* For instance, it can hide part of its behaviour if it detects that it is being analyzed.
* These anti-debugging and anti-instrumentation techniques are used by many malware samples.

So, when we arrive at a hypothesis of a design, how correct is it?

<figure><img src="/files/xbAPAqojHJrTesMN1Cyq" alt=""><figcaption></figcaption></figure>

### Example of gdb+br detection

<figure><img src="/files/LIU1D2wNVQftFzTsKkOp" alt=""><figcaption></figcaption></figure>
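
For comparison, another widely known debugger check on Linux reads the `TracerPid` field of `/proc/self/status`, which is non-zero while a ptrace-based tracer such as gdb is attached. The sketch below is not taken from the figure above; the function names are made up, and the check is Linux-specific.

```python
def tracer_pid(status_text: str) -> int:
    """Extract TracerPid from the text of /proc/<pid>/status (0 means not traced)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("TracerPid field not found")


def is_traced() -> bool:
    """Linux only: report whether a ptrace-based tracer is attached to this process."""
    with open("/proc/self/status") as fh:
        return tracer_pid(fh.read()) != 0
```

Knowing how such a check works also suggests how an analyst might neutralize it, for instance by presenting the target with a virtualized view of the file in which the field reads 0.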
