Notes - MCS
Reverse Engineering

Binary Analysis Process

Up to now, we know how ELF files are structured, but the question remains: how do we analyse ELF files?

A possible flow can be:

  • File analysis (file, nm, ldd, content visualization, foremost, binwalk).

  • Static Analysis (disassemblers and decompilers).

  • Behavioural Analysis (strace, LD_PRELOAD).

  • Dynamic Analysis (debuggers).

Identifying a file

Files should be seen as containers (this includes ELF files).

  • May have the expected content type.

    • But it may have unexpected behaviour (e.g. bug or malware).

  • May have unexpected, additional content (e.g. polyglots).

    • More common in DRM schemes and malware to hide binary blobs.
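As a rough illustration of how additional content can be spotted, the sketch below scans a byte string for a few well-known file signatures at any offset. The signature table is a small hand-picked sample and `scan_signatures` is a hypothetical helper; this is not how binwalk or foremost are implemented, and short signatures will produce false positives on real data.

```python
# Hand-picked sample of magic numbers (an assumption, far from complete).
SIGNATURES = {
    b"\x7fELF": "ELF",
    b"PK\x03\x04": "ZIP",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
}


def scan_signatures(data):
    """Return sorted (offset, type) pairs for every signature found in data."""
    hits = []
    for magic, name in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


# Synthetic container: a PNG signature followed by padding and a ZIP header.
BLOB = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8 + b"PK\x03\x04" + b"rest"
HITS = scan_signatures(BLOB)
```

A hit at offset 0 matches what `file` would report for the top container; any hit at a non-zero offset is a hint that the file may carry extra content worth extracting.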

Files should not be trusted.

  • Both the expected and additional content may be malicious.

  • Static analysis is safe (as long as nothing is executed).

  • Dynamic analysis is not safe. Sandboxes and VMs must be used.

Questions to answer

  • What type of file do we have?

    • Are there hidden contents?

  • What is the architecture?

    • Is it 32-bit or 64-bit? Which variant (e.g. ARM7/ARM9/ARM9E/ARM10)?

  • Where is the starting address?

  • What does the main function do?

  • What will the program do?
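For ELF files, several of these questions can be answered from the first bytes of the file alone. The following is a minimal sketch that reads the class, byte order, machine and entry point from an ELF header; `parse_elf_header` and the small `MACHINES` table are hypothetical names for illustration, and the sample header is synthetic rather than taken from a real binary.

```python
import struct

# A few e_machine values from the ELF specification (not exhaustive).
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data):
    """Extract word size, byte order, machine and entry point from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]        # EI_CLASS
    endian = {1: "<", 2: ">"}[data[5]]    # EI_DATA
    # e_type, e_machine and e_version start right after the 16-byte e_ident.
    _e_type, e_machine, _e_version = struct.unpack_from(endian + "HHI", data, 16)
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64.
    (e_entry,) = struct.unpack_from(endian + ("I" if bits == 32 else "Q"), data, 24)
    return {
        "bits": bits,
        "little_endian": endian == "<",
        "machine": MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


# Synthetic 64-bit little-endian x86-64 header with entry point 0x401000.
SAMPLE = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
          + struct.pack("<HHIQ", 2, 62, 1, 0x401000))
INFO = parse_elf_header(SAMPLE)
```

On a real file, the same function could be fed the first 64 bytes read in binary mode. Note that the entry point is usually not the main function but the start-up code that eventually calls it.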

Some basic tools go a long way.

  • file: (try to identify) the type of file.

    • Only applies to a top container. file is not able to look into enclosed binary blobs.

    • Alternatives that complement file are binwalk and foremost.

  • xxd: hexdump the file, allowing for rapidly detecting patterns.

    • Piping the output through less makes long dumps easier to browse in the terminal.

  • strings: prints the sequences of printable characters found in the file.

    • By default, only sequences of at least 4 characters are shown (adjustable with -n).

  • ldd: print shared object dependencies.

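    • Lists the libraries recorded as needed in the ELF (typically required for dynamically resolved symbols).

    • ldd may invoke the dynamic loader on the target, so avoid running it on untrusted binaries; objdump -p lists the NEEDED entries without executing anything.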

  • nm: dumps symbols from .symtab (or .dynsym with -D).
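To make the behaviour of strings concrete, here is a minimal sketch that approximates it: it returns every run of printable ASCII characters of at least `min_len` bytes. `extract_strings` is a hypothetical helper, and the real tool handles more cases (other encodings, section selection, offsets).

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Synthetic blob: "abc" and "hi" are too short and are therefore dropped.
BLOB = b"\x00\x01abc\x00/lib/ld.so\x00hi\xffGLIBC_2.2.5\x00"
FOUND = extract_strings(BLOB)
```

Lowering `min_len` surfaces more candidates at the cost of more noise, which is the same trade-off as the -n option.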

Disassembler Basics: ghidra

  • ghidra is an open-source tool for Disassembly and Static Analysis, developed by the NSA and released to the public.

    • Support for Dynamic Analysis (a debugger) started in the development branch and has shipped in official releases since version 10.0.

  • Works on Windows, Linux and macOS (Java-based).

  • Not the most established tool (IDA is), but it is gaining huge traction: it is free and very powerful, supports a large number of processor architectures, and includes a fine decompiler.

CFGs

It is useful to think of machine code as a graph structure, called a control-flow graph (CFG).

A node in a CFG is a group of adjacent instructions called a basic block:

  • The only jumps into a basic block are to the first instruction.

  • The only jumps out of a basic block are from the last instruction.

  • I.e., a basic block always executes as a unit.

Edges between blocks represent possible jumps.
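The rules above translate into the classic "leader" approach: a block starts at the first instruction, at every jump target, and right after every jump. The sketch below applies it to a toy instruction list; the `(mnemonic, target)` tuples and the `basic_blocks` helper are assumptions made for illustration, not the format of any real disassembler.

```python
def basic_blocks(instrs):
    """Split a list of (mnemonic, target_index_or_None) into basic blocks."""
    leaders = {0}                           # the first instruction starts a block
    for i, (op, target) in enumerate(instrs):
        if op in ("jmp", "jz", "ret"):
            if target is not None:
                leaders.add(target)         # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)          # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Toy program: an if/else followed by a return.
PROGRAM = [
    ("cmp", None),   # 0
    ("jz", 4),       # 1: conditional jump to the else branch
    ("add", None),   # 2
    ("jmp", 5),      # 3: skip the else branch
    ("sub", None),   # 4
    ("ret", None),   # 5
]
BLOCKS = basic_blocks(PROGRAM)
```

Each inner list holds the instruction indices of one block; for this program the result is four blocks: the condition, the two branches and the return.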

Basic block a dominates basic block b if every path from the entry to b passes through a. It strictly dominates b if, in addition, a != b.

Basic block b post-dominates a if every path from a to the exit passes through b.
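Dominators can be computed with a simple fixed-point iteration: the dominators of a block are the block itself plus whatever dominates all of its predecessors. The sketch below is the textbook iterative version, not an optimized algorithm; it assumes the CFG is given as a dict that maps every block name to its list of successors, and the function names are hypothetical.

```python
def dominators(cfg, entry):
    """Map each block to the set of blocks that dominate it."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}    # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}                   # unreachable from the entry
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(cfg):
    """Flip every edge, so post-dominators become dominators from the exit."""
    rev = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            rev[s].append(n)
    return rev


# Diamond-shaped CFG: A branches to B or C, and both rejoin at D.
CFG = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
DOM = dominators(CFG, "A")
POSTDOM = dominators(reverse(CFG), "D")
```

In the diamond, A dominates every block and D post-dominates every block, while neither B nor C dominates D because D can be reached through the other branch.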

Disassembly

The disassembly process involves analyzing the binary and converting binary code to assembly.

  • But “binary” is just a sequence of bytes, that must be mapped in the scope of a given architecture.

  • Conversion depends on many factors, including compiler and flags.

The process is not perfect and may lead RE analysts into error. A disassembler may:

  • Present instructions that do not exist.

  • Ignore instructions that are in the binary code.

Main approaches:

  • Linear Disassembly.

  • Recursive Disassembly.

Linear Disassembly

The simplest approach towards analyzing a program: iterate over all code segments, disassembling the binary code as opcodes are found.

Start at some address and follow the binary.

  • Entry point or other point in the binary file.

  • The entry point may not be known.

Works best with:

  • binary blobs such as from firmware (start at the beginning).

  • objects which do not have data at the beginning.

  • architectures that use variable-length instructions (x86).

It is vital to define the initial address for disassembling.

An offset error will result in invalid or wrong instructions being decoded.

Linear disassembly will also try to disassemble data from the binary as if it were actual code.

Linear Disassembly is oblivious to the actual Program Flow.

With x86, because instructions have variable length, the disassembly tends to auto-synchronize after a few instructions, but the first instructions will be decoded incorrectly or missed.
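The effect of a wrong start offset can be reproduced with a toy example. The sketch below implements a linear sweep over an invented variable-length instruction set (the opcode table is purely hypothetical and unrelated to x86 encodings) and decodes the same bytes from two different offsets.

```python
# Hypothetical toy ISA: opcode byte -> (mnemonic, total instruction size).
OPCODES = {
    0x01: ("nop", 1),
    0x02: ("push", 2),   # followed by a 1-byte immediate
    0x03: ("jmp", 2),    # followed by a 1-byte target address
    0x05: ("ret", 1),
}


def linear_sweep(code, start=0):
    """Decode bytes one after another from start, ignoring control flow."""
    listing = []
    pc = start
    while pc < len(code):
        name, size = OPCODES.get(code[pc], (".byte", 1))
        if pc + size > len(code):           # truncated instruction at the end
            name, size = ".byte", 1
        operand = code[pc + 1] if size == 2 else None
        listing.append((pc, name, operand))
        pc += size
    return listing


# push 1; push 5; nop; ret
CODE = bytes([0x02, 0x01, 0x02, 0x05, 0x01, 0x05])
GOOD = linear_sweep(CODE, start=0)
SHIFTED = linear_sweep(CODE, start=1)       # starts in the middle of "push 1"
```

Starting at offset 1, the operand of the first push is decoded as a bogus nop, after which the listing lines up again with the correct one from offset 2 onwards. The first instruction is lost, which mirrors the behaviour described above.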

Issues

With ELF files in x86, linear disassembly tends to be useful.

  • Compilers do not emit inline data and the process rapidly synchronizes.

  • Still, padding and alignment efforts may create some wrong instructions.

With PE files, compilers may emit inline data and Linear Disassembly is not adequate.

  • Every time data is found, disassembly becomes desynchronized.

Other architectures (ARM) and binary objects usually are not suited for Linear Disassembly.

  • Obfuscation may include code as data, which is loaded dynamically.

  • Fixed-length instruction sets will not easily synchronize.

So why is it useful?

Code in the binary blob may be executed with a dynamic call.

  • Some JMP/CALL with an address computed dynamically and unknown to the static analyzer.

Linear Disassembly will disassemble everything:

  • whether or not it is called, which may be useful to uncover hidden program code.

  • even if the binary blob is not a structured executable (boot sector, firmware).

Readily available with simple tools: objdump and gdb.

  • The gdb examine command (x/i) also uses Linear Disassembly.

Recursive Disassembly

A more complex approach that disassembles code from an initial point, while following the control flow.

  • That is: follows jmp, call and ret.

As long as the start point is correct, or it synchronizes rapidly, flow can be fully recovered.

  • This is the standard process for more complex tools such as ghidra and IDA.

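Goes around inline data, as no instruction in the recovered flow will transfer execution to such an address.

  • Well… control flow can easily be forged with an indirect call such as ((void (*)(int, char)) ptr)(a, b), whose target is computed at run time and may be invisible to the static analyzer.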
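A minimal sketch of the recursive approach, using an invented variable-length instruction set (the opcode table is hypothetical and not a real architecture): decoding starts at the entry point and only continues at addresses that a decoded jmp, call or fall-through can reach. Indirect jumps and calls are not modelled, which is exactly the blind spot mentioned above.

```python
# Hypothetical toy ISA: opcode byte -> (mnemonic, total instruction size).
OPCODES = {
    0x01: ("nop", 1),
    0x02: ("push", 2),   # 1-byte immediate
    0x03: ("jmp", 2),    # 1-byte absolute target
    0x04: ("call", 2),   # 1-byte absolute target
    0x05: ("ret", 1),
}


def recursive_descent(code, entry=0):
    """Decode only the addresses reachable by following control flow."""
    seen = {}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in seen:
            if code[pc] not in OPCODES:
                break                        # not a valid instruction: stop this path
            name, size = OPCODES[code[pc]]
            if pc + size > len(code):
                break                        # truncated instruction
            operand = code[pc + 1] if size == 2 else None
            seen[pc] = (name, operand)
            if name == "jmp":
                worklist.append(operand)     # continue only at the target
                break
            if name == "ret":
                break                        # end of this path
            if name == "call":
                worklist.append(operand)     # callee, then fall through
            pc += size
    return dict(sorted(seen.items()))


# 0: jmp 4 | 2: inline data (02 FF) | 4: push 7 | 6: call 9 | 8: ret
# 9: nop   | 10: ret
CODE = bytes([0x03, 0x04, 0x02, 0xFF, 0x02, 0x07, 0x04, 0x09, 0x05, 0x01, 0x05])
LISTING = recursive_descent(CODE)
```

The two inline data bytes at addresses 2 and 3 are never decoded, whereas a linear sweep would have shown them as a push instruction. Any code reachable only through a computed address would be skipped in the same way.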
