Binary Objects

Binary files

The result of a compilation process.

  • Translating high-level code (C/C++, etc…) into native code or bytecode.

Code is encapsulated in a binary format.

  • It’s not a raw file with unstructured bytes.

The target system (CPU or VM) will process the resulting code.

  • Which may be only part of the file content.
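To make the idea of a structured binary format concrete, here is a minimal sketch that checks whether a file begins with the 4-byte ELF magic number. ELF is an assumption here (it is the usual format on Linux; the text above does not name one), and the function names is_elf and file_is_elf are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Every ELF file starts with these four bytes: 0x7f 'E' 'L' 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns 1 if the buffer starts with the ELF magic number, 0 otherwise. */
int is_elf(const unsigned char *buf, size_t len) {
    return len >= sizeof ELF_MAGIC &&
           memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}

/* Reads the first bytes of a file (hypothetical helper).
   Returns 1 for ELF, 0 for anything else, -1 if the file cannot be opened. */
int file_is_elf(const char *path) {
    unsigned char head[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);
    return is_elf(head, n);
}
```

The magic number is only the first field of the header; the rest of the header describes where the code and data are located inside the file.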

Compilation process

The C/C++ use case

The preprocessor (which may be built into the compiler) processes the source code, handling directives such as #include and expanding macros.

The result is a single block of text ready for further processing, frequently with no remaining dependencies on external files.

Source code

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
    printf("Hello World\n");
    return 0;
}

Pre-compile: gcc -E -o hello.e hello.c

Produces >1500 lines.


extern int rpmatch (const char *__response) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1))) ;
# 980 "/usr/include/stdlib.h" 3 4
extern int getsubopt (char **__restrict __optionp,
                char *const *__restrict __tokens,
                char **__restrict __valuep)
        __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1, 2, 3))) ;
# 1026 "/usr/include/stdlib.h" 3 4
extern int getloadavg (double __loadavg[], int __nelem)
        __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
# 1036 "/usr/include/stdlib.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/stdlib-float.h" 1 3 4
# 1037 "/usr/include/stdlib.h" 2 3 4
# 1048 "/usr/include/stdlib.h" 3 4

# 3 "hello.c" 2

# 5 "hello.c"
int main(int argc, char** argv) {
    printf("Hello World\n");
    return 0;
}
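The hello.c example contains no macros of its own, so the following sketch shows what macro expansion looks like. The macro and function names (GREETING, SQUARE, area, greeting) are hypothetical and only serve as an illustration.

```c
/* Object-like macro: replaced by its value wherever it appears. */
#define GREETING "Hello World"

/* Function-like macro: arguments are substituted textually,
   which is why every use of x is wrapped in parentheses. */
#define SQUARE(x) ((x) * (x))

int area(int side) {
    /* After preprocessing, this line reads:
       return ((side + 1) * (side + 1)); */
    return SQUARE(side + 1);
}

const char *greeting(void) {
    /* After preprocessing, this line reads:
       return "Hello World"; */
    return GREETING;
}
```

Assuming the code is saved as macros.c, running gcc -E -P macros.c prints the expanded text; the -P option suppresses the "# line file" markers seen in the listing above.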

The compiler processes the preprocessed file and produces assembly code. This may be assembly for an intermediate (abstract) processor rather than for the final target processor.

The compiler will create abstract syntax trees (AST) and may tweak or optimize the result according to the options it was given.

Typically for GCC, the -m and -f switches, and then the -O<n> switches, can modify the output. That is: the same source code can result in different assembly depending on the compiler, target and flags.

Compile: gcc -masm=intel -S -o hello.s hello.c
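As a rough illustration of how flags change the result, a small function such as the hypothetical sum_to below can be compiled to assembly at two optimization levels and the outputs compared. The function and file names are assumptions for this sketch, and the exact assembly depends on the GCC version and target.

```c
/* Sums the integers 1..n with a plain loop.
   At -O0 the compiler usually keeps the loop and its local variables
   in memory; at higher levels it may keep them in registers or
   restructure the loop entirely. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Assuming the code is saved as sum.c:

gcc -S -O0 -o sum_O0.s sum.c
gcc -S -O2 -o sum_O2.s sum.c
diff sum_O0.s sum_O2.s

No main function is needed at this stage, since -S stops before assembling and linking.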

Assembler

Input containing assembly code is transformed into machine code. Output is a set of object files or modules with a .o extension.

Code produced may use relative addresses, making it reusable (technically relocatable) when integrated into a final binary file.

Symbols are also present as they are required at later stages.

Although these object files contain machine code, they are not executable: they do not include all the code the program requires, only what was generated from the original .c file and the .h files it included.
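A minimal sketch of why an object file is incomplete: the module below defines two symbols of its own but only references puts, whose code lives in the C library. The names log_message and message_count are hypothetical.

```c
#include <stdio.h>

/* Defined in this module: recorded as a defined symbol in the object file. */
int message_count = 0;

/* Also defined here. The call to puts, however, is only declared by
   stdio.h, so the object file records puts as an undefined symbol
   that must be resolved at a later stage. */
int log_message(const char *text) {
    message_count++;
    return puts(text);
}
```

Assuming the code is saved as module.c, gcc -c -o module.o module.c produces the object file, and nm module.o lists its symbols: log_message and message_count appear as defined, while puts is typically marked U (undefined).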

Linker

The linker takes all the object files belonging to a program and merges them into a single coherent executable, typically intended to be loaded at a particular memory address.

As the arrangement of all modules in the executable is known, the linker can also resolve most symbolic references.

References to libraries may or may not be completely resolved, depending on the type of library. Symbols from static libraries are resolved at link time. For a shared (dynamic) library, the library is recorded as a dependency and the symbol is resolved at load time or run time.
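Run-time resolution normally happens automatically in the dynamic loader, but the same mechanism can be driven by hand through the POSIX dlopen and dlsym calls. The sketch below is an explicit analogue, not what the loader literally does; the library name "libm.so.6" assumes glibc on Linux, and the function name call_cos_dynamically is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Looks up "cos" in the shared math library at run time and calls it.
   Returns 0 and stores the result in *out on success, -1 on failure.
   The library name "libm.so.6" is an assumption (glibc on Linux). */
int call_cos_dynamically(double x, double *out) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        return -1;
    }

    double (*cos_fn)(double) = NULL;
    /* POSIX idiom for storing dlsym's void* result in a function pointer. */
    *(void **)(&cos_fn) = dlsym(handle, "cos");
    if (cos_fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = cos_fn(x);
    dlclose(handle);
    return 0;
}
```

Note that the symbol is identified by a plain string: nothing about cos is known to the linker when this file is built. On older glibc versions the program must be linked with -ldl.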
