# Profiling

{% hint style="info" %}

### Task

Given the source code of a program, possibly written by someone else, optimize it!
{% endhint %}

## Where to start?

* Analyze the source code and detect inefficient “C” code.
* Re-write some sections in assembly.
* Use more efficient algorithms.

How do we determine which sections to optimize?

* A typical application consists of many functions spread over different source files.
* **Manual inspection** of the entire application code to determine which sections to optimize is in many cases **impractical**!

{% hint style="info" %}

### Amdahl’s law

The performance gain obtained by optimizing a section of code is limited by the fraction of the total execution time that is spent in that section.
{% endhint %}
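
Amdahl's law is commonly written as the formula below. The symbols are chosen here for illustration: $$p$$ is the fraction of the total execution time spent in the section being optimized, $$s$$ is the speedup achieved for that section alone, and $$S$$ is the resulting speedup of the whole program.

$$
S = \frac{1}{(1 - p) + \dfrac{p}{s}}
$$

For example, if a section accounts for 20% of the run time ($$p = 0.2$$), even an infinitely fast replacement gives at most $$S = 1 / (1 - 0.2) = 1.25$$. Speeding up a section with $$p = 0.8$$ by a factor of $$s = 4$$ gives $$S = 1 / (0.2 + 0.2) = 2.5$$.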

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2Fkx6JscEOuao2SQVzSnSQ%2Fa.png?alt=media&#x26;token=cee1362a-0d12-4c17-94ec-32fe9578ce21" alt=""><figcaption></figcaption></figure>

But how do we determine which parts of the code consume the largest share of CPU time?

## Profiling

> Collection of statistical data about the execution of an application.

Profiling is fundamental for determining the relative weight of each function.

Approaches:

* **Call graph profiling**: function invocations are instrumented.
  * Intrusive: requires access to the source code and is computationally heavy (overhead can reach 20%).
* **Flat profiling**: the application status is sampled at regular time intervals.
  * Accurate as long as each function's execution time is much longer than the sampling period.

### Example

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2Fexgq9cNaF7mv1Wk5i50K%2Fa.png?alt=media&#x26;token=0c62124d-6151-4ebc-8f8e-02c7bd1d1465" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}

### “80/20 Law”

In a “typical” application about 80% of the time is spent in about 20% of the code.
{% endhint %}

### Tools

#### GNU Gprof

Profiling requires several steps.

* Compile and link the application with profiling enabled (debug information can be added with `-g`).
  * gcc **-pg** -o sample sample.c
* Run the program to generate the profiling data, which is written to the file `gmon.out` when the program exits normally.
  * ./sample
* Run gprof to analyze the data.
  * gprof ./sample \[> text.file]

{% hint style="info" %}

#### “-pg”

Generate extra code to write profile information suitable for the analysis program gprof.
{% endhint %}

#### GNU Gcov

Gcov is a test-coverage tool that complements gprof: it reports the number of times each line of the source code is executed.

* Compile and link with “-fprofile-arcs -ftest-coverage” to generate the additional information needed by gcov (“-pg” is only needed if the same build is also profiled with gprof).
  * gcc **-pg -fprofile-arcs -ftest-coverage** -o sample sample.c -lm
* Run the program to generate statistical data and then run gcov.
  * ./sample
  * gcov sample.c
* The file “sample.c.gcov” contains the annotated source code with the execution counts.

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2FUiCcfoFAjRO0tBk37cEe%2Fa.png?alt=media&#x26;token=edc0bceb-bd9c-45e7-9947-0ebecc3f2b50" alt=""><figcaption></figcaption></figure>

Analyzing the code revealed a possible optimization:

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2FjUcHhRLiMQq3cvdYKBU0%2Fa.png?alt=media&#x26;token=669e764b-1c40-4fb1-ace2-41c5966642eb" alt=""><figcaption></figcaption></figure>

Results with gprof.

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2FTUtxTzB3i3nJ3Ky0fQ6z%2Fa.png?alt=media&#x26;token=89e14815-cb03-4847-a5a7-916b4b592720" alt=""><figcaption><p>Execution time reduced by a factor of 106!!!!</p></figcaption></figure>

There are many other profiling tools, e.g. “perf”:

* Performance counters for Linux (“perf” or “perf events”): a Linux tool that reports performance measurements on the command line.
* It can be used to find bottlenecks and to analyze an application's execution time, wait latency, CPU cycles, etc.
* The events of interest can be selected by the user (“perf list” shows the supported events).

<figure><img src="https://3011170443-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxBNeQZTqLnbb3C3JTlRu%2Fuploads%2FWNeGWeiH8Dx7affrrMp4%2Fa.png?alt=media&#x26;token=c7cf723d-8a3d-419b-b260-e6506226baa6" alt=""><figcaption></figcaption></figure>
