How to profile your applications using the Linux perf tools

When an application encounters some performance issues, we have to find the code that causes the problem to optimize only what really matters.

To find the code we have to optimize, the profilers are really useful. In this post, we'll use the Linux perf tools to profile a simple C++ application.

The perf tools are integrated in the Linux kernel since the 2.6 version. The perf tools are based on the perf events subsystem. The perf profiler uses hardware counters to profile the application. The result of this profiler are really precise and because it is not doing instrumentation of the code, it is really fast.

Installation

First of all, if it is not already done, you have to install the perf tools on your computer. On Ubuntu, you can just use apt-get to install it :

sudo apt-get install linux-tools

On the other systems, just use your favorite package manager to install the perf tools.

Usage

To profile an application, you have to record information about an executation, for that, you just have to use perf record :

perf record program [program_options]

For example :

perf record eddic assembly.eddi

Once the execution is over, perf will gives you some information about the record, like:

[ perf record: Woken up 44 times to write data ]

[ perf record: Captured and wrote 11.483 MB perf.data (~501721 samples) ]

If you see that the size of the perf.data is really small, generally for small execution time, you can configure the event period in order to have more information with the --count=period option using a small period. I usually uses 1000, but it can be useful to use a smaller number when the application has a short execution time.

Then, to see the list of the most costly functions, you just have to use perf report :

perf report

It will display a list of the most costly functions ordered by cost. Here is an example taken from one of my C++ applications (the length of the function names is reduced to fit the screen) here :

# Events: 374K cycles
#
# Overhead         Command             Shared Object                                                                                                                               
# ........  ..............  ........................  ...............................................................
#
    87.58%        inlining  [vesafb]                  [k] 0xffffffff81100700
    83.27%         readelf  [vesafb]                  [k] 0xffffffff815c3930
    41.40%              sh  [vesafb]                  [k] 0xffffffff815c3930
    37.74%        inlining  libstdc++.so.6.0.14       [.] 0x653e0         
    12.49%         readelf  libc-2.13.so              [.] vfprintf
     5.37%        inlining  inlining                  [.] parseFunction(std::string, std::string, std::map
     5.20%         readelf  libc-2.13.so              [.] _IO_new_file_xsputn
     4.50%         readelf  readelf                   [.] 0x150e          
     4.10%        inlining  libc-2.13.so              [.] _int_malloc
     4.01%        inlining  libc-2.13.so              [.] memcpy
     3.80%         readelf  libc-2.13.so              [.] ___printf_chk
     2.58%        inlining  libc-2.13.so              [.] __malloc
     1.86%        inlining  libc-2.13.so              [.] _IO_fgets
     1.84%        inlining  inlining                  [.] parseExecutable(std::string, std::set
     1.83%         readelf  libc-2.13.so              [.] __strchrnul
     1.83%        inlining  libc-2.13.so              [.] _int_free
     1.77%        inlining  libc-2.13.so              [.] __strlen_sse42
     1.50%        inlining  libc-2.13.so              [.] cfree
     1.48%        inlining  libc-2.13.so              [.] __memchr
     1.23%        inlining  inlining                  [.] parseLibrary(std::string, std::se
     1.19%        inlining  libboost_graph.so.1.46.1  [.] char* std::string::_S_construct(char const*)
     1.17%         readelf  libc-2.13.so              [.] __dcigettext
     1.15%        inlining  libc-2.13.so              [.] _IO_getline_info_internal

For every functions, you have the information about the cost of the function, the command used to launch it and the shared object in which the function is located. You can navigate through the list like in more utility.

This tool can be really useful to see which functions is interesting to optimize in order to increase the overall performance of the application.

For more information about the perf tools, you can read the perf wiki

In a future article, I will talk about another profiler, Callgrind.

Comments

Comments powered by Disqus