<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blog blog("Baptiste Wicht"); (Posts about Performances)</title><link>https://baptiste-wicht.com/</link><description></description><atom:link href="https://baptiste-wicht.com/categories/performances.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Sun, 15 Feb 2026 06:57:40 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Deep Learning Library 1.0 - Fast Neural Network Library</title><link>https://baptiste-wicht.com/posts/2017/10/deep-learning-library-10-fast-neural-network-library.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;div&gt;&lt;img alt="DLL Logo" class="align-center" src="https://baptiste-wicht.com/images/dll_logo.png"&gt;
&lt;p&gt;I'm very happy to announce the release of the first version of Deep Learning
Library (DLL) 1.0. DLL is a neural network library with a focus on speed and
ease of use.&lt;/p&gt;
&lt;p&gt;I started working on this library about 4 years ago for my Ph.D. thesis.
I needed a good library to train and use Restricted Boltzmann Machines (RBMs)
and at this time there was no good support for it. Therefore, I decided to write
my own. It now has very complete support for the RBM and the Convolutional RBM
(CRBM) models. Stacks of RBMs (or Deep Belief Networks (DBNs)) can be pretrained
using Contrastive Divergence and then either fine-tuned with mini-batch gradient
descent or Conjugate Gradient or used as a feature extractor. Over the years,
the library has been extended to handle Artificial Neural Networks (ANNs) and
Convolutional Neural Networks (CNNs). The network is also able to train regular
auto-encoders. Several advanced layers such as Dropout or Batch Normalization
are also available as well as adaptive learning rates techniques such as
Adadelta and Adam. The library also has integrated support for a few datasets:
MNIST, CIFAR-10 and ImageNet.&lt;/p&gt;
&lt;p&gt;This library can be used using a C++ interface. The library is fully
header-only. It requires a C++14 compiler, which means a minimum of clang 3.9 or
GCC 6.3.&lt;/p&gt;
&lt;p&gt;In this post, I'm going to present a few examples on using the library and give
some information about the performance of the library and the roadmap for the
project.&lt;/p&gt;
&lt;p class="more"&gt;&lt;a href="https://baptiste-wicht.com/posts/2017/10/deep-learning-library-10-fast-neural-network-library.html"&gt;Read more…&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><category>C++</category><category>dll</category><category>etl</category><category>GPU</category><category>Machine Learning</category><category>Performances</category><category>Releases</category><guid>https://baptiste-wicht.com/posts/2017/10/deep-learning-library-10-fast-neural-network-library.html</guid><pubDate>Sat, 07 Oct 2017 13:42:16 GMT</pubDate></item><item><title>C++11 Performance tip: Update on when to use std::pow ?</title><link>https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;A few days ago, I published a post comparing the
&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-when-to-use-std-pow.html"&gt;performance of std::pow against direct multiplications&lt;/a&gt;. When not compiling with -ffast-math, direct multiplication was significantly faster than &lt;code&gt;std::pow&lt;/code&gt;, around two orders of magnitude faster when comparing &lt;code&gt;x * x * x&lt;/code&gt; and &lt;code&gt;code:std::pow(x, 3)&lt;/code&gt;.
One comment that I've got was to test for which &lt;code&gt;n&lt;/code&gt; is
&lt;code&gt;code:std::pow(x, n)&lt;/code&gt; becoming faster than multiplying in a loop. Since
std::pow is using a special algorithm to perform the computation rather than be
simply loop-based multiplications, there may be a point after which it's more interesting to use the
algorithm rather than a loop. So I decided to do the tests. You can also find
the result in the original article, which I've updated.&lt;/p&gt;
&lt;p&gt;First, our pow function:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code c++"&gt;&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-1" name="rest_code_7108cb7dee0c48eba724db26a71df046-1" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-1"&gt;&lt;/a&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;my_pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-2" name="rest_code_7108cb7dee0c48eba724db26a71df046-2" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-3" name="rest_code_7108cb7dee0c48eba724db26a71df046-3" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-4" name="rest_code_7108cb7dee0c48eba724db26a71df046-4" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-5" name="rest_code_7108cb7dee0c48eba724db26a71df046-5" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-6" name="rest_code_7108cb7dee0c48eba724db26a71df046-6" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-6"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-7" name="rest_code_7108cb7dee0c48eba724db26a71df046-7" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-8" name="rest_code_7108cb7dee0c48eba724db26a71df046-8" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-9" name="rest_code_7108cb7dee0c48eba724db26a71df046-9" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_7108cb7dee0c48eba724db26a71df046-10" name="rest_code_7108cb7dee0c48eba724db26a71df046-10" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html#rest_code_7108cb7dee0c48eba724db26a71df046-10"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And now, let's see the performance. I've compiled my benchmark with GCC 4.9.3
and running on my old Sandy Bridge processor. Here are the results for 1000
calls to each functions:&lt;/p&gt;
&lt;div id="graph_std_pow_my_pow_1" style="width: 700px; height: 400px;"&gt;&lt;/div&gt;&lt;p&gt;We can see that between &lt;code&gt;n=100&lt;/code&gt; and &lt;code&gt;n=110&lt;/code&gt;, &lt;code&gt;std::pow(x, n)&lt;/code&gt;
starts to be faster than &lt;code&gt;my_pow(x, n)&lt;/code&gt;. At this point, you should only
use &lt;code&gt;std::pow(x, n)&lt;/code&gt;.  Interestingly too, the time for &lt;code&gt;std::pow(x,
n)&lt;/code&gt; is decreasing. Let's see how is the performance with higher range of
&lt;code&gt;n&lt;/code&gt;:&lt;/p&gt;
&lt;div id="graph_std_pow_my_pow_2" style="width: 700px; height: 400px;"&gt;&lt;/div&gt;&lt;p&gt;We can see that the pow function time still remains stable while our loop-based
pow function still increases linearly. At &lt;code&gt;n=1000&lt;/code&gt;, &lt;code&gt;std::pow&lt;/code&gt; is
one order of magnitude faster than &lt;code&gt;my_pow&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Overall, if you do not care much about extreme accuracy, you may consider using
you own pow function for small-ish (integer) &lt;code&gt;n&lt;/code&gt; values. After
&lt;code&gt;n=100&lt;/code&gt;, it becomes more interesting to use &lt;code&gt;std::pow&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you want more results on the subject, you take a look at the
&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-when-to-use-std-pow.html"&gt;original article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you are interested in the code of this benchmark, it's available online:
&lt;a class="reference external" href="https://github.com/wichtounet/articles/blob/master/src/bench_pow_my_pow.cpp"&gt;bench_pow_my_pow.cpp&lt;/a&gt;&lt;/p&gt;
&lt;script type="text/javascript" src="https://www.google.com/jsapi"&gt;&lt;/script&gt;
&lt;script type="text/javascript"&gt;google.load('visualization', '1.0', {'packages':['corechart']});&lt;/script&gt;
&lt;script type="text/javascript"&gt;
function draw_graph_pow_my_pow_1(){
var data = google.visualization.arrayToDataTable([
['n', 'my_pow(x, n)', 'std::pow(x, n)'],
['10',   2,     127],
['20',   17,     123],
['30',   26,     127],
['40',   36,     123],
['50',   43,     123],
['60',   55,     123],
['70',   72,     123],
['80',   85,     123],
['90',   102,    126],
['100',  114,    125],
['110',  131,    115],
['120',  144,    111],
['130',  165,    111],
['140',  173,    108],
['150',  189,    107],
['160',  202,    112],
['170',  219,    106],
['180',  232,    105],
['190',  249,    108],
['200',  261,    105],
]);
var graph = new google.visualization.LineChart(document.getElementById('graph_std_pow_my_pow_1'));
var options = {curveType: "function",title: "std::pow(x, 2) (float)",animation: {duration:1200, easing:"in"},width: 700, height: 400,hAxis: {title:"Number of elements", slantedText:true},vAxis: {viewWindow: {min:0}, title:"us"}};
graph.draw(data, options);
}
function draw_graph_pow_my_pow_2(){
var data = google.visualization.arrayToDataTable([
['n', 'my_pow(x, n)', 'std::pow(x, n)'],
['100',  114,    125],
['200',  261,    105],
['300',  410,    104],
['400',  558,    104],
['500',  708,    104],
['600',  855,    104],
['700',  1002,   104],
['800',  1148,   104],
['900',  1300,   104],
['1000', 1442,   104],
]);
var graph = new google.visualization.LineChart(document.getElementById('graph_std_pow_my_pow_2'));
var options = {curveType: "function",title: "std::pow(x, 2) (float)",animation: {duration:1200, easing:"in"},width: 700, height: 400,hAxis: {title:"Number of elements", slantedText:true},vAxis: {viewWindow: {min:0}, title:"us"}};
graph.draw(data, options);
}
function draw_all(){
draw_graph_pow_my_pow_1();
draw_graph_pow_my_pow_2();
}
google.setOnLoadCallback(draw_all);
&lt;/script&gt;</description><category>Benchmark</category><category>C++</category><category>C++11</category><category>Performances</category><category>Tip</category><guid>https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-update-when-to-use-std-pow.html</guid><pubDate>Fri, 22 Sep 2017 09:21:07 GMT</pubDate></item><item><title>C++11 Performance tip: When to use std::pow ?</title><link>https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-when-to-use-std-pow.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;div&gt;&lt;p&gt;Update: I've added a new section for larger values of &lt;code&gt;n&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Recently, I've been wondering about the performance of &lt;code&gt;std::pow(x, n)&lt;/code&gt;.
I'm talking here about the case when &lt;code&gt;n&lt;/code&gt; is an integer. In the case when
&lt;code&gt;n&lt;/code&gt; is not an integer, I believe, you should always use &lt;code&gt;std::pow&lt;/code&gt;
or use another specialized library.&lt;/p&gt;
&lt;p&gt;In case when n is an integer, you can actually replace it with the direct
equivalent (for instance &lt;code&gt;std::pow(x, 3) = x * x x&lt;/code&gt;). If n is very large,
you'd rather write a loop of course ;) In practice, we generally use powers of
two and three much more often than power of 29, although that could happen. Of
course, it especially make sense to wonder about this if the pow is used inside
a loop. If you only use it once outside a loop, that won't be any difference on
the overall performance.&lt;/p&gt;
&lt;p&gt;Since I'm mostly interested in single precision performance (neural networks are
only about single precision), the first benchmarks will be using &lt;code&gt;float&lt;/code&gt;.&lt;/p&gt;
&lt;p class="more"&gt;&lt;a href="https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-when-to-use-std-pow.html"&gt;Read more…&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><category>Benchmark</category><category>C++</category><category>C++11</category><category>Performances</category><category>Tip</category><guid>https://baptiste-wicht.com/posts/2017/09/cpp11-performance-tip-when-to-use-std-pow.html</guid><pubDate>Mon, 18 Sep 2017 05:50:44 GMT</pubDate></item><item><title>C++11 Concurrency Tutorial - Part 5: Futures</title><link>https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;I've been recently reminded that a long time ago I was doing a series of
tutorial on C++11 Concurrency. For some reason, I haven't continued these
tutorials.  The next post in the series was supposed to be about Futures, so I'm
finally going to do it :)&lt;/p&gt;
&lt;p&gt;Here are the links to the current posts of the C++11 Concurrency Tutorial:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2012/03/cpp11-concurrency-part1-start-threads.html"&gt;Part 1: Start Threads&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2012/03/cp11-concurrency-tutorial-part-2-protect-shared-data.html"&gt;Part 2: Protect Shared Data&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2012/04/c11-concurrency-tutorial-advanced-locking-and-condition-variables.html"&gt;Part 3: Advanced Locking and condition variables&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://baptiste-wicht.com/posts/2012/07/c11-concurrency-tutorial-part-4-atomic-type.html"&gt;Part 4: Atomic Types&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, we are going to talk about futures, more precisely
&lt;code&gt;std::future&amp;lt;T&amp;gt;&lt;/code&gt;. What is a future ? It's a very nice and simple mechanism
to work with asynchronous tasks. It also has the advantage of decoupling you
from the threads themselves, you can do multithreading without using
&lt;code&gt;std::thread&lt;/code&gt;. The future itself is a structure pointing to a result that
will be computed in the future. How to create a future ? The simplest way is to
use &lt;code&gt;std::async&lt;/code&gt; that will create an asynchronous task and return
a &lt;code&gt;std::future&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let's start with the simplest of the examples:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code c++"&gt;&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-1" name="rest_code_bf977538614c49e7a3e782e3929b26de-1" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;thread&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-2" name="rest_code_bf977538614c49e7a3e782e3929b26de-2" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;future&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-3" name="rest_code_bf977538614c49e7a3e782e3929b26de-3" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-3"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-4" name="rest_code_bf977538614c49e7a3e782e3929b26de-4" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-5" name="rest_code_bf977538614c49e7a3e782e3929b26de-5" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-5"&gt;&lt;/a&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-6" name="rest_code_bf977538614c49e7a3e782e3929b26de-6" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-6"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](){&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-7" name="rest_code_bf977538614c49e7a3e782e3929b26de-7" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"I'm a thread"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-8" name="rest_code_bf977538614c49e7a3e782e3929b26de-8" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-9" name="rest_code_bf977538614c49e7a3e782e3929b26de-9" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-10" name="rest_code_bf977538614c49e7a3e782e3929b26de-10" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-11" name="rest_code_bf977538614c49e7a3e782e3929b26de-11" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-12" name="rest_code_bf977538614c49e7a3e782e3929b26de-12" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_bf977538614c49e7a3e782e3929b26de-13" name="rest_code_bf977538614c49e7a3e782e3929b26de-13" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_bf977538614c49e7a3e782e3929b26de-13"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Nothing really special here. &lt;code&gt;std::async&lt;/code&gt; will execute the task that we
give it (here a lambda) and return a &lt;code&gt;std::future&lt;/code&gt;. Once you use the
&lt;code&gt;get()&lt;/code&gt; function on a future, it will wait until the result is available
and return this result to you once it is. The &lt;code&gt;get()&lt;/code&gt; function is then
blocking. Since the lambda, is a void lambda, the returned future is of type
&lt;code&gt;std::future&amp;lt;void&amp;gt;&lt;/code&gt; and &lt;code&gt;get()&lt;/code&gt; returns &lt;code&gt;void&lt;/code&gt; as well. It is
very important to know that you cannot call &lt;code&gt;get&lt;/code&gt; several times on the
same future. Once the result is consumed, you cannot consume it again! If you
want to use the result several times, you need to store it yourself after you
called &lt;code&gt;get()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let's see with something that returns a value and actually takes some time
before returning it:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code c++"&gt;&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-1" name="rest_code_67ba05af628f4ec78f71b947fee29903-1" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;thread&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-2" name="rest_code_67ba05af628f4ec78f71b947fee29903-2" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;future&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-3" name="rest_code_67ba05af628f4ec78f71b947fee29903-3" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-3"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-4" name="rest_code_67ba05af628f4ec78f71b947fee29903-4" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-4"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;chrono&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-5" name="rest_code_67ba05af628f4ec78f71b947fee29903-5" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-5"&gt;&lt;/a&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-6" name="rest_code_67ba05af628f4ec78f71b947fee29903-6" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-6"&gt;&lt;/a&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-7" name="rest_code_67ba05af628f4ec78f71b947fee29903-7" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](){&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-8" name="rest_code_67ba05af628f4ec78f71b947fee29903-8" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-9" name="rest_code_67ba05af628f4ec78f71b947fee29903-9" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-10" name="rest_code_67ba05af628f4ec78f71b947fee29903-10" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-11" name="rest_code_67ba05af628f4ec78f71b947fee29903-11" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-12" name="rest_code_67ba05af628f4ec78f71b947fee29903-12" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Do something else ?&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-13" name="rest_code_67ba05af628f4ec78f71b947fee29903-13" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-13"&gt;&lt;/a&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-14" name="rest_code_67ba05af628f4ec78f71b947fee29903-14" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-14"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-15" name="rest_code_67ba05af628f4ec78f71b947fee29903-15" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-15"&gt;&lt;/a&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-16" name="rest_code_67ba05af628f4ec78f71b947fee29903-16" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-16"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_67ba05af628f4ec78f71b947fee29903-17" name="rest_code_67ba05af628f4ec78f71b947fee29903-17" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_67ba05af628f4ec78f71b947fee29903-17"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This time, the future will be of the time &lt;code&gt;std::future&amp;lt;int&amp;gt;&lt;/code&gt; and thus
&lt;code&gt;get()&lt;/code&gt; will also return an &lt;code&gt;int&lt;/code&gt;. &lt;code&gt;std::async&lt;/code&gt; will again
launch a task in an asynchronous way and &lt;code&gt;future.get()&lt;/code&gt; will wait for the
answer. What is interesting, is that you can do something else before the call
to future.&lt;/p&gt;
&lt;p&gt;But &lt;code&gt;get()&lt;/code&gt; is not the only interesting function in &lt;code&gt;std::future&lt;/code&gt;.
You also have &lt;code&gt;wait()&lt;/code&gt; which is almost the same as &lt;code&gt;get()&lt;/code&gt; but does
not consume the result. For instance, you can wait for several futures and then
consume their result together. But, more interesting are the
&lt;code&gt;wait_for(duration)&lt;/code&gt; and &lt;code&gt;wait_until(timepoint)&lt;/code&gt; functions. The
first one wait for the result at most the given time and then returns and the
second one wait for the result at most until the given time point. I think that
&lt;code&gt;wait_for&lt;/code&gt; is more useful in practices, so let's discuss it further.
Finally, an interesting function is &lt;code&gt;bool valid()&lt;/code&gt;. When you use
&lt;code&gt;get()&lt;/code&gt; on the future, it will consume the result, making &lt;code&gt;valid()
returns :code:`false&lt;/code&gt;. So, if you intend to check multiple times for a future,
you should use &lt;code&gt;valid()&lt;/code&gt; first.&lt;/p&gt;
&lt;p&gt;One possible scenario would be if you have several asynchronous tasks, which is
a common scenario. You can imagine that you want to process the results as fast
as possible, so you want to ask the futures for their result several times. If
no result is available, maybe you want to do something else. Here is a possible
implementation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code c++"&gt;&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-1" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-1" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;thread&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-2" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-2" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;future&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-3" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-3" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-3"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-4" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-4" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-4"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;chrono&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-5" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-5" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-5"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-6" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-6" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-6"&gt;&lt;/a&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-7" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-7" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-8" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-8" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-9" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-9" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-10" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-10" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-11" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-11" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-12" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-12" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-13" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-13" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-13"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-14" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-14" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-14"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-15" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-15" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-15"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-16" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-16" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-16"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-17" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-17" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-17"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-18" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-18" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-18"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-19" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-19" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-19"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;666&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-20" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-20" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-20"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-21" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-21" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-21"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-22" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-22" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-22"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;milliseconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-23" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-23" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-23"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-24" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-24" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-24"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-25" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-25" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-25"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;future_status&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-26" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-26" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-26"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Task1 is done! "&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-27" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-27" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-27"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-28" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-28" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-28"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-29" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-29" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-29"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;future_status&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-30" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-30" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-30"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Task2 is done! "&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-31" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-31" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-31"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-32" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-32" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-32"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-33" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-33" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-33"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;future_status&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-34" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-34" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-34"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Task3 is done! "&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-35" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-35" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-35"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-36" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-36" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-36"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-37" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-37" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-37"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"I'm doing my own work!"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-38" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-38" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-38"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-39" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-39" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-39"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"I'm done with my own work!"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-40" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-40" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-40"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-41" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-41" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-41"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-42" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-42" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-42"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Everything is done, let's go back to the tutorial"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-43" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-43" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-43"&gt;&lt;/a&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-44" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-44" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-44"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_80753ae9a6904b01b091afc1c44fdebd-45" name="rest_code_80753ae9a6904b01b091afc1c44fdebd-45" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_80753ae9a6904b01b091afc1c44fdebd-45"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The three tasks are started asynchronously with &lt;code&gt;std::async&lt;/code&gt; and the
resulting &lt;code&gt;std::future&lt;/code&gt; are stored. Then, as long as one of the tasks is
not complete, we query each three task and try to process its result. If no
result is available, we simply do something else. This example is important to
understand, it covers pretty much every concept of the futures.&lt;/p&gt;
&lt;p&gt;One interesting thing that remains is that you can pass parameters to your task
via &lt;code&gt;std::async&lt;/code&gt;. Indeed, all the extra parameters that you pass to
&lt;code&gt;std::async&lt;/code&gt; will be passed to the task itself. Here is an example of
spawning tasks in a loop with different parameters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code c++"&gt;&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-1" name="rest_code_5c2261a883a84e31800c404bbef1e700-1" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;thread&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-2" name="rest_code_5c2261a883a84e31800c404bbef1e700-2" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;future&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-3" name="rest_code_5c2261a883a84e31800c404bbef1e700-3" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-3"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-4" name="rest_code_5c2261a883a84e31800c404bbef1e700-4" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-4"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;chrono&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-5" name="rest_code_5c2261a883a84e31800c404bbef1e700-5" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-5"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;vector&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-6" name="rest_code_5c2261a883a84e31800c404bbef1e700-6" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-7" name="rest_code_5c2261a883a84e31800c404bbef1e700-7" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-7"&gt;&lt;/a&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-8" name="rest_code_5c2261a883a84e31800c404bbef1e700-8" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-9" name="rest_code_5c2261a883a84e31800c404bbef1e700-9" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-10" name="rest_code_5c2261a883a84e31800c404bbef1e700-10" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-11" name="rest_code_5c2261a883a84e31800c404bbef1e700-11" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-11"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-12" name="rest_code_5c2261a883a84e31800c404bbef1e700-12" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;chrono&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-13" name="rest_code_5c2261a883a84e31800c404bbef1e700-13" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-13"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-14" name="rest_code_5c2261a883a84e31800c404bbef1e700-14" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-14"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-15" name="rest_code_5c2261a883a84e31800c404bbef1e700-15" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-15"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-16" name="rest_code_5c2261a883a84e31800c404bbef1e700-16" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-16"&gt;&lt;/a&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-17" name="rest_code_5c2261a883a84e31800c404bbef1e700-17" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-17"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Start querying"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-18" name="rest_code_5c2261a883a84e31800c404bbef1e700-18" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-18"&gt;&lt;/a&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-19" name="rest_code_5c2261a883a84e31800c404bbef1e700-19" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-19"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-20" name="rest_code_5c2261a883a84e31800c404bbef1e700-20" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-20"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-21" name="rest_code_5c2261a883a84e31800c404bbef1e700-21" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-21"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-22" name="rest_code_5c2261a883a84e31800c404bbef1e700-22" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-22"&gt;&lt;/a&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-23" name="rest_code_5c2261a883a84e31800c404bbef1e700-23" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-23"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_5c2261a883a84e31800c404bbef1e700-24" name="rest_code_5c2261a883a84e31800c404bbef1e700-24" href="https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html#rest_code_5c2261a883a84e31800c404bbef1e700-24"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Pretty practical :) All The created &lt;code&gt;std::future&amp;lt;size_t&amp;gt;&lt;/code&gt; are stored in
a &lt;code&gt;std::vector&lt;/code&gt; and then are all queried for their result.&lt;/p&gt;
&lt;p&gt;Overall, I think &lt;code&gt;std::future&lt;/code&gt; and &lt;code&gt;std::async&lt;/code&gt; are great tool that
can simplify your asynchronous code a lot. They allow you to make pretty
advanced stuff while keeping the complexity of the code to a minimum.&lt;/p&gt;
&lt;p&gt;I hope this long-due post is going to be interesting to some of you :)
The code for this post is available &lt;a class="reference external" href="https://github.com/wichtounet/articles/tree/master/src/threads/part5"&gt;on Github&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I do not yet know if there will be a next installment in the series. I've
covered pretty much everything that is available in C++11 for concurrency. I may
cover the parallel algorithms of C++17 in a following post. If you have any
suggestion for the next post, don't hesitate to post a comment or contact me
directly by email.&lt;/p&gt;</description><category>C++</category><category>C++11</category><category>C++11 Concurrency Tutorial</category><category>Concurrency</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2017/09/cpp11-concurrency-tutorial-futures.html</guid><pubDate>Tue, 12 Sep 2017 13:05:08 GMT</pubDate></item><item><title>C++ Containers Benchmark: vector/list/deque and plf::colony</title><link>https://baptiste-wicht.com/posts/2017/05/cpp-containers-benchmark-vector-list-deque-plf-colony.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;div&gt;&lt;p&gt;Already more than three years ago, I've written a &lt;a class="reference external" href="https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html"&gt;benchmark of some of the STL containers&lt;/a&gt;,
namely the vector, the list and the deque. Since this article was very popular,
I decided to improve the benchmarks and collect again all the results. There are
now more benchmarks and some problems have been fixed in the benchmark code.
Moreover, I have also added a new container, the plf::colony. Therefore, there
are four containers tested:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The std::vector: This is a dynamically-resized array of elements. All the
elements are contiguous in memory. If an element is inserted or removed it at
a position other than the end, the following elements will be moved to fill
the gap or to open a gap. Elements can be accessed at random position in
constant time. The array is resized so that it can several more elements, not
resized at each insert operation. This means that insertion at the end of the
container is done in amortized constant time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The std::deque: The deque is a container that offer constant time insertion
both at the front and at the back of the collection. In current c++ libraries,
it is implementation as a collection of dynamically allocated fixed-size
array. Not all elements are contiguous, but depending on the size of the data
type, this still has good data locality. Access to a random element is also
done in constant time, but with more overhead than the vector. For insertions
and removal at random positions, the elements are shifted either to the front
or to the back meaning that it is generally faster than the vector, by twice
in average.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The std::list: This is a doubly-linked list. It supports constant time
insertions at any position of the collection. However, it does not support
constant time random access. The elements are obviously not contiguous, since
they are all allocated in nodes. For small elements, this collection has
a very big memory overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The plf::colony: This container is a non-standard container which is
unordered, it means that the insertion order will not necessarily be
preserved. It provides strong iterators guarantee, pointers to non-erased
element are not invalidated by insertion or erasure. It is especially tailored
for high-insertion/erasure workloads. Moreover, it is also specially optimized
for non-scalar types, namely structs and classes with relatively large data
size (greater than 128 bits on the official documentation). Its implementation
is more complicated than the other containers. It is also implemented as
a list of memory blocks, but they are of increasingly large sizes. When
elements are erased, there position is not removed, but marked as erased so
that it can be reused for fast insertion later on. This container uses the
same conventions as the standard containers and was proposed for inclusion to
the standard library, which is the main reason why it's included in this
benchmark. If you want more information, you can consult the
&lt;a class="reference external" href="http://plflib.org/colony.htm"&gt;official website&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the text and results, the namespaces will be omitted. Note that I have only
included sequence containers in my test. These are the most common containers in
practices and these also the containers I'm the most familiar with. I could have
included multiset in this benchmark, but the interface and purpose being
different, I didn't want the benchmark to be confusing.&lt;/p&gt;
&lt;p&gt;All the examples are compiled with g++-4.9.4 (-std=c++11 -march=native -O2) and
run on a Gentoo Linux machine with an Intel Core i7-4770 at 3.4GHz.&lt;/p&gt;
&lt;p&gt;For each graph, the vertical axis represent the amount of time necessary to
perform the operations, so the lower values are the better. The horizontal axis
is always the number of elements of the collection. For some graph, the
logarithmic scale could be clearer, a button is available after each graph to
change the vertical scale to a logarithmic scale.&lt;/p&gt;
&lt;p&gt;The tests are done with several different data types. The trivial data types are
varying in size, they hold an array of longs and the size of the array varies to
change the size of the data type. The non-trivial data type is composed of
a string (just long enough to avoid SSO (Small String Optimization) (even though
I'm using GCC)). The non-trivial data types comes in a second version with
noexcept move operations.  Not all results are presented for each data types if
there are not significant differences between in order to keep this article
relatively short (it's already probably too long :P).&lt;/p&gt;
&lt;p class="more"&gt;&lt;a href="https://baptiste-wicht.com/posts/2017/05/cpp-containers-benchmark-vector-list-deque-plf-colony.html"&gt;Read more…&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><category>Benchmarks</category><category>C++</category><category>C++11</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2017/05/cpp-containers-benchmark-vector-list-deque-plf-colony.html</guid><pubDate>Sun, 21 May 2017 10:46:23 GMT</pubDate></item><item><title>Publication: CPU Performance Optimizations for RBM and CRBM</title><link>https://baptiste-wicht.com/posts/2017/02/publication-cpu-performance-optimizations-rbm-crbm.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;Recently, we have published a paper about performance optimizations that may
interest you.&lt;/p&gt;
&lt;p&gt;The paper is &lt;a class="reference external" href="https://www.researchgate.net/publication/307908790_On_CPU_Performance_Optimization_of_Restricted_Boltzmann_Machine_and_Convolutional_RBM"&gt;On CPU Performance Optimizations for Restricted Boltzmann Machine and Convolutional RBM&lt;/a&gt;, published in the Proceedings of the Artificial Neural Networks and Pattern Recognition workshop (ANNPR-2016). I've presented this paper in Germany, at Ulm.&lt;/p&gt;
&lt;p&gt;Although most of the performance research going on is focused on GPU, there are
still of research laboratories that are only equipped with CPU and it remains
important to be as fast as possible on CPU. Moreover, this is something
I really like.&lt;/p&gt;
&lt;p&gt;For this publication, I have tried to make my Restricted Boltzmann Machine (RBM)
and Convolutional RBM (CRBM) implementations in my DLL library as fast as
possible.&lt;/p&gt;
&lt;p&gt;The first part of the article is about Restricted Boltzmann Machine (RBM) which
are a form of dense Artificial Neural Network (ANN). Their training is very
similar to that of the ANN with Gradient Descent. Four different network
configurations are being tested.&lt;/p&gt;
&lt;p&gt;First, mini-batch training is shown to be much faster than online training, even
when online training is performed in parallel. Once mini-batch training is used,
BLAS operations are used in order to get as much performance as possible on the
different operations, mainly the Matrix Matrix Multiplication with the use of
the GEMM operation from the Intel Math Kernel Library (MKL). Moreover, the
parallel version of the MKL is also used to get even more performance. When all
these optimizations are performed, speedups of 11 to 30 are obtained compared to
the online training, depending on the network configurations. This final version
is able  to perform one epoch of Contrastive Divergence in 4 to 15 seconds
depending on the network, for 60000 images.&lt;/p&gt;
&lt;p&gt;The second part of the article is about Convolutional Restricted Boltzmann
Machine (CRBM). This is almost the equivalent of a Convolutional Neural Network
(CNN). Again four different networks are evaluated.&lt;/p&gt;
&lt;p&gt;The main problem with CRBM is that there are no standard implementations of the
convolution operation that is really fast. Therefore, it is not possible to
simply use a BLAS library to make the computation as fast as possible. The first
optimization that was tried is to vectorize the convolutions. With this, the
speedups have been between 1.1 and 1.9 times faster. I'm not really satisfied
with these results since in fact per convolution the speedups are much better.
Moreover, I have since been able to obtain better speedups but the deadline was
too short to include them in this paper. I'll try to talk about these
improvements in more details on this blog. What is more interesting to to
parallellize the different convolutions since they are mostly independent. This
can bring a speedup of the amount of cores available on the machine. Since
convolutions are extremely memory hungry, virtual cores with Hyper Threading
generally does not help. An interesting optimization is to use a Matrix
Multiplication to compute several valid convolutions at once.  This can give an
additional speedup between 1.6 and 2.2 compared to the vectorized version. While
it is possible to use the FFT to reduce the full convolution as well, in our
experiment the images were not big enough for this to be interesting. The final
speedups are about 10 times faster with these optimizations.&lt;/p&gt;
&lt;p&gt;We have obtained pretty good and I'm happy we have been published. However, I'm
not very satisfied with these results since I've been able to get even faster
since this and when compared with other frameworks, DLL is actually quite
competitive. I'll try to publish something new in the future.&lt;/p&gt;
&lt;p&gt;If you want more information, you can have a look at the paper. If you want to
look at the code, you can have a look at my projects:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/wichtounet/etl"&gt;Expression Templates Library (ETL)&lt;/a&gt;: For
the Matrix Multiplication and Convolutions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/wichtounet/dll"&gt;Deep Learning Library (DLL)&lt;/a&gt;: For the RBM
and CRBM implementations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Don't hesitate to ask any questions if you want more information :)&lt;/p&gt;</description><category>C++</category><category>CPU</category><category>crbm</category><category>dbn</category><category>Deep Learning</category><category>dll</category><category>etl</category><category>Intel</category><category>Performances</category><category>publications</category><category>rbm</category><category>thesis</category><guid>https://baptiste-wicht.com/posts/2017/02/publication-cpu-performance-optimizations-rbm-crbm.html</guid><pubDate>Tue, 07 Feb 2017 16:33:33 GMT</pubDate></item><item><title>Blazing fast unit test compilation with doctest 1.1</title><link>https://baptiste-wicht.com/posts/2016/09/blazing-fast-unit-test-compilation-with-doctest-11.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;You may remember &lt;a class="reference external" href="http://baptiste-wicht.com/posts/2016/06/reduce-compilation-time-by-another-16-with-catch.html"&gt;my quest for faster compilation times&lt;/a&gt;. I had made several changes to the Catch test framework macros in order to save some compilation at the expense of my test code looking a bit less nice:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_99207774a09f4a598386048725bcc49e-1" name="rest_code_99207774a09f4a598386048725bcc49e-1" href="https://baptiste-wicht.com/posts/2016/09/blazing-fast-unit-test-compilation-with-doctest-11.html#rest_code_99207774a09f4a598386048725bcc49e-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;REQUIRE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;//Before&lt;/span&gt;
&lt;a id="rest_code_99207774a09f4a598386048725bcc49e-2" name="rest_code_99207774a09f4a598386048725bcc49e-2" href="https://baptiste-wicht.com/posts/2016/09/blazing-fast-unit-test-compilation-with-doctest-11.html#rest_code_99207774a09f4a598386048725bcc49e-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;REQUIRE_EQUALS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;//After&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first line is a little bit better, but using several optimizations, I was
able to dramatically change the compilation time of the test cases of ETL. In
the end, I don't think that the difference between the two lines justifies the
high overhead in compilation times.&lt;/p&gt;
&lt;section id="doctest"&gt;
&lt;h2&gt;doctest&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/onqtam/doctest"&gt;doctest&lt;/a&gt; is a framework quite similar to
Catch but that claims to be much lighter. I tested doctest 1.0 early on, but at
this point it was actually slower than Catch and especially slower than my
versions of the macro.&lt;/p&gt;
&lt;p&gt;Today, doctest 1.1 was released with promises of being even lighter than before
and providing several new ways of speeding up compilation. If you want the
results directly, you can take a look at the next section.&lt;/p&gt;
&lt;p&gt;First of all, this new version improved the basic macros to make expression
decomposition faster. When you use the standard REQUIRE macro, the expression is
composed by using several template techniques and operator overloading. This is
really slow to compile. By removing the need for this decomposition, the fast
Catch macros are much faster to compile.&lt;/p&gt;
&lt;p&gt;Moreover, doctest 1.1 also introduces CHECK_EQ that does not any expression
decomposition. This is close to what I did in my macros expect that it is
directly integrated into the framework and preserves all its features. It is
also possible to bypass the expression checking code by using FAST_CHECK_EQ
macro. In that case, the exceptions are not captured. Finally, a new
configuration option is introduced (DOCTEST_CONFIG_SUPER_FAST_ASSERTS) that
removes some features related to automatic debugger breaks. Since I don't use
the debugger features and I don't need to capture exception everywhere (it's
sufficient for me that the test fails completely if an exception is thrown), I'm
more than eager to use these new features.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="results"&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;For evaluation, I have compiled the complete test suite of ETL, with 1 thread,
using gcc 4.9.3 with various different options, starting from Catch to doctest
1.1 with all compilation time features. Here are the results, in seconds:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Version&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Time&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;VS Catch&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;VS Fast Catch&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;VS doctest 1.0&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Catch&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;724.22&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Fast Catch&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;464.52&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-36%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;doctest 1.0&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;871.54&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;+20%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;+87%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;doctest 1.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;614.67&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-16%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;+32%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-30%&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;REQUIRE_EQ&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;493.97&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-32%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;+6%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-43%&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;FAST_REQUIRE_EQ&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;439.09&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-39%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-6%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-50%&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;SUPER_FAST_ASSERTS&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;411.11&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-43%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-12%&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;-53%&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As you can see, doctest 1.1 is much faster to compile than doctest 1.0! This is
really great news. Moreover, it is already 16% faster than Catch. When all the
features are used, doctest is 12% faster than my stripped down versions of Catch
macros (and 43% faster than Catch standard macros). This is really cool! It
means that I don't have to do any change in the code (no need to strip macros
myself) and I can gain a lot of compilation time compared to the bare Catch
framework.&lt;/p&gt;
&lt;p&gt;I really think the author of doctest did a great job with the new version.
Although this was not of as much interest for me, there are also a lot of
other changes in the new version. You can consult the
&lt;a class="reference external" href="https://github.com/onqtam/doctest/blob/master/CHANGELOG.md"&gt;changelog&lt;/a&gt; if you want more information.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Overall, doctest 1.1 is much faster to compile than doctest 1.0. Moreover, it
offers very fast macros for test assertions that are much faster to compile
than Catch versions and even faster than the versions I created myself to reduce
compilation time. I really thing this is a great advance for doctest. When
compiling with all the optimizations, doctest 1.1 saves me 50 seconds in
compilation time compared to the fast version of Catch macro and more than
5 minutes compared to the standard version of Catch macros.&lt;/p&gt;
&lt;p&gt;I'll probably start using doctest on my development machine. For now, I'll keep
Catch as well since I need it to generate the unit test reports in XML format
for Sonarqube. Once this feature appears in doctest, I'll probably drop Catch
from ETL and DLL&lt;/p&gt;
&lt;p&gt;If you need blazing fast compilation times for your unit tests, doctest 1.1 is
probably the way to go.&lt;/p&gt;
&lt;/section&gt;</description><category>C++</category><category>Catch</category><category>Compilers</category><category>doctest</category><category>etl</category><category>gcc</category><category>Performances</category><category>Tests</category><category>time</category><guid>https://baptiste-wicht.com/posts/2016/09/blazing-fast-unit-test-compilation-with-doctest-11.html</guid><pubDate>Wed, 21 Sep 2016 19:45:13 GMT</pubDate></item><item><title>Improve DLL and ETL Compile Time further</title><link>https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;For a while, the compilation time of my matrix/vector computation library (ETL), based on Expression Templates has become more and more problematic. I've already worked on this problem &lt;a class="reference external" href="http://baptiste-wicht.com/posts/2015/06/how-i-improved-a-bit-compile-time-of-etl.html"&gt;here&lt;/a&gt; and &lt;a class="reference external" href="http://baptiste-wicht.com/posts/2015/06/improve-etl-compile-time-with-precompiled-headers.html"&gt;there&lt;/a&gt;, using some general techniques (pragmas, precompiled headers, header removals and so on). On this post, I'll talk about two major improvements I have been able to do directly in the code.&lt;/p&gt;
&lt;section id="use-of-static-if"&gt;
&lt;h2&gt;Use of static_if&lt;/h2&gt;
&lt;p&gt;Remember &lt;a class="reference external" href="http://baptiste-wicht.com/posts/2015/07/simulate-static_if-with-c11c14.html"&gt;static_if&lt;/a&gt; ? I was able to use it to really reduce the compile time of DLL.&lt;/p&gt;
&lt;p&gt;I wrote a script to time each test case of the DLL project to find the test cases that took the longest to compile. Once I found the best candidate, I isolated the functions that took the longest to compile. It was quite tedious and I did it by hand, primarily by commenting parts of the code and going deeper and deeper in the code. I was quite suprised to find that a single function call (template function of course ;) ) was responsible for 60% of the compilation time of my candidate test case. The function was instantiating a whole bunch of expression templates (to compute the free energy of several models). The function itself was not really optimizable, but what was really interesting is that this function was only used in some very rare cases and that these cases were known at compile-time :) This was a perfect case to use a static_if. And once the call was inside the static_if, the test case was indeed about 60% faster. &lt;strong&gt;This reduced the overall compilation time of DLL by about 30%&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;This could also of course also have been achieved by using two functions, one with the call, one empty and selected by SFINAE (Substitution Failure Is Not An Error). I prefer the statif_if version since this really shows the intent and hides SFINAE behind nicer syntax.&lt;/p&gt;
&lt;p&gt;I was also able to use static_if at other places in the DLL code to avoid instantiating some templates, but the improvements were much less dramatic (about 1% of the total compilation time). I was very lucky to find a single function that accounted for so much compile time. After some more tests, I concluded that much of the compilation time of DLL was spent compiling the Expression Templates from my ETL library so I decided to delve into ETL code directly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="removal-of-std-async"&gt;
&lt;h2&gt;Removal of std::async&lt;/h2&gt;
&lt;p&gt;The second improvement was very surprising. I was working on improving the compilation of ETL and found out that the sum and average reductions of matrices were dramatically slow, about an order of magnitude slower than standard operations on matrices. In parallel (but the two facts are linked), I also found out another weird fact when splitting a file into 10 parts (the file was comprised of 10 test cases). Compiling the 10 parts separarely (and sequentially, not multiple threads) was about 40% faster than compiling the complete file. There was no swapping so it was not a memory issue. This is not expected. Generally, it is faster to compile a big file than to compile its parts separately. The advantage of smaller files is that you can compile them in parallel and that incremental builds are faster (only compile a small part).&lt;/p&gt;
&lt;p&gt;By elimination, I found out that most of the time was spent inside the function that was dispatching in parallel the work for accumulating the sum of a matrix. Here is the function:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-1" name="rest_code_f500b63f097d46e8afba7faf4ac56979-1" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AccFunctor&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-2" name="rest_code_f500b63f097d46e8afba7faf4ac56979-2" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-2"&gt;&lt;/a&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispatch_1d_acc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Functor&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AccFunctor&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-3" name="rest_code_f500b63f097d46e8afba7faf4ac56979-3" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-4" name="rest_code_f500b63f097d46e8afba7faf4ac56979-4" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-5" name="rest_code_f500b63f097d46e8afba7faf4ac56979-5" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-5"&gt;&lt;/a&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-6" name="rest_code_f500b63f097d46e8afba7faf4ac56979-6" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-6"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-7" name="rest_code_f500b63f097d46e8afba7faf4ac56979-7" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-8" name="rest_code_f500b63f097d46e8afba7faf4ac56979-8" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-9" name="rest_code_f500b63f097d46e8afba7faf4ac56979-9" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-10" name="rest_code_f500b63f097d46e8afba7faf4ac56979-10" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;async&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-11" name="rest_code_f500b63f097d46e8afba7faf4ac56979-11" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-11"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-12" name="rest_code_f500b63f097d46e8afba7faf4ac56979-12" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-12"&gt;&lt;/a&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-13" name="rest_code_f500b63f097d46e8afba7faf4ac56979-13" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-13"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-14" name="rest_code_f500b63f097d46e8afba7faf4ac56979-14" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-14"&gt;&lt;/a&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-15" name="rest_code_f500b63f097d46e8afba7faf4ac56979-15" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-15"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-16" name="rest_code_f500b63f097d46e8afba7faf4ac56979-16" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-16"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fut&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-17" name="rest_code_f500b63f097d46e8afba7faf4ac56979-17" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-17"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-18" name="rest_code_f500b63f097d46e8afba7faf4ac56979-18" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-18"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-19" name="rest_code_f500b63f097d46e8afba7faf4ac56979-19" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-19"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-20" name="rest_code_f500b63f097d46e8afba7faf4ac56979-20" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-20"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f500b63f097d46e8afba7faf4ac56979-21" name="rest_code_f500b63f097d46e8afba7faf4ac56979-21" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f500b63f097d46e8afba7faf4ac56979-21"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There isn't anything really fancy about this function. This takes one functor that will be done in parallel and one function for accumulation.  It dispatches all the work in batch and then accumulates the results. I tried several things to optimize the compilation time of this function, but nothing worked. The line that was consuming all the time was the std::async line. This function was using std::async because the thread pool that I'm generally using does not support returning values from parallel functors. I decided to use a workaround and use my thread pool and I came out with this version:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-1" name="rest_code_f25d473df56343699b3055f4ac6883a4-1" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AccFunctor&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-2" name="rest_code_f25d473df56343699b3055f4ac6883a4-2" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-2"&gt;&lt;/a&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispatch_1d_acc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Functor&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AccFunctor&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-3" name="rest_code_f25d473df56343699b3055f4ac6883a4-3" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-4" name="rest_code_f25d473df56343699b3055f4ac6883a4-4" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-5" name="rest_code_f25d473df56343699b3055f4ac6883a4-5" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;default_thread_pool&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-6" name="rest_code_f25d473df56343699b3055f4ac6883a4-6" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-7" name="rest_code_f25d473df56343699b3055f4ac6883a4-7" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-8" name="rest_code_f25d473df56343699b3055f4ac6883a4-8" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-9" name="rest_code_f25d473df56343699b3055f4ac6883a4-9" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-10" name="rest_code_f25d473df56343699b3055f4ac6883a4-10" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sub_functor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-11" name="rest_code_f25d473df56343699b3055f4ac6883a4-11" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-11"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-12" name="rest_code_f25d473df56343699b3055f4ac6883a4-12" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-13" name="rest_code_f25d473df56343699b3055f4ac6883a4-13" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-13"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-14" name="rest_code_f25d473df56343699b3055f4ac6883a4-14" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-14"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-15" name="rest_code_f25d473df56343699b3055f4ac6883a4-15" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-15"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;do_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sub_functor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-16" name="rest_code_f25d473df56343699b3055f4ac6883a4-16" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-16"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-17" name="rest_code_f25d473df56343699b3055f4ac6883a4-17" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-17"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-18" name="rest_code_f25d473df56343699b3055f4ac6883a4-18" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-18"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-19" name="rest_code_f25d473df56343699b3055f4ac6883a4-19" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-19"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-20" name="rest_code_f25d473df56343699b3055f4ac6883a4-20" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-20"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-21" name="rest_code_f25d473df56343699b3055f4ac6883a4-21" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-21"&gt;&lt;/a&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-22" name="rest_code_f25d473df56343699b3055f4ac6883a4-22" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-22"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-23" name="rest_code_f25d473df56343699b3055f4ac6883a4-23" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-23"&gt;&lt;/a&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fut&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-24" name="rest_code_f25d473df56343699b3055f4ac6883a4-24" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-24"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-25" name="rest_code_f25d473df56343699b3055f4ac6883a4-25" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-25"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-26" name="rest_code_f25d473df56343699b3055f4ac6883a4-26" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-26"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;acc_functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-27" name="rest_code_f25d473df56343699b3055f4ac6883a4-27" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-27"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_f25d473df56343699b3055f4ac6883a4-28" name="rest_code_f25d473df56343699b3055f4ac6883a4-28" href="https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html#rest_code_f25d473df56343699b3055f4ac6883a4-28"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I simply preallocate space for all the threads and create a new functor calling the input functor and saving its result inside the vector. It is less nice, but it works well. And it compiles MUCH faster. This &lt;strong&gt;reduced the compilation time&lt;/strong&gt; of my biggest test case &lt;strong&gt;by a factor of 8&lt;/strong&gt; (from 344 seconds to 44 seconds). This is really crazy. It also fixed the problem where splitting the test case was faster than big file (it is now twice faster to compile the big files than compiling all the small files separately). &lt;strong&gt;This reduced the total compilation time of dll by about 400%&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;As of now, I still have no idea why this makes such a big difference. I have looked at the std::async code, but I haven't found a valid reason for this slowdown. If someone has any idea, I'd be very glad to discuss in the comments below.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="improving-the-template-instantiation-tree"&gt;
&lt;h2&gt;Improving the template instantiation tree&lt;/h2&gt;
&lt;p&gt;I recently discovered the templight tool that is a profiler for templates (pretty cool). After some time, I was able to build it and use it on ETL. For now, I haven't been able to reduce compile time a lot, but I have been able to reduce the template instantiation tree a lot seeing that some instantiations were completely useless and I optimized the code to remove them.&lt;/p&gt;
&lt;p&gt;I won't be go into much details here because I plan to write a post on this subject in the coming days.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In conclusion, I would say that it is pretty hard to improve the compile time of complex C++ programs once you have gone through all the standard methods. However, I was very happy to found that &lt;strong&gt;two optimizations in the source code reduced the overall compilation of DLL by almost 500%&lt;/strong&gt;. I will continue working on this, but for now, the compilation time is much more reasonable.&lt;/p&gt;
&lt;p&gt;I hope the two main facts in this article were interesting. If you have similar experience, comments or ideas for further improvements, I'd be glad to discuss them with you in the comments :)&lt;/p&gt;
&lt;/section&gt;</description><category>C++</category><category>Compilers</category><category>dll</category><category>etl</category><category>gcc</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2016/01/improve-dll-and-etl-compile-time-further.html</guid><pubDate>Fri, 29 Jan 2016 16:02:34 GMT</pubDate></item><item><title>Improve ETL compile-time with Precompiled Headers</title><link>https://baptiste-wicht.com/posts/2015/06/improve-etl-compile-time-with-precompiled-headers.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;Very recently, I started trying to improve the compile-time of the ETL test suite. While not critical, it is always better to have tests that compile as fast as possible. In a &lt;a class="reference external" href="http://baptiste-wicht.com/posts/2015/06/how-i-improved-a-bit-compile-time-of-etl.html"&gt;previous post&lt;/a&gt;, I was able to improve the time a bit by improve the makefile, using pragra once and avoiding &lt;cite&gt;&amp;lt;iostream&amp;gt;&lt;/cite&gt; headers. With these techniques, I reduced the compile-time from 87.5 to 84.1, which is not bad, but not as good as I would have expected.&lt;/p&gt;
&lt;p&gt;In the previous, I had not tried to use Precompiled Headers (PCH) to improve the compile time, so I thought it would be a good time to do it.&lt;/p&gt;
&lt;section id="precompiled-headers"&gt;
&lt;h2&gt;Precompiled Headers&lt;/h2&gt;
&lt;p&gt;Precompiled Headers are an option of the compiler, where one header gets compiled. Normally, you only compile source files into object files, but you can also compile headers, although it is not the same thing. When a compiler compiles a header, it can do a lot of preprocessing (macros, includes, AST, symbols) and then store all the results into a precompiled header file. Once you compile the source files, the compiler will try to use the precompiled header file instead of the real header file. Of course, this can breaks the C++ standard since with that a header can not have different behaviour based on macros for instance. For these reasons (and probably implementation reasons as well), precompiled headers are really limited.&lt;/p&gt;
&lt;p&gt;If we take the case of G++, G++ will consider the precompiled header file instead of the standard header only if (for a complete list, take a look at the GCC docs):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The same compilation flags are the same between the two compilations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The same compiler binary is used for the compilations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Only one precompiled header can be used in each compilation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The same macros must be defined&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The include of the header must be before every possible C/C++ token&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If all these conditions are met and you try to &lt;cite&gt;#include "header.hpp&lt;/cite&gt; and there is a header.hpp.gch (the precompiled file) available in the search path, then the precompiled header will be taken instead of the standard one.&lt;/p&gt;
&lt;p&gt;With clang, it is a bit different because the precompiled header cannot be included automatically, but has to be included explicitely in the source code, meaning you have to modify your code for this technique to work. This is a bad thing in my opinion, you never should have to modify your code to profit from a compiler feature. This is why I haven't used and don't plan to use precompiled headers with clang.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-to"&gt;
&lt;h2&gt;How-to&lt;/h2&gt;
&lt;p&gt;Once you know all the conditions for a precompiled header to be automatically included, it is quite straightforward to use them.&lt;/p&gt;
&lt;p&gt;To generate a PCH file is easy:&lt;/p&gt;
&lt;pre class="literal-block"&gt;g++ options header.hpp&lt;/pre&gt;
&lt;p&gt;This will generate header.hpp.gch. When you compile your source file using header.hpp, you don't have anything to do, you just have to compile it as usually and if all the conditions are met, the PCH file will be used instead of the other header.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="results-and-conclusion"&gt;
&lt;h2&gt;Results and conclusion&lt;/h2&gt;
&lt;p&gt;I added precompiled header support into my &lt;a class="reference external" href="https://github.com/wichtounet/make-utils"&gt;make-utils&lt;/a&gt; collection of Makefile utilities and tested it on ETL. I have precompiled a header that itself included Catch and ETL. Almost all test files are including this header. With this change, I went from 84 seconds to 78seconds. Headers are taking 1.5seconds to be precompiled. This is a nice result I think. If your application is not as template-heavy as mine or if you have more source files, you should expect better improvements.&lt;/p&gt;
&lt;p&gt;To conclude, even if precompiled headers are a sound way to reduce compile-time, they are really limited to some cases. I'm not a fan of the feature overally. It is not portable between compilers and not standard. Anyway, if you are really in need of saving some time, you should not hesitate too much ;)&lt;/p&gt;
&lt;/section&gt;</description><category>C++</category><category>Compilers</category><category>etl</category><category>gcc</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2015/06/improve-etl-compile-time-with-precompiled-headers.html</guid><pubDate>Sat, 20 Jun 2015 13:08:31 GMT</pubDate></item><item><title>Continuous Performance Management with CPM for C++</title><link>https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;For some time, I have wanted some tool to monitor the performance of some of my projects. There are plenty of tools for Continuous Integration and Sonar is really great for continuous monitoring of code quality, but I haven't found anything satisfying for monitoring performance of C++ code. So I decided to write my own. &lt;a class="reference external" href="https://github.com/wichtounet/cpm"&gt;Continous Performance Monitor (CPM)&lt;/a&gt; is a simple C++ tool that helps you running benchmarks for your C++ programs and generate web reports based on the results. In this article, I will present this tool. CPM is especially made to benchmark several sub parts of libraries, but it perfectly be used to benchmark a whole program as well.&lt;/p&gt;
&lt;p&gt;The idea is to couple it with a Continuous Integration tool (I use Jenkins for instance) and run the benchmarks for every new push in a repository. With that, you can check if you have performance regression for instance or simply see if your changes were really improving the performance as much as you thought.&lt;/p&gt;
&lt;p&gt;It is made of two separate parts:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A header-only library that you can use to benchmark your program and that will give you the performance results. It will also generate a JSON report of the collected data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A program that will generate a web version of the reports with analysis over time, over different compilers or over different configurations.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;CPM is especially made to benchmark functions that takes input data and which runtime depends on the dimensions of the input data. For each benchmark, CPM will execute it with several different input sizes. There are different ways to define a benchmark:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;two_pass&lt;/em&gt;: The benchmark is made of two part, the initialization part is called once for each input size and then the benchmark part is repeated several times for the measure. This is the most general version.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;global&lt;/em&gt;: The benchmark will be run with different input sizes but uses global data that will be randomized before each measure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;simple&lt;/em&gt;: The benchmark will be run with different input sizes, data will not be randomized&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;once&lt;/em&gt;: The benchmark will be run with no input size.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: The randomization of the data can be disabled.&lt;/p&gt;
&lt;p&gt;You can run independent benchmarks or you can run sections of benchmarks. A section is used to compared different implementations of the same thing. For instance, I use them to compare different implementation of convolution or to see how ETL compete with other Expression Templates library.&lt;/p&gt;
&lt;img alt="/images/cpm_large.png" src="https://baptiste-wicht.com/images/cpm_large.png"&gt;
&lt;section id="examples"&gt;
&lt;h2&gt;Examples&lt;/h2&gt;
&lt;p&gt;I've uploaded three generated reports so that you can have look at some results:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Results of the ETL benchmark (one page per benchmark): &lt;a class="reference external" href="http://baptiste-wicht.com/cpm/etl/"&gt;Site 1&lt;/a&gt; &lt;a class="reference external" href="https://github.com/wichtounet/etl/blob/master/workbench/benchmark.cpp"&gt;Source 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Results of simple ETL / Blaze / Eigen3 comparison: &lt;a class="reference external" href="http://baptiste-wicht.com/cpm/etl_blaze_eigen/"&gt;Site 2&lt;/a&gt; &lt;a class="reference external" href="https://github.com/wichtounet/etl_vs_blaze/blob/master/src/simple.cpp"&gt;Source 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Results of the CPM dummy examples: &lt;a class="reference external" href="http://baptiste-wicht.com/cpm/examples/"&gt;Site 3&lt;/a&gt; &lt;a class="reference external" href="https://github.com/wichtounet/cpm/blob/master/examples/simple.cpp"&gt;Source 3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="run-benchmarks"&gt;
&lt;h2&gt;Run benchmarks&lt;/h2&gt;
&lt;p&gt;There are two ways of running CPM. You can directly use the library to run the benchmarks or you can use the macro facilities to make it easier. I recommend to use the second way since it is easier and I'm gonna try to keep it stable while the library can change. If you want an example of using the library directly, you can take a look at &lt;a class="reference external" href="https://github.com/wichtounet/cpm/blob/master/examples/simple.cpp"&gt;this example&lt;/a&gt;. In this chapter, I'm gonna focus on the macro-way.&lt;/p&gt;
&lt;p&gt;The library is available &lt;a class="reference external" href="https://github.com/wichtounet/cpm"&gt;here&lt;/a&gt;, you can either include as a submodule of your projects or install it globally to have access to its headers.&lt;/p&gt;
&lt;p&gt;The first thing to do is to include the CPM header:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_00405d2a2e7b4e5a87fd373026e1f695-1" name="rest_code_00405d2a2e7b4e5a87fd373026e1f695-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_00405d2a2e7b4e5a87fd373026e1f695-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#define CPM_BENCHMARK "Example Benchmarks"&lt;/span&gt;
&lt;a id="rest_code_00405d2a2e7b4e5a87fd373026e1f695-2" name="rest_code_00405d2a2e7b4e5a87fd373026e1f695-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_00405d2a2e7b4e5a87fd373026e1f695-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;"cpm/cpm.hpp"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You have to name your benchmark. This will automatically creates a main and will run all the declared benchmark.&lt;/p&gt;
&lt;section id="define-benchmarks"&gt;
&lt;h3&gt;Define benchmarks&lt;/h3&gt;
&lt;p&gt;Benchmarks can be defined either in a CPM_BENCH functor or in the global scope with &lt;code&gt;CPM_DIRECT_BENCH&lt;/code&gt;.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;simple&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_22609c8f694e409481adc4833aba3387-1" name="rest_code_22609c8f694e409481adc4833aba3387-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_22609c8f694e409481adc4833aba3387-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_DIRECT_BENCH_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bench_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first argument is the name of the benchmark and the second argument is the function that will be benchmarked by the system, this function takes the input size as input.&lt;/p&gt;
&lt;ol class="arabic simple" start="2"&gt;
&lt;li&gt;&lt;p&gt;global&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_b2612c2001684f37aee95fc3f9627f26-1" name="rest_code_b2612c2001684f37aee95fc3f9627f26-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_b2612c2001684f37aee95fc3f9627f26-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_BENCH&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_b2612c2001684f37aee95fc3f9627f26-2" name="rest_code_b2612c2001684f37aee95fc3f9627f26-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_b2612c2001684f37aee95fc3f9627f26-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_b2612c2001684f37aee95fc3f9627f26-3" name="rest_code_b2612c2001684f37aee95fc3f9627f26-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_b2612c2001684f37aee95fc3f9627f26-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bench_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_b2612c2001684f37aee95fc3f9627f26-4" name="rest_code_b2612c2001684f37aee95fc3f9627f26-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_b2612c2001684f37aee95fc3f9627f26-4"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first argument is the name of the benchmark, the second is the function being benchmarked and the following arguments must be references to global data that will be randomized by CPM.&lt;/p&gt;
&lt;ol class="arabic simple" start="3"&gt;
&lt;li&gt;&lt;p&gt;two_pass&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_6938745246e34f79ab966599726618f3-1" name="rest_code_6938745246e34f79ab966599726618f3-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_6938745246e34f79ab966599726618f3-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_DIRECT_BENCH_TWO_PASS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bench_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_6938745246e34f79ab966599726618f3-2" name="rest_code_6938745246e34f79ab966599726618f3-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_6938745246e34f79ab966599726618f3-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;a id="rest_code_6938745246e34f79ab966599726618f3-3" name="rest_code_6938745246e34f79ab966599726618f3-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_6938745246e34f79ab966599726618f3-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_6938745246e34f79ab966599726618f3-4" name="rest_code_6938745246e34f79ab966599726618f3-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_6938745246e34f79ab966599726618f3-4"&gt;&lt;/a&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, the first argument is the name. The second argument is the initialization functor. This functor must returns a tuple with all the information that will be passed (unpacked) to the third argument (the benchmark functor). Everything that is being returned by the initialization functor will be randomized.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="select-the-input-sizes"&gt;
&lt;h3&gt;Select the input sizes&lt;/h3&gt;
&lt;p&gt;By default, CPM will invoke your benchmarks with values from 10 to 1000000, multiplying it by 10 each step. This can be tuned for each benchmark and section independently. Each benchmark macro has a _P suffix that allows you to set the size policy:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_76a9d55df6b34a7db175fa0e28e20226-1" name="rest_code_76a9d55df6b34a7db175fa0e28e20226-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_76a9d55df6b34a7db175fa0e28e20226-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_SIMPLE_P&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_76a9d55df6b34a7db175fa0e28e20226-2" name="rest_code_76a9d55df6b34a7db175fa0e28e20226-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_76a9d55df6b34a7db175fa0e28e20226-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;VALUES_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;a id="rest_code_76a9d55df6b34a7db175fa0e28e20226-3" name="rest_code_76a9d55df6b34a7db175fa0e28e20226-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_76a9d55df6b34a7db175fa0e28e20226-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;"simple_a_n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_76a9d55df6b34a7db175fa0e28e20226-4" name="rest_code_76a9d55df6b34a7db175fa0e28e20226-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_76a9d55df6b34a7db175fa0e28e20226-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also have several sizes (for multidimensional data structures or algorithms):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_68802698a222474c95566b297fbff738-1" name="rest_code_68802698a222474c95566b297fbff738-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_DIRECT_BENCH_TWO_PASS_P&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_68802698a222474c95566b297fbff738-2" name="rest_code_68802698a222474c95566b297fbff738-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;NARY_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VALUES_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VALUES_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;a id="rest_code_68802698a222474c95566b297fbff738-3" name="rest_code_68802698a222474c95566b297fbff738-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;"convmtx2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_68802698a222474c95566b297fbff738-4" name="rest_code_68802698a222474c95566b297fbff738-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dmat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dmat&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;a id="rest_code_68802698a222474c95566b297fbff738-5" name="rest_code_68802698a222474c95566b297fbff738-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/*d1*/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dmat&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dmat&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;etl&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;convmtx2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_68802698a222474c95566b297fbff738-6" name="rest_code_68802698a222474c95566b297fbff738-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_68802698a222474c95566b297fbff738-6"&gt;&lt;/a&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="configure-benchmarks"&gt;
&lt;h3&gt;Configure benchmarks&lt;/h3&gt;
&lt;p&gt;By default, each benchmark is run 10 times for warmup and then repeated 50 times, but you can define your own values:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_e901a86cb229477b96364b199488441e-1" name="rest_code_e901a86cb229477b96364b199488441e-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_e901a86cb229477b96364b199488441e-1"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#define CPM_WARMUP 3&lt;/span&gt;
&lt;a id="rest_code_e901a86cb229477b96364b199488441e-2" name="rest_code_e901a86cb229477b96364b199488441e-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_e901a86cb229477b96364b199488441e-2"&gt;&lt;/a&gt;&lt;span class="cp"&gt;#define CPM_REPEAT 10&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This must be done before the inclusion of the header.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="define-sections"&gt;
&lt;h3&gt;Define sections&lt;/h3&gt;
&lt;p&gt;Sections are simply a group of benchmarks, so instead of putting several benchmarks inside a &lt;code&gt;CPM_BENCH&lt;/code&gt;, you can put them inside a &lt;code&gt;CPM_SECTION&lt;/code&gt;. For instance:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-1" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_SECTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mmul"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-2" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"std"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-3" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-4" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"common"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-5" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-5"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-6" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-6"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_SECTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"conv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-7" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-7" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_TWO_PASS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"std"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-8" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-8" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-9" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-9" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-10" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-10" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-11" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-11" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-11"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_TWO_PASS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-12" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-12" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-13" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-13" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-13"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-14" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-14" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-14"&gt;&lt;/a&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-15" name="rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-15" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_5ca2b2fedcdc407abd6b4a0edbf25651-15"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also set different warmup and repeat values for each section by using &lt;code&gt;CPM_SECTION_O&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-1" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_SECTION_O&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;51&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-2" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-3" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-4" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"std"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-5" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mkl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sleep_for&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_ns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_2e9e8ac60d0f4450aad7d65decc06049-6" name="rest_code_2e9e8ac60d0f4450aad7d65decc06049-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2e9e8ac60d0f4450aad7d65decc06049-6"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;will be warmup 11 times and run 51 times.&lt;/p&gt;
&lt;p&gt;The size policy can also be changed for the complete section (cannot be changed independently for benchmarks inside the section):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code cpp"&gt;&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-1" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;CPM_SECTION_P&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mmul"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-2" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;NARY_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STD_STOP_POLICY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VALUES_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VALUES_POLICY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-3" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-4" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-5" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"std"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* Something */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-6" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-6"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mkl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* Something */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-7" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-7" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CPM_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bla"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* Something */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-8" name="rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-8" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_2b5fc9a0008e4c6fa51c9205e4dc4b10-8"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="run"&gt;
&lt;h3&gt;Run&lt;/h3&gt;
&lt;p&gt;Once your benchmarks and sections are defined, you can build you program as a normal C++ main and run it. You can pass several options:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-1" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-1"&gt;&lt;/a&gt;./debug/bin/full -h
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-2" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-2"&gt;&lt;/a&gt;Usage:
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-3" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-3"&gt;&lt;/a&gt;  ./debug/bin/full [OPTION...]
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-4" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-5" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-5"&gt;&lt;/a&gt;  -n, --name arg           Benchmark name
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-6" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-6"&gt;&lt;/a&gt;  -t, --tag arg            Tag name
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-7" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-7" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-7"&gt;&lt;/a&gt;  -c, --configuration arg  Configuration
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-8" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-8" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-8"&gt;&lt;/a&gt;  -o, --output arg         Output folder
&lt;a id="rest_code_eb7adc02df184c2ab791b0763a29fbfc-9" name="rest_code_eb7adc02df184c2ab791b0763a29fbfc-9" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_eb7adc02df184c2ab791b0763a29fbfc-9"&gt;&lt;/a&gt;  -h, --help               Print help
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The tag is used to distinguish between runs, I recommend that you use a SCM identifier for the tag. If you want to run your program with different configurations (compiler options for instance), you'll have to set the configuration with the --configuration option.&lt;/p&gt;
&lt;p&gt;Here is a possible output:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-1" name="rest_code_de49d88a831d4292b4eae8910e25bf69-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-1"&gt;&lt;/a&gt; Start CPM benchmarks
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-2" name="rest_code_de49d88a831d4292b4eae8910e25bf69-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-2"&gt;&lt;/a&gt;    Results will be automatically saved in /home/wichtounet/dev/cpm/results/10.cpm
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-3" name="rest_code_de49d88a831d4292b4eae8910e25bf69-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-3"&gt;&lt;/a&gt;    Each test is warmed-up 10 times
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-4" name="rest_code_de49d88a831d4292b4eae8910e25bf69-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-4"&gt;&lt;/a&gt;    Each test is repeated 50 times
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-5" name="rest_code_de49d88a831d4292b4eae8910e25bf69-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-5"&gt;&lt;/a&gt;    Time Sun Jun 14 15:33:51 2015
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-6" name="rest_code_de49d88a831d4292b4eae8910e25bf69-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-7" name="rest_code_de49d88a831d4292b4eae8910e25bf69-7" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-7"&gt;&lt;/a&gt;    Tag: 10
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-8" name="rest_code_de49d88a831d4292b4eae8910e25bf69-8" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-8"&gt;&lt;/a&gt;    Configuration:
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-9" name="rest_code_de49d88a831d4292b4eae8910e25bf69-9" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-9"&gt;&lt;/a&gt;    Compiler: clang-3.5.0
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-10" name="rest_code_de49d88a831d4292b4eae8910e25bf69-10" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-10"&gt;&lt;/a&gt;    Operating System: Linux x86_64 3.16.5-gentoo
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-11" name="rest_code_de49d88a831d4292b4eae8910e25bf69-11" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-12" name="rest_code_de49d88a831d4292b4eae8910e25bf69-12" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-12"&gt;&lt;/a&gt;
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-13" name="rest_code_de49d88a831d4292b4eae8910e25bf69-13" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-13"&gt;&lt;/a&gt; simple_a(10) : mean: 52.5us (52.3us,52.7us) stddev: 675ns min: 48.5us max: 53.3us througput: 190KEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-14" name="rest_code_de49d88a831d4292b4eae8910e25bf69-14" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-14"&gt;&lt;/a&gt; simple_a(100) : mean: 50.1us (48us,52.2us) stddev: 7.53us min: 7.61us max: 52.3us througput: 2MEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-15" name="rest_code_de49d88a831d4292b4eae8910e25bf69-15" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-15"&gt;&lt;/a&gt; simple_a(1000) : mean: 52.7us (52.7us,52.7us) stddev: 48.7ns min: 52.7us max: 53us througput: 19MEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-16" name="rest_code_de49d88a831d4292b4eae8910e25bf69-16" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-16"&gt;&lt;/a&gt; simple_a(10000) : mean: 62.6us (62.6us,62.7us) stddev: 124ns min: 62.6us max: 63.5us througput: 160MEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-17" name="rest_code_de49d88a831d4292b4eae8910e25bf69-17" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-17"&gt;&lt;/a&gt; simple_a(100000) : mean: 161us (159us,162us) stddev: 5.41us min: 132us max: 163us througput: 622MEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-18" name="rest_code_de49d88a831d4292b4eae8910e25bf69-18" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-18"&gt;&lt;/a&gt; simple_a(1000000) : mean: 1.16ms (1.16ms,1.17ms) stddev: 7.66us min: 1.15ms max: 1.18ms througput: 859MEs
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-19" name="rest_code_de49d88a831d4292b4eae8910e25bf69-19" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-19"&gt;&lt;/a&gt;
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-20" name="rest_code_de49d88a831d4292b4eae8910e25bf69-20" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-20"&gt;&lt;/a&gt;-----------------------------------------
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-21" name="rest_code_de49d88a831d4292b4eae8910e25bf69-21" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-21"&gt;&lt;/a&gt;|            gemm |       std |     mkl |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-22" name="rest_code_de49d88a831d4292b4eae8910e25bf69-22" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-22"&gt;&lt;/a&gt;-----------------------------------------
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-23" name="rest_code_de49d88a831d4292b4eae8910e25bf69-23" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-23"&gt;&lt;/a&gt;|           10x10 | 51.7189us | 64.64ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-24" name="rest_code_de49d88a831d4292b4eae8910e25bf69-24" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-24"&gt;&lt;/a&gt;|         100x100 | 52.4336us | 63.42ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-25" name="rest_code_de49d88a831d4292b4eae8910e25bf69-25" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-25"&gt;&lt;/a&gt;|       1000x1000 | 56.0097us |  63.2ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-26" name="rest_code_de49d88a831d4292b4eae8910e25bf69-26" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-26"&gt;&lt;/a&gt;|     10000x10000 | 95.6123us | 63.52ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-27" name="rest_code_de49d88a831d4292b4eae8910e25bf69-27" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-27"&gt;&lt;/a&gt;|   100000x100000 | 493.795us | 63.48ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-28" name="rest_code_de49d88a831d4292b4eae8910e25bf69-28" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-28"&gt;&lt;/a&gt;| 1000000x1000000 | 4.46646ms |  63.8ns |
&lt;a id="rest_code_de49d88a831d4292b4eae8910e25bf69-29" name="rest_code_de49d88a831d4292b4eae8910e25bf69-29" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_de49d88a831d4292b4eae8910e25bf69-29"&gt;&lt;/a&gt;-----------------------------------------
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The program will give you for each benchmark, the mean duration (with confidence interval), the standard deviation of the samples, the min and max duration and an estimated throughput. The throughput is simply using the size and the mean duration. Each section is directly compared with an array-like output. Once the benchmark is run, a JSON report will be generated inside the output folder.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="continuous-monitoring"&gt;
&lt;h2&gt;Continuous Monitoring&lt;/h2&gt;
&lt;p&gt;Once you have run the benchmark, you can use the CPM program to generate the web reports. It will generate:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;1 performance graph for each benchmark and section&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 graph comparing the performances over time of your benchmark sections if you have run the benchmark several time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 graph comparing different compiler if you have compiled your program with different compiler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 graph comparing different configuration if you have run the benchmark with different configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 table summary for each benchmark / section&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;First you have to build and install the CPM program (you can have a look at the &lt;a class="reference external" href="https://github.com/wichtounet/cpm/blob/master/README.rst"&gt;Readme&lt;/a&gt; for more informations.&lt;/p&gt;
&lt;p&gt;Several options are available:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-1" name="rest_code_37912592a18149dca76480e06fccfc85-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-1"&gt;&lt;/a&gt;Usage:
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-2" name="rest_code_37912592a18149dca76480e06fccfc85-2" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-2"&gt;&lt;/a&gt;  cpm [OPTION...]  results_folder
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-3" name="rest_code_37912592a18149dca76480e06fccfc85-3" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-4" name="rest_code_37912592a18149dca76480e06fccfc85-4" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-4"&gt;&lt;/a&gt;      --time-sizes             Display multiple sizes in the time graphs
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-5" name="rest_code_37912592a18149dca76480e06fccfc85-5" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-5"&gt;&lt;/a&gt;  -t, --theme arg              Theme name [raw,bootstrap,boostrap-tabs] (default:bootstrap)
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-6" name="rest_code_37912592a18149dca76480e06fccfc85-6" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-6"&gt;&lt;/a&gt;  -c, --hctheme theme_name     Highcharts Theme name [std,dark_unica] (default:dark_unica)
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-7" name="rest_code_37912592a18149dca76480e06fccfc85-7" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-7"&gt;&lt;/a&gt;  -o, --output output_folder   Output folder (default:reports)
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-8" name="rest_code_37912592a18149dca76480e06fccfc85-8" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-8"&gt;&lt;/a&gt;      --input arg              Input results
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-9" name="rest_code_37912592a18149dca76480e06fccfc85-9" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-9"&gt;&lt;/a&gt;  -s, --sort-by-tag            Sort by tag instaed of time
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-10" name="rest_code_37912592a18149dca76480e06fccfc85-10" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-10"&gt;&lt;/a&gt;  -p, --pages                  General several HTML pages (one per bench/section)
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-11" name="rest_code_37912592a18149dca76480e06fccfc85-11" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-11"&gt;&lt;/a&gt;  -d, --disable-time           Disable time graphs
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-12" name="rest_code_37912592a18149dca76480e06fccfc85-12" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-12"&gt;&lt;/a&gt;      --disable-compiler       Disable compiler graphs
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-13" name="rest_code_37912592a18149dca76480e06fccfc85-13" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-13"&gt;&lt;/a&gt;      --disable-configuration  Disable configuration graphs
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-14" name="rest_code_37912592a18149dca76480e06fccfc85-14" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-14"&gt;&lt;/a&gt;      --disable-summary        Disable summary table
&lt;a id="rest_code_37912592a18149dca76480e06fccfc85-15" name="rest_code_37912592a18149dca76480e06fccfc85-15" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_37912592a18149dca76480e06fccfc85-15"&gt;&lt;/a&gt;  -h, --help                   Print help
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are 3 themes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;bootstrap&lt;/em&gt;: The default theme, using Bootstrap to make a responsive interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;bootstrap-tabs&lt;/em&gt;: Similar to the &lt;em&gt;bootstrap&lt;/em&gt; theme except that only is displayed at the same time for each benchmark, with tabs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;raw&lt;/em&gt; : A very basic theme, only using Highcharts library for graphs. It is very minimalistic&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, here are how the reports are generated for the ETL benchmark:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_60e509c34bc2466cb2e554bb7bc44441-1" name="rest_code_60e509c34bc2466cb2e554bb7bc44441-1" href="https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html#rest_code_60e509c34bc2466cb2e554bb7bc44441-1"&gt;&lt;/a&gt;cpm -p -s -t bootstrap -c dark_unica -o reports results
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is the graph generated for the "R = A + B + C" benchmark and different compilers:&lt;/p&gt;
&lt;img alt="/images/cpm_etl_compiler.png" src="https://baptiste-wicht.com/images/cpm_etl_compiler.png"&gt;
&lt;p&gt;and its summary:&lt;/p&gt;
&lt;img alt="/images/cpm_etl_summary.png" src="https://baptiste-wicht.com/images/cpm_etl_summary.png"&gt;
&lt;p&gt;Here is the graph for a 2D convolution with ETL:&lt;/p&gt;
&lt;img alt="/images/cpm_etl_section.png" src="https://baptiste-wicht.com/images/cpm_etl_section.png"&gt;
&lt;p&gt;And the graph for different configurations of ETL and the dense matrix matrix multiplication:&lt;/p&gt;
&lt;img alt="/images/cpm_etl_configuration.png" src="https://baptiste-wicht.com/images/cpm_etl_configuration.png"&gt;
&lt;/section&gt;
&lt;section id="conclusion-and-future-work"&gt;
&lt;h2&gt;Conclusion and Future Work&lt;/h2&gt;
&lt;p&gt;Although CPM is already working, there are several things that could be done to improve it further:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The generated web report could benefit from a global summary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The throughput evaluation should be evaluated more carefully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The tool should automatically evaluate the number of times that each tests should be run to have a good result instead of global warmup and repeat constants.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A better bootstrapping procedure should be used to determine the quality of the results and compute the confidence intervals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The performances of the website with lots of graphs should be improved.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make CPM more general-purpose to support larger needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here it is, I have summed most of the features of the CPM Continuous Performance Analysis tool. I hope that it will be helpful to some of you as well.&lt;/p&gt;
&lt;p&gt;If you have other ideas or want to contribute something to the project, you can directly open an issue or a pull request on &lt;a class="reference external" href="https://github.com/wichtounet/cpm"&gt;Github&lt;/a&gt;. Or contact me via this site or Github.&lt;/p&gt;
&lt;/section&gt;</description><category>Benchmarks</category><category>C++</category><category>Performances</category><category>Sonar</category><category>Tools</category><guid>https://baptiste-wicht.com/posts/2015/06/continuous-performance-management-with-cpm-for-cpp.html</guid><pubDate>Sun, 14 Jun 2015 12:02:57 GMT</pubDate></item><item><title>Improving eddic Boost Spirit parser performances</title><link>https://baptiste-wicht.com/posts/2013/06/improving-eddic-boost-spirit-parser-performances.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;After the last changes on the data-flow framework, the parsing pass of the eddic compiler became the slowest one. I was sure there was some area of optimizations, so I decided to improve its performances.&lt;/p&gt;
&lt;p&gt;In this post, I will present the different techniques I applied and their results.&lt;/p&gt;
&lt;h4&gt;Static grammar&lt;/h4&gt;

&lt;p&gt;The first optimization that I tried was to make the grammar static, meaning that we can declare it static and it will be constructed only once and will be allocated on the data segment.It is indeed heavy to build the lexer especially but also the grammar. I would like to thank &lt;a title="sehe on github" href="https://github.com/sehe"&gt;sehe&lt;/a&gt; for this optimization, he found it after I posted a question on Stackoverflow.&lt;/p&gt;
&lt;p&gt;The lexer was very easy to make static (only add static keyword :) ), but the parser was a bit more complicated because it needs the lexer iterator to get the current position in the file. This problem has been resolved by saving the iterator into the qi::locals of the rules.&lt;/p&gt;
&lt;p&gt;The result of this optimization are amazing. It saved 33% of the time.&lt;/p&gt;
&lt;h4&gt;Expectation points&lt;/h4&gt;

&lt;p&gt;Expectation points have the interesting point that they disallow backtracking and so can improve performances in some cases. Moreover, they are always interesting because they make the grammar clearer and the error messages much better.&lt;/p&gt;
&lt;p&gt;I tried adding more expectation points to the grammar. Unfortunately, there weren't a lot of them to add. Moreover, it seems that there are some quite weird behavior with them because some times it is impossible to add them (causes compilation failure) and sometimes it just make the code don't work anymore the same way, though I don't understand why.&lt;/p&gt;
&lt;p&gt;Anyway, I have been able to add some to the grammar. These changes improve the performance by a bit more than 1%. It is not a lot, but it is still an improvement. Moreover, I'm quite sure that there are more expectation points that can be added to the code. I will take some time again later to try to add more and to understand them better.&lt;/p&gt;
&lt;h4&gt;Less skips&lt;/h4&gt;

&lt;p&gt;In my grammar, I've a special parser for getting the current position in the file to add "debug information" to the AST. This special parser was skipping over its content, but it has no content, since it is artificial. Removing it improved the performance by about half a percent.&lt;/p&gt;
&lt;h4&gt;Improve Position (debug information)&lt;/h4&gt;

&lt;p&gt;As said before, there is a special parser to get the current position in the file. This information is then stored into an eddic::ast::Position structure. This structure was holding the line number, the column, the file name and the contents of the line. The first two were ints and the last two were std::string. Each time, a copy of the strings were necessary.&lt;/p&gt;
&lt;p&gt;I avoided storing the std::string directly by storing only the number of the line as well as the index of the file. Then, the content of the file is stored in the global context and can be accessed if it is necessary to display the line where the error happened.&lt;/p&gt;
&lt;p&gt;This change gave an increase of 10% of the parsing performance.&lt;/p&gt;
&lt;h4&gt;Auto Rules&lt;/h4&gt;

&lt;p&gt;Rules in Boost Spirit have an overhead due to the cost of the virtual calls that are necessary. In theory, auto rules can improve the efficiency of the rules by removing the cost of virtual calls. Moreover, auto rules should also avoid code bloat when the rules are compiled. The rules can be inlined and better optimized.&lt;/p&gt;
&lt;p&gt;I transformed some rules to auto rules to improve performances. Unfortunately, I found that this did not improve the performances. Moreover, transforming some rules to auto rules made the performance worse. I still did let some of the rules as auto rules. I have to say that I was very disappointed by this result, I was really expecting more from this :(&lt;/p&gt;
&lt;h4&gt;Generated Static Lexer&lt;/h4&gt;

&lt;p&gt;The first time the lexer is used, it has to generate the Deterministic Finite Automaton (DFA) that is used to identify the different tokens. This process takes time. There is way to avoid this by using the static model of Boost Spirit Lex. With that, the code is generated with the complete DFA and then it doesn't have to be initialized again.&lt;/p&gt;
&lt;p&gt;I was not expecting a lot from this because the lexer was already static and so was initialized only once. Indeed, it resulted in less than half a percent improvement.&lt;/p&gt;
&lt;h4&gt;Conclusion&lt;/h4&gt;

&lt;p&gt;Even if I've been able to largely reduce the overhead of the parsing by more than 40%, it still has a big overhead. Indeed, it still represents 36 percent of the whole process of compiling a source file. I think it is still too much.&lt;/p&gt;
&lt;p&gt;Moreover, an interesting fact is that the optimization I would have thought to be very effective (auto rules especially) did not have the expected effect, but making the grammar static, which I would not have thought of, was very effective.&lt;/p&gt;
&lt;p&gt;When profiled, the final version shows that quite some time is spent in destructing the multi_pass, which is quite odd. And it also seems that transforming the string operators to ast::Operator is not very effective, but I do not know how to improve that at this point.&lt;/p&gt;
&lt;p&gt;I won't probably work on that again for the version 1.2.4 of eddic, but I will eventually take some time again for the version 1.3.0 to improve it again.&lt;/p&gt;</description><category>Benchmarks</category><category>Boost</category><category>C++11</category><category>Compilers</category><category>EDDI</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2013/06/improving-eddic-boost-spirit-parser-performances.html</guid><pubDate>Mon, 10 Jun 2013 06:16:42 GMT</pubDate></item><item><title>C++ Benchmark - std::list VS boost::intrusive::list</title><link>https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-std-list-boost-intrusive-list.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;Recently, we saw that the &lt;a title="C++ benchmark – std::vector VS std::list VS std::deque" href="http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/"&gt;std::list performance was not really good&lt;/a&gt; when it comes to searching it or iterating through it.In this post, we will see an alternative to the std::list: the boost::intrusive::list from the Boost C++ libraries. It is not a well known library but it can be useful in some specific cases. I will focus on how this implementation performs compared to an std::list.&lt;/p&gt;
&lt;p&gt;An intrusive list does not store copies of the object but the objects itself. Because it stores the objects, it is necessary to add information to the object data structure directly to link them together, that is why it is called an intrusive list. This has a big advantage, the list does not have to allocate any memory at all to insert objects. In a std::list if you insert an object, a node object will be created containing a copy of the object, but not in an intrusive list where only the pointers to the next and to the previous element of the inserted object are updated. Another advantage is that if you have a reference to an object you can directly obtain an iterator to this object in O(1), an operation that would be in O(n) with a std::list. Iteration is faster because it needs less memory accesses.&lt;/p&gt;
&lt;p&gt;On the other hand, an intrusive list has also several drawbacks. First of all, it is intrusive. It means that you have to change the definition of the type that is stored inside the intrusive list. Then, and it be very complicated, it is up to the developer to manage the life time of the objects. It means that it is up to you to allocate and deallocate memory for each objects that you want to put in your collection. For instance, if you store an object into an intrusive list and later delete this object without removing it from the list, you will have broken you list. It is also less safe because the container can be modified from outside simply by modifying the pointers directly inside the objects.&lt;/p&gt;
&lt;p&gt;This article is not a tutorial for Boost intrusive collections, I will just focus on the performance aspect, if you want to learn how to use them, you can consult &lt;a href="http://www.boost.org/doc/libs/1_52_0/doc/html/intrusive.html" title="Boost Intrusive 1.52"&gt;the official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A boost::intrusive::list can be configured in three mode:&lt;/p&gt;
&lt;ol&gt;
    &lt;li&gt;Normal mode: No special features&lt;/li&gt;
    &lt;li&gt;Safe mode: The hook is initialized to a default safe state and the container check this state before inserting a value. The state of a removed node is also updated correctly. It can be used to detect programming errors. It implies a small performance overhead. This is the default mode.&lt;/li&gt;
    &lt;li&gt;Auto unlink mode: When an object gets destructed it is automatically removed from the container. This mode has also the properties of the safe mode.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mode is chosen at constant time by configuring the hook of the data type. In this article, all three mode will be tested. Here are the four types that will be tested:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;list : std::list&lt;/li&gt;
    &lt;li&gt;normal_ilist : boost::intrusive::list in normal mode&lt;/li&gt;
    &lt;li&gt;safe_ilist : boost::intrusive::list in safe mode&lt;/li&gt;
    &lt;li&gt;auto_unlink_ilist : boost::intrusive::list in auto unlink mode&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data types are varying in size, they hold an array of longs and the size of the array varies to change the size of the data type. In each graph, the size of the data type is indicated. It is the size of the normal data type. The intrusive data types are always 16 bytes bigger than the normal data types.&lt;/p&gt;
&lt;p&gt;In the graphs and in the text, &lt;em&gt;n&lt;/em&gt; is used to refer to the number of elements of the collection.&lt;/p&gt;
&lt;p&gt;All the tests performed have been performed on an Intel Core i7 Q 820  @ 1.73GHz. The code has been compiled in 64 bits with GCC 4.7.2 with the given options: &lt;em&gt;-std=c++11 -O2 -fomit-frame-pointer -march=native&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For each graph, the vertical axis represent the amount of time necessary to perform the operations, so the lower values are the better. The horizontal axis is always the number of elements of the collection. For some graph, the logarithmic scale could be clearer, a button is available after each graph to change the vertical scale to a logarithmic scale.&lt;/p&gt;
&lt;p&gt;So let's see these data structures in practice.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Fill list&lt;/h3&gt;

&lt;p&gt;The first test that is performed is how long does it take to fill each data structure. For the std::list, each value is entered directly. For the intrusive list variations, the data are entered into a vector and then pushed back to the intrusive list.&lt;/p&gt;
&lt;p&gt;So, let's test them:&lt;/p&gt;
&lt;div id="graph_0" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_0" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_0(){var graph=new google.visualization.LineChart(document.getElementById('graph_0'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',265,261,491,2926],['200000',534,498,773,4717],['300000',1874,1841,1901,7201],['400000',2670,2657,2749,9696],['500000',3410,3447,3486,11608],['600000',4044,4095,4050,14024],['700000',4653,4549,4626,16750],['800000',5406,5320,5414,18568],['900000',5995,5926,6160,20880],['1000000',6806,6785,6784,22795],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_0');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;We can see that filling a list is about thrice slower than an intrusive version. This is quite logical because there are much less memory allocations in the case of the intrusive lists. The differences between the different intrusive versions are not very big. The normal version is the fastest, then the auto unlink and finally the safe version is the slowest.&lt;/p&gt;
&lt;p&gt;If we increase the size of the data type a bit:&lt;/p&gt;
&lt;div id="graph_1" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_1" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_1(){var graph=new google.visualization.LineChart(document.getElementById('graph_1'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',850,894,967,2685],['200000',1760,1671,1747,5742],['300000',3987,3916,3970,8517],['400000',5257,5191,5238,10919],['500000',6561,6813,6661,13614],['600000',7943,8197,7954,15874],['700000',9251,9523,9049,18625],['800000',10463,10865,10417,21114],['900000',11736,11992,11734,23824],['1000000',13137,13423,13055,26555],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - Normal&lt;32u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_1');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results remains more or less the same, but this time there is less difference between list and intrusive list.&lt;/p&gt;
&lt;div id="graph_2" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_2" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_2(){var graph=new google.visualization.LineChart(document.getElementById('graph_2'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',2131,2109,2923,39337],['200000',3834,3830,5792,90984],['300000',6454,6508,8604,138087],['400000',8499,8926,11614,185607],['500000',11068,10874,14455,230798],['600000',13009,13052,17307,275473],['700000',14965,15008,20593,326638],['800000',17082,17186,23109,372022],['900000',19283,19476,26182,415426],['1000000',21503,21569,29463,463462],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_2');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the results are really different. The intrusive versions are twenty times faster than a standard list. This comes from dynamic allocations of large blocks that is very expensive. In the case of intrusive list, there are no memory allocations, just modifications of pointers, so it it is normal that for big blocks the difference is higher.&lt;/p&gt;
&lt;p&gt;We can see that for push_back operations, the intrusive are clearly faster. For small data types, they can be up to three times faster. The difference can be much higher with very big data types. There are no big differences between the different versions of the intrusive lists. The normal mode is about 10% faster than the safe mode.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Destruction of list&lt;/h3&gt;

&lt;p&gt;The second test is about the time necessary to destruct a collection. For a list, the list is simply allocated on the heap and destructed with delete operator. For an intrusive list, both the vector and the list are allocated on the heap. The time is computed to delete both the vector and the intrusive list.&lt;/p&gt;
&lt;p&gt;Let's take a look at the results:&lt;/p&gt;
&lt;div id="graph_3" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_3" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_3(){var graph=new google.visualization.LineChart(document.getElementById('graph_3'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',298,283,0,2420],['200000',588,585,0,3132],['300000',1196,1217,1,4487],['400000',2180,2350,1,6478],['500000',3169,3098,1,7259],['600000',3975,3957,1,8609],['700000',4569,4798,1,10018],['800000',5189,5311,1,11501],['900000',5927,6022,1,13004],['1000000',6616,6592,1,14423],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_3');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The impressive result is that the normal mode is almost free. It is normal because the destructor of the objects does not do anything. Neither the list does anything about the state of the object after it has been removed. The other two intrusive versions performs the same and twice faster than a list.&lt;/p&gt;
&lt;p&gt;Let's increase a bit the size:&lt;/p&gt;
&lt;div id="graph_4" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_4" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_4(){var graph=new google.visualization.LineChart(document.getElementById('graph_4'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',515,531,0,1490],['200000',1997,1964,0,3370],['300000',4022,3818,1,5585],['400000',5219,5456,1,7099],['500000',6693,6847,1,9054],['600000',7859,7984,1,11131],['700000',9456,9382,1,12248],['800000',10637,10493,1,14059],['900000',12433,11882,1,15933],['1000000',13646,13098,1,17434],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - Normal&lt;32u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_4');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the std::list version is getting closer to the auto unlink and safe versions. The auto unlink version is a bit slower than the safe version. The normal mode is still free.&lt;/p&gt;
&lt;p&gt;Increasing it a bit again:&lt;/p&gt;
&lt;div id="graph_5" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_5" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_5(){var graph=new google.visualization.LineChart(document.getElementById('graph_5'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',4110,4158,1,4388],['200000',8357,8231,1,8949],['300000',12461,11871,1,12417],['400000',16097,16064,1,16249],['500000',19754,19990,1,19181],['600000',23724,24022,1,22019],['700000',28973,28421,2,24928],['800000',32638,32793,1,31896],['900000',37257,36320,1,35455],['1000000',40422,41037,1,39357],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - Normal&lt;128u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_5');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;We can see that the list is a small bit faster than the other two versions.&lt;/p&gt;
&lt;p&gt;If we push the memory to its limit:&lt;/p&gt;
&lt;div id="graph_6" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_6" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_6(){var graph=new google.visualization.LineChart(document.getElementById('graph_6'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',2582,2689,1,5669],['200000',10506,10450,1,19719],['300000',16490,16429,1,33569],['400000',21521,21702,1,47179],['500000',27188,27217,1,59640],['600000',32507,32470,1,70116],['700000',37288,37344,1,81122],['800000',42825,42611,1,91972],['900000',48004,48315,1,104401],['1000000',53439,53988,1,114482],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_6');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the list is again slower. I'm not sure of why this happens, but it is certainly because of the memory allocator that has two allocate too many big blocks, which tends to be more costly than many small.&lt;/p&gt;
&lt;p&gt;For the destruction, the normal mode proved its high strength, being totally free to destroy. The safe and auto unlink modes are proving much more expensive during destruction, but still quite a bit faster than a standard list.&lt;/p&gt;
&lt;p&gt;It is also necessary to keep in mind that the destruction is generally not a common operation and is about 4 times faster than insertion. In practice, neither push_back nor destruction are critical in choosing a data structure.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Linear Search in a list&lt;/h3&gt;

&lt;p&gt;The next operation that will benchmark is the linear search. The container is filled with all the numbers in [0, n] and shuffled. Then, each number in [0, n] is searched in the container with std::find that performs a simple linear search.&lt;/p&gt;
&lt;p&gt;How do the different data structures perform:&lt;/p&gt;
&lt;div id="graph_7" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_7" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_7(){var graph=new google.visualization.LineChart(document.getElementById('graph_7'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['1000',1213,1211,1200,1463],['2000',5715,5561,5931,6992],['3000',12451,11619,11608,14251],['4000',19338,20142,19208,24840],['5000',30147,30889,29863,40523],['6000',42941,44530,43289,59541],['7000',59717,62534,58851,80416],['8000',77519,78889,77188,104198],['9000',99738,99434,100375,135452],['10000',123829,123216,123506,169720],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_7');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;As expected, the intrusive list versions are faster than the standard list. The margin is about 40%. The intrusive versions have a better locality than the standard list because there is one less indirection.&lt;/p&gt;
&lt;p&gt;Increasing the data type size:&lt;/p&gt;
&lt;div id="graph_8" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_8" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_8(){var graph=new google.visualization.LineChart(document.getElementById('graph_8'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['1000',2901,2862,2940,2757],['2000',14258,14242,14413,18990],['3000',37327,37078,37387,48716],['4000',69310,68935,69152,86224],['5000',109444,110199,108283,134128],['6000',158570,158392,158398,202273],['7000',219949,220489,219070,266387],['8000',285885,289721,283485,342109],['9000',363603,361367,362935,424984],['10000',445068,451009,447940,525289],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - Normal&lt;128u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_8');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The margin has decreased a bit to about 15%. As the data object does not fit in cache we have higher cache misses rate.&lt;/p&gt;
&lt;p&gt;If we increase it to the maximum:&lt;/p&gt;
&lt;div id="graph_9" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_9" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_9(){var graph=new google.visualization.LineChart(document.getElementById('graph_9'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['1000',3517,3818,3487,3570],['2000',19342,18999,18875,22972],['3000',59722,59728,60953,60749],['4000',116755,118256,117172,123127],['5000',194630,195522,192977,204278],['6000',287911,287099,291353,296178],['7000',395506,404527,396043,413398],['8000',519636,527051,527183,549713],['9000',662320,669371,672484,693154],['10000',831052,828032,832995,859092],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_9');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Again, the margin decreased, to 3%.&lt;/p&gt;
&lt;p&gt;For linear searching, the intrusive versions are clearly faster, however, not by a high advantage and this advantage tends to get lower with bigger data types. I would really have expected a more interesting result here. We will see with the next results if it gets better.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Iteration over a list&lt;/h3&gt;

&lt;p&gt;This time, we will test the iteration over a whole collection. The iterate is done with the C++11 foreach loop (taking a reference) and the data is used to increment a counter (to make sure the loop is not optimized away.&lt;/p&gt;
&lt;p&gt;Let's start with our small data type:&lt;/p&gt;
&lt;div id="graph_10" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_10" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_10(){var graph=new google.visualization.LineChart(document.getElementById('graph_10'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',204,202,209,276],['200000',430,404,439,715],['300000',967,956,1086,1969],['400000',1837,1826,1870,2672],['500000',2705,2668,2699,3130],['600000',3176,3110,3217,3815],['700000',3856,3678,3575,4363],['800000',4209,4090,4095,4860],['900000',4778,4334,4433,5511],['1000000',4934,4908,4656,5541],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"iterate - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_10');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The standard list is indeed slower than the other versions (by about 20%). Which is expected due to their better data locality. However, the results are not very stable (probably too fast, many things can intervene). I was expecting better results.&lt;/p&gt;
&lt;p&gt;Let's go with a higher data type:&lt;/p&gt;
&lt;div id="graph_11" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_11" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_11(){var graph=new google.visualization.LineChart(document.getElementById('graph_11'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',3694,3635,3467,3830],['200000',6779,6766,6430,7070],['300000',9472,9819,9291,10270],['400000',12354,12322,12165,13588],['500000',15188,15270,15079,16725],['600000',18447,18246,17797,19875],['700000',20925,20578,20596,23256],['800000',23539,23780,23421,26451],['900000',26492,26237,25975,29785],['1000000',29757,29482,28681,32752],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"iterate - Normal&lt;128u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_11');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the results look better, but there are less difference between the standard list and the intrusive versions. The intrusive versions are faster by about 10%.&lt;/p&gt;
&lt;p&gt;If we take a bigger data type:&lt;/p&gt;
&lt;div id="graph_12" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_12" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_12(){var graph=new google.visualization.LineChart(document.getElementById('graph_12'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',2797,2649,2496,5211],['200000',10148,10227,10199,10701],['300000',15379,15544,15458,15189],['400000',20181,20599,20213,20135],['500000',25481,25517,25381,25567],['600000',30407,30408,30326,29938],['700000',35187,35343,35498,34985],['800000',40094,39874,40172,40027],['900000',44875,45329,45314,44964],['1000000',49584,50143,50405,49737],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"iterate - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_12');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, there is no more difference between the different versions.&lt;/p&gt;
&lt;p&gt;Just like the results for linear search, the intrusive versions are faster but the difference is not huge. For very small data type, there is a gain of about 15 to 20 percent, but on very big data types, there is no more increase in performance. Again, I would have expected better results for the intrusive versions.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Write to elements of the list&lt;/h3&gt;

&lt;p&gt;This test is almost the same as the previous one, but this time each element of the collection is modified by incrementing one of its field.&lt;/p&gt;
&lt;p&gt;Let's see if the results are different this time.&lt;/p&gt;
&lt;div id="graph_13" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_13" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_13(){var graph=new google.visualization.LineChart(document.getElementById('graph_13'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',267,290,263,343],['200000',579,583,548,889],['300000',1111,1142,1114,2151],['400000',2155,2143,2146,3197],['500000',3193,3186,3163,4168],['600000',4016,3991,3978,5179],['700000',4727,4916,4717,6026],['800000',5560,5534,5440,6977],['900000',6125,6157,6064,7693],['1000000',6830,6920,6785,8613],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"write - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_13');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are more stable than before. We can see that the normal mode is leading the results by a bit less than 30%. Just like for iteration, there no real difference between the different modes.&lt;/p&gt;
&lt;p&gt;Let's increase the data size by a bit:&lt;/p&gt;
&lt;div id="graph_14" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_14" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_14(){var graph=new google.visualization.LineChart(document.getElementById('graph_14'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',505,513,530,1069],['200000',1957,1991,2023,3222],['300000',4056,4053,4000,4806],['400000',5572,5493,5587,6408],['500000',6763,6798,6714,8578],['600000',8224,8142,8050,10037],['700000',9520,9622,9568,12272],['800000',10982,11451,10947,13818],['900000',12195,12305,12185,15103],['1000000',14043,13657,13775,17187],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"write - Normal&lt;32u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_14');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are the same, still 30% better for the intrusive version.&lt;/p&gt;
&lt;p&gt;Bigger data type again:&lt;/p&gt;
&lt;div id="graph_15" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_15" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_15(){var graph=new google.visualization.LineChart(document.getElementById('graph_15'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',3263,3415,3172,4444],['200000',9467,9434,9513,8855],['300000',13996,13888,14227,13379],['400000',18181,18458,18255,17684],['500000',22751,23292,22956,22067],['600000',27371,27511,27329,25995],['700000',31236,31389,31404,31164],['800000',35749,35970,36132,35224],['900000',39972,40684,40549,39786],['1000000',45119,45124,44840,43937],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"write - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_15');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the intrusive version is not faster anymore than the standard list.&lt;/p&gt;
&lt;p&gt;We have seen that when write is made to the data, intrusive list are better than list. The margin is higher than when doing only iteration.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Reverse the list&lt;/h3&gt;

&lt;p&gt;Let's test something more useful with a reverse operation. The reverse member function is used to reverse all the containers.&lt;/p&gt;
&lt;p&gt;Let's see how they perform:&lt;/p&gt;
&lt;div id="graph_16" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_16" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_16(){var graph=new google.visualization.LineChart(document.getElementById('graph_16'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',366,410,390,372],['200000',761,892,758,867],['300000',1278,1354,1308,2062],['400000',2313,2348,2324,3083],['500000',3265,3297,3300,3994],['600000',4036,4171,4063,4890],['700000',4603,4575,4689,6052],['800000',5344,5241,5321,6682],['900000',6107,5912,6251,7623],['1000000',6725,6626,6663,8548],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"reverse - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_16');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The intrusive versions are about 25% faster than standard list. Even if reversal does not need to access the values, the pointers of the intrusive lists have a better locality than the one of a list that can be dispersed through memory.&lt;/p&gt;
&lt;div id="graph_17" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_17" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_17(){var graph=new google.visualization.LineChart(document.getElementById('graph_17'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',585,563,601,1057],['200000',1903,1902,2011,3043],['300000',4121,4041,4049,4671],['400000',5291,5352,5275,6504],['500000',6831,6711,6584,8273],['600000',7980,8113,8105,9900],['700000',9366,9216,9426,11612],['800000',10680,10622,10602,13377],['900000',11917,11908,11915,15140],['1000000',13442,13239,13268,16718],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"reverse - Normal&lt;32u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_17');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The performance improved a bit, to 30% improvement for an intrusive list.&lt;/p&gt;
&lt;p&gt;Let's see if this continue:&lt;/p&gt;
&lt;div id="graph_18" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_18" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_18(){var graph=new google.visualization.LineChart(document.getElementById('graph_18'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',2901,2901,2676,4562],['200000',5889,5942,5613,9059],['300000',8696,8618,8372,13501],['400000',11344,11597,11039,17985],['500000',14431,14123,13954,22546],['600000',17084,16955,16605,27246],['700000',19563,19795,19431,31866],['800000',22806,22428,22092,36645],['900000',25553,25283,25003,41025],['1000000',28049,28174,27590,45780],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"reverse - Normal&lt;128u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_18');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;It did, the intrusive list is more than 40% faster than the standard list !&lt;/p&gt;
&lt;p&gt;What happens a bigger one:&lt;/p&gt;
&lt;div id="graph_19" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_19" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_19(){var graph=new google.visualization.LineChart(document.getElementById('graph_19'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',4723,4823,4585,4373],['200000',11566,11566,11842,8704],['300000',17471,17377,17587,12985],['400000',22778,22867,22866,17452],['500000',28548,28434,28837,21357],['600000',34005,34089,33913,25907],['700000',39189,39235,39284,30398],['800000',45186,44900,44838,34632],['900000',50337,50529,50677,38540],['1000000',56045,56026,56007,42867],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"reverse - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_19');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The lines have been interchanged! This time the standard list is about 25% faster than the intrusive versions. This time, the better locality of the intrusive versions is not a gain but a loss.&lt;/p&gt;
&lt;p&gt;It is logical that the margin decrease with very big objects during reversal. Indeed, each element is very close one to another, but the pairs of pointers are separated by the size of the data type. The bigger the data type, the higher distance between the pointers and so the worse spatial locality for the pointers. However, I do not explain why there is this big difference...&lt;/p&gt;
&lt;p&gt;The performance of intrusive list are clearly interesting for reversing collection of small data types. However, it seems that for high data types, the standard list is faster.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Sort the list&lt;/h3&gt;

&lt;p&gt;Let's continue with an even more interesting operation, sorting the list. All the versions are sorted with the sort member function.&lt;/p&gt;
&lt;div id="graph_20" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_20" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_20(){var graph=new google.visualization.LineChart(document.getElementById('graph_20'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',16314,16503,16364,24992],['200000',38251,37599,38206,62834],['300000',63573,63521,63878,112848],['400000',90236,88561,88996,168037],['500000',114442,114456,116137,236389],['600000',167279,165879,166617,324302],['700000',190762,190586,190292,377142],['800000',223884,223900,224622,477313],['900000',255248,252795,253070,540602],['1000000',287262,286842,288404,614163],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_20');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The intrusive versions are really interesting, being twice faster than a standard list.&lt;/p&gt;
&lt;p&gt;Let's see with a see a higher data type:&lt;/p&gt;
&lt;div id="graph_21" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_21" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_21(){var graph=new google.visualization.LineChart(document.getElementById('graph_21'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',25335,25030,25318,31655],['200000',67224,65937,67185,80637],['300000',118284,116947,119686,139629],['400000',169014,165868,169723,194916],['500000',208261,208245,209291,248475],['600000',294572,287231,294284,339917],['700000',337154,332205,338323,389665],['800000',401654,393743,401249,461721],['900000',440058,434855,440984,506999],['1000000',494861,491360,497710,578176],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - Normal&lt;128u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_21');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The difference is decreasing to about 20%.&lt;/p&gt;
&lt;p&gt;Increasing it again:&lt;/p&gt;
&lt;div id="graph_22" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_22" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_22(){var graph=new google.visualization.LineChart(document.getElementById('graph_22'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_list','safe_list','normal_ilist','list'],['100000',31568,33319,32201,40331],['200000',82407,85932,83987,97310],['300000',143651,150155,144553,163818],['400000',202050,209178,203821,232728],['500000',250514,256374,253415,289493],['600000',345760,351034,348564,391229],['700000',397848,402385,402371,454434],['800000',471832,479947,477681,538873],['900000',518542,527961,524002,596696],['1000000',583622,596696,591025,672353],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - Normal&lt;1024u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_22');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;It decreased again to 18%.&lt;/p&gt;
&lt;p&gt;For sort operations, the intrusive versions are clearly interesting, especially for small data types.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Random insert into the list&lt;/h3&gt;

&lt;p&gt;The last operation that we will test is the random insert. I have to say that this test is not fair. Indeed, in an intrusive list, from an object we can directly get an iterator and insert at a random position. For a standard list, we have to find the iterator by linear search. I think that it is still important because it is one of the advantages of an intrusive container.&lt;/p&gt;
&lt;div id="graph_23" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_23" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_23(){var graph=new google.visualization.LineChart(document.getElementById('graph_23'));var data=google.visualization.arrayToDataTable([['x','auto_unlink_ilist','safe_ilist','normal_ilist','list'],['10000',42,46,41,27729],['20000',49,66,45,48736],['30000',64,66,47,62440],['40000',54,53,48,75253],['50000',59,54,49,88493],['60000',67,49,63,104516],['70000',50,53,50,115514],['80000',55,50,50,128950],['90000',51,51,55,144367],['100000',52,52,76,152721],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - Normal&lt;8u&gt;",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_23');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;It is very clear that the performance are much better but it is logical because we are comparing something in O(1) versus something in O(n).&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;In conclusion, intrusive lists are almost always faster than a standard list. However, on my computer and with GCC, the difference is not always very important. It can brings about 20%-30% on some workloads but most likely around 15%. Even if not negligible, it is not a huge improvement. I would have thought that intrusive lists were faster by an higher margin. On some operations like sort, it is clearly more interesting. It has a better data locality than standard list.&lt;/p&gt;
&lt;p&gt;It is also more interesting to fill the collection, because no memory allocation is involved. But of course, you need to take care of memory allocation by yourself, for example in a vector like here or by dynamically allocating the objects one after one. This has also a cost that is not counted in the benchmark.&lt;/p&gt;
&lt;p&gt;If you really have to use a linked list and performance is critical, I advice you to use Boost intrusive list. If performance is not really critical, it is not perhaps not interesting because of the increased complexity.&lt;/p&gt;
&lt;p&gt;There are situations when only intrusive lists can work. If you want the same object to be present in two different list, you can use boost intrusive list with several member hooks, which is not possible with standard list because only a copy is stored, not the object itself. The same is true for objects that are non-copyable, only intrusive list can handle them. And finally, with intrusive lists you can directly get an iterator to an object in O(1) if you have a reference to an object. For a standard list, you have to iterate through the list to find the object. Sometimes, it can be very useful.&lt;/p&gt;
&lt;p&gt;If you are interested, the Boost documentation provides also a &lt;a href="http://www.boost.org/doc/libs/1_52_0/doc/html/intrusive/performance.html" title="Boost Intrusive Performance"&gt;performance benchmark for intrusive list&lt;/a&gt;, but it is very old (GCC 4.1.2). It is interesting to see that the results are better for intrusive lists than on my benchmark. I do not know if it comes from the processor, the memory or from the compiler.&lt;/p&gt;
&lt;p&gt;I hope you found this benchmark interesting. If you have questions, comments, ideas or whatever to say about this benchmark, don't hesitate to comment. I would be glad to answer you :) The same if you find errors in this article. If you have different results, don't hesitate to comment as well.&lt;/p&gt;
&lt;p&gt;The code source of the benchmark is available online: https://github.com/wichtounet/articles/blob/master/src/intrusive_list/bench.cpp&lt;/p&gt;
&lt;script type="text/javascript"&gt;function draw_visualization(){draw_graph_0();draw_graph_1();draw_graph_2();draw_graph_3();draw_graph_4();draw_graph_5();draw_graph_6();draw_graph_7();draw_graph_8();draw_graph_9();draw_graph_10();draw_graph_11();draw_graph_12();draw_graph_13();draw_graph_14();draw_graph_15();draw_graph_16();draw_graph_17();draw_graph_18();draw_graph_19();draw_graph_20();draw_graph_21();draw_graph_22();draw_graph_23();} google.setOnLoadCallback(draw_visualization);&lt;/script&gt;</description><category>Benchmark</category><category>Boost</category><category>C++</category><category>C++11</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-std-list-boost-intrusive-list.html</guid><pubDate>Wed, 12 Dec 2012 07:53:17 GMT</pubDate></item><item><title>C++ benchmark – std::vector VS std::list VS std::deque</title><link>https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;Last week, I wrote a benchmark comparing the performance of std::vector and std::list on different workloads. This previous article received a lot of comments and several suggestions to improve it. The present article is an improvement over the previous article.&lt;/p&gt;
&lt;p&gt;In this article, I will compare the performance of std::vector, std::list and std::deque on several different workloads and with different data types. In this article, when I talk about a list refers to std::list, a vector refers to std::vector and deque to std::deque.&lt;/p&gt;
&lt;p&gt;It is generally said that a list should be used when random insert and remove will be performed (performed in O(1) versus O(n) for a vector or a deque). If we look only at the complexity, the scale of linear search in both data structures should be equivalent, complexity being in O(n). When random insert/replace operations are performed on a vector or a deque, all the subsequent data needs to be moved and so each element will be copied. That is why the size of the data type is an important factor when comparing those two data structures. Because the size of the data type will play an important role on the cost of copying an element.&lt;/p&gt;
&lt;p&gt;However, in practice, there is a huge difference: the usage of the memory caches. All the data in a vector is contiguous where the std::list allocates separately memory for each element. How does that change the results in practice ? The deque is a data structure aiming at having the advantages of both data structures without their drawbacks, we will see how it perform in practice. Complexity analysis does not take the memory hierarchy into level. I believe that in practice, memory hierarchy usage is as important as complexity analysis.&lt;/p&gt;
&lt;p&gt;Keep in mind that all the tests performed are made on vector, list and deque even if other data structures could be better suited to the given workload.&lt;/p&gt;
&lt;p&gt;In the graphs and in the text, &lt;em&gt;n&lt;/em&gt; is used to refer to the number of elements of the collection.&lt;/p&gt;
&lt;p&gt;All the tests performed have been performed on an Intel Core i7 Q 820  @ 1.73GHz. The code has been compiled in 64 bits with GCC 4.7.2 with -02 and -march=native. The code has been compiled with C++11 support (-std=c++11).&lt;/p&gt;
&lt;p&gt;For each graph, the vertical axis represent the amount of time necessary to perform the operations, so the lower values are the better. The horizontal axis is always the number of elements of the collection. For some graph, the logarithmic scale could be clearer, a button is available after each graph to change the vertical scale to a logarithmic scale.&lt;/p&gt;
&lt;p&gt;The data types are varying in size, they hold an array of longs and the size of the array varies to change the size of the data type. The non-trivial data type is made of two longs and has very stupid assignment operator and copy constructor that just does some maths (totally meaningless but costly). One may argue that is not a common copy constructor neither a common assignment operator and one will be right, however, the important point here is that it is costly operators which is enough for this benchmark.&lt;/p&gt;
&lt;h3&gt;Fill&lt;/h3&gt;

&lt;p&gt;The first test that is performed is to fill the data structures by adding elements to the back of the container (using &lt;em&gt;push_back&lt;/em&gt;). Two variations of vector are used, &lt;em&gt;vector_pre&lt;/em&gt; being a std::vector using vector::reserve at the beginning, resulting in only one allocation of memory.&lt;/p&gt;
&lt;p&gt;Lets see the results with a very small data type:&lt;/p&gt;
&lt;div id="graph_0" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_0" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_0(){var graph=new google.visualization.LineChart(document.getElementById('graph_0'));var data=google.visualization.arrayToDataTable([['x','list','vector','deque','vector_pre'],['100000',2545,271,2012,317],['200000',4927,552,998,334],['300000',7310,944,1707,595],['400000',9463,936,2056,1099],['500000',12591,1140,2642,1058],['600000',14351,1894,3125,1237],['700000',16561,1995,3686,1208],['800000',18820,2648,4291,1365],['900000',20832,2777,4962,2268],['1000000',23430,3015,5396,2585],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_0');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The pre-allocated vector is the fastest by a small margin and the list is 3 times slower than a vector. deque and vector.&lt;/p&gt;
&lt;p&gt;If we consider higher data type:&lt;/p&gt;
&lt;div id="graph_1" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_1" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_1(){var graph=new google.visualization.LineChart(document.getElementById('graph_1'));var data=google.visualization.arrayToDataTable([['x','list','vector','deque','vector_pre'],['100000',104867,55545,66852,21738],['200000',226215,108289,136035,42532],['300000',340910,198343,153446,60317],['400000',445035,217325,269316,80616],['500000',559619,236576,189613,101371],['600000',688422,391354,303729,122447],['700000',799902,405771,426373,138868],['800000',921441,415707,537057,160637],['900000',1006331,439635,263650,177052],['1000000',1113690,464416,372000,199434],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - 4096 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_1');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time vector and deque are performing at about the same speed. The pre-allocated vector is clearly the winner here. The variations in the results of deque and vector are probably coming from my system that doesn't like allocating so much memory back and forth at this speed.&lt;/p&gt;
&lt;p&gt;Finally, if we use a non-trivial data type:&lt;/p&gt;
&lt;div id="graph_2" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_2" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_2(){var graph=new google.visualization.LineChart(document.getElementById('graph_2'));var data=google.visualization.arrayToDataTable([['x','list','vector','deque','vector_pre'],['100000',8093,8123,10251,8095],['200000',15433,15305,16061,13897],['300000',25964,24643,24450,19954],['400000',33414,30322,32148,27171],['500000',40416,37817,40752,35058],['600000',48991,48594,48785,41049],['700000',55059,55124,55092,47609],['800000',63688,61360,64505,55659],['900000',70550,67636,72329,60952],['1000000',79271,73533,79522,67787],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_back - 16 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_2');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;All data structures are performing more or less the same, with vector_pre being the fastest.&lt;/p&gt;
&lt;p&gt;For push_back operations, pre-allocated vectors is a very good choice if the size is known in advance. The others performs more of less the same.&lt;/p&gt;
&lt;p&gt;I would have expected a better result for pre-allocated vector. If someone find an explanation for such a small margin, I'm interested.&lt;/p&gt;
&lt;h3&gt;Linear Search&lt;/h3&gt;

&lt;p&gt;The first operation is that is tested is the search. The container is filled with all the numbers in [0, N] and shuffled. Then, each number in [0,N] is searched in the container with std::find that performs a simple linear search. In theory, all the data structures should perform the same if we consider their complexity.&lt;/p&gt;
&lt;div id="graph_3" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_3" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_3(){var graph=new google.visualization.LineChart(document.getElementById('graph_3'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['1000',593,1098,318],['2000',2927,5307,1271],['3000',5891,12228,3020],['4000',8663,24415,5081],['5000',12859,36316,8066],['6000',18493,55057,11463],['7000',25057,74344,16022],['8000',38980,99990,21051],['9000',44951,127575,26650],['10000',52281,158216,32557],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_3');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;It is clear from the graph that the list has very poor performance for searching. The growth is much worse for a list than for a vector or a deque.&lt;/p&gt;
&lt;p&gt;The only reason is the usage of the cache line. When a data is accessed, the data is fetched from the main memory to the cache. Not only the accessed data is accessed, but a whole cacheline is fetched. As the elements in a vector are contiguous, when you access an element, the next element is automatically in the cache. As the main memory is orders of magnitude slower than the cache, this makes a huge difference. In the list case, the processor spends its whole time waiting for data being fetched from memory to the cache, at each fetch, the processor fetches a lot of unnecessary data that are almost always useless.&lt;/p&gt;
&lt;p&gt;The deque is a bit slower than the vector, that is logical because here there are more cache misses due to the segmented parts.&lt;/p&gt;
&lt;p&gt;If we take a bigger data type:&lt;/p&gt;
&lt;div id="graph_4" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_4" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_4(){var graph=new google.visualization.LineChart(document.getElementById('graph_4'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['1000',1116,2683,776],['2000',4983,16675,3537],['3000',12255,44379,10874],['4000',23212,83026,20189],['5000',37392,133353,33609],['6000',55295,193428,47636],['7000',74877,261314,63911],['8000',100903,340157,84647],['9000',126299,435816,107922],['10000',156386,545160,135680],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - 128 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_4');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The list is still much slower than the others, but what is interesting is that gap between the deque and the array is decreasing. Let's try with a 4KB data type:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;/p&gt;&lt;div id="graph_5" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;&lt;input id="button_graph_5" type="button" value="Logarithmic scale"&gt;&lt;script type="text/javascript"&gt;function draw_graph_5(){var graph=new google.visualization.LineChart(document.getElementById('graph_5'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['1000',4258,7190,4445],['2000',20584,38411,19825],['3000',48236,113189,55341],['4000',87475,223174,118453],['5000',136945,362421,191967],['6000',197856,530943,281252],['7000',273359,726323,387940],['8000',351223,954463,511276],['9000',447525,1211581,652269],['10000',551556,1497916,807161],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"linear_search - 4096 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_5');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/blockquote&gt;
&lt;p&gt;The performance of the list are still poor but the gap is decreasing. The interesting point is that deque is now faster than vector. I'm not really sure of the reason of this result. It is possible that it comes only from this special size. One thing is sure, the bigger the data size, the more cache misses the processor will get because elements don't fit in cache lines.&lt;/p&gt;
&lt;p&gt;For search, list is clearly slow where deque and vector have about the same performance. It seems that deque is faster than a vector for very large data sizes.&lt;/p&gt;
&lt;h3&gt;Random Insert (+Linear Search)&lt;/h3&gt;

&lt;p&gt;In the case of random insert, in theory, the list should be much faster, its insert operation being in O(1) versus O(n) for a vector or a deque.&lt;/p&gt;
&lt;p&gt;The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are inserted at a random position in the container. The random position is found by linear search. In both cases, the complexity of the search is O(n), the only difference comes from the insert that follow the search. We saw before that the performance of the list were poor for searching, so we'll see if the fast insertion can compensate the slow search.&lt;/p&gt;
&lt;div id="graph_6" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_6" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_6(){var graph=new google.visualization.LineChart(document.getElementById('graph_6'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',8,27,8],['20000',15,45,14],['30000',22,63,21],['40000',29,74,27],['50000',37,87,38],['60000',43,105,44],['70000',50,114,48],['80000',61,130,55],['90000',66,139,61],['100000',70,155,68],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_6');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;List is clearly slower than the other two data structures that exhibit the same performance. This comes from the very slow linear search. Even if the two other data structures have to move a lot of data, the copy is cheap for small data types.&lt;/p&gt;
&lt;p&gt;Let's increase the size a bit:&lt;/p&gt;
&lt;div id="graph_7" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_7" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_7(){var graph=new google.visualization.LineChart(document.getElementById('graph_7'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',21,53,25],['20000',39,80,48],['30000',57,103,68],['40000',71,122,90],['50000',88,146,112],['60000',102,165,130],['70000',124,190,152],['80000',140,214,175],['90000',157,238,195],['100000',174,268,213],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - 32 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_7');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The result are interesting. The list is still the slowest but with a smaller margin. This time deque is faster than the vector by a small margin.&lt;/p&gt;
&lt;p&gt;Again, increasing the data size:&lt;/p&gt;
&lt;div id="graph_8" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_8" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_8(){var graph=new google.visualization.LineChart(document.getElementById('graph_8'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',64,80,89],['20000',108,128,154],['30000',158,182,248],['40000',212,248,347],['50000',281,348,469],['60000',402,443,735],['70000',569,643,1034],['80000',767,775,1347],['90000',978,1002,1614],['100000',1190,1202,1962],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - 128 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_8');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, the vector is clearly the looser and deque and list have the same performance. We can say that with a size of 128 bytes, the time to move a lot of the elements is more expensive than searching in the list.&lt;/p&gt;
&lt;p&gt;A huge data type gives us clearer results:&lt;/p&gt;
&lt;div id="graph_9" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_9" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_9(){var graph=new google.visualization.LineChart(document.getElementById('graph_9'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',4430,178,8074],['20000',7918,311,14121],['30000',11043,444,20014],['40000',13806,555,26783],['50000',17421,694,33519],['60000',20663,904,39175],['70000',23599,1147,45111],['80000',26736,1470,50887],['90000',29524,1940,60139],['100000',32005,2534,65098],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - 4096 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_9');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The list is more than 20 times faster than the vector and an order of magnitude faster than the deque ! The deque is also twice faster than the vector.&lt;/p&gt;
&lt;p&gt;The fact than the deque is faster than vector is quite simple. When an insertion is made in a deque, the elements can either moved to the end or the beginning. The closer point will be chosen. An insert in the middle is the most costly operation with O(n/2) complexity. It is always more efficient to insert elements in a deque than in vector because at least twice less elements will be moved.&lt;/p&gt;
&lt;p&gt;If we look at the non-trivial data type:&lt;/p&gt;
&lt;div id="graph_10" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_10" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_10(){var graph=new google.visualization.LineChart(document.getElementById('graph_10'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',230,41,425],['20000',376,65,705],['30000',552,84,1054],['40000',692,101,1345],['50000',862,119,1661],['60000',1003,141,1984],['70000',1186,155,2277],['80000',1358,172,2681],['90000',1540,186,2965],['100000',1658,203,3236],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_insert - 16 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_10');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are about the same as for the previous graph, but the data type is only 16B. The cost of the copy constructors and assignment operators is very important for vector and deque. The list doesn't care because no copy neither assignment of the existing elements is made during insertions (only the inserted element is copied).&lt;/p&gt;
&lt;h3&gt;Random Remove&lt;/h3&gt;

&lt;p&gt;In theory, random remove is the same case than random insert. Now that we've seen the results with random insert, we could expect the same behavior for random remove.&lt;/p&gt;
&lt;p&gt;The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are removed from a random position in the container.&lt;/p&gt;
&lt;p&gt;If we take the same data sizes as the random insert case:&lt;/p&gt;
&lt;div id="graph_11" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_11" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_11(){var graph=new google.visualization.LineChart(document.getElementById('graph_11'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',6,19,5],['20000',12,41,11],['30000',20,55,18],['40000',27,68,25],['50000',34,81,33],['60000',43,101,40],['70000',49,113,45],['80000',59,126,52],['90000',67,138,61],['100000',72,157,65],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_remove - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_11');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;div id="graph_12" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_12" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_12(){var graph=new google.visualization.LineChart(document.getElementById('graph_12'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',40,40,63],['20000',85,83,134],['30000',127,132,198],['40000',181,189,282],['50000',245,263,473],['60000',363,376,664],['70000',524,502,960],['80000',743,688,1343],['90000',977,812,1639],['100000',1228,1017,2004],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_remove - 128 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_12');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;div id="graph_13" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_13" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_13(){var graph=new google.visualization.LineChart(document.getElementById('graph_13'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',2906,109,5649],['20000',6190,233,11760],['30000',9379,359,18218],['40000',12840,490,23634],['50000',16027,585,30046],['60000',18918,773,36100],['70000',22213,999,42453],['80000',25788,1317,48793],['90000',28975,1762,55043],['100000',30860,2128,59791],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_remove - 4096 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_13');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;div id="graph_14" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_14" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_14(){var graph=new google.visualization.LineChart(document.getElementById('graph_14'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',149,27,294],['20000',319,50,608],['30000',481,68,934],['40000',638,89,1236],['50000',794,108,1547],['60000',954,120,1894],['70000',1101,144,2185],['80000',1253,160,2513],['90000',1399,177,2812],['100000',1595,194,3108],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"random_remove - 16 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_14');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The behavior of random remove is the same as the behavior of random insert, for the same reasons. The results are not very interesting, so, let's get to the next workload.&lt;/p&gt;
&lt;h3&gt;Push Front&lt;/h3&gt;

&lt;p&gt;The next operation that we will compare is inserting elements in front of the collection. This is the worst case for vector, because after each insertion, all the previously inserted will be moved and copied. For a list or a deque, it does not make a difference compared to pushing to the back.&lt;/p&gt;
&lt;p&gt;So let's see the results:&lt;/p&gt;
&lt;div id="graph_15" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_15" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_15(){var graph=new google.visualization.LineChart(document.getElementById('graph_15'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',0,0,33],['20000',0,0,135],['30000',0,0,313],['40000',0,0,585],['50000',0,1,913],['60000',0,1,1327],['70000',0,1,1823],['80000',0,1,2405],['90000',0,2,3107],['100000',0,2,4017],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"fill_front - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_15');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are crystal-clear and as expected, vector is very bad at inserting elements to the front. The list and the deque results are almost invisible in the graph because it is a free operation for the two data structures. This does not need further explanations. There is no need to change the data size, it will only make vector much slower and my processor hotter.&lt;/p&gt;
&lt;h3&gt;Sort&lt;/h3&gt;

&lt;p&gt;The next operation that is tested is the time necessary to sort the data structures. For the vector and the deque std::sort is used and for a list the member function sort is used.&lt;/p&gt;
&lt;div id="graph_16" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_16" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_16(){var graph=new google.visualization.LineChart(document.getElementById('graph_16'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',9,25,6],['200000',19,61,14],['300000',29,115,22],['400000',40,175,30],['500000',50,233,39],['600000',60,321,48],['700000',71,378,57],['800000',85,457,66],['900000',95,517,74],['1000000',108,593,83],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_16');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;For a small data type, the list is several times slower than the other two data structures. This is again due to the very poor spatial locality of the list during the search. vector is slightly faster than a deque, but the difference is not very significant.&lt;/p&gt;
&lt;p&gt;If we increase the size:&lt;/p&gt;
&lt;div id="graph_17" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_17" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_17(){var graph=new google.visualization.LineChart(document.getElementById('graph_17'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',25,32,20],['200000',65,80,48],['300000',103,143,80],['400000',136,197,113],['500000',180,246,149],['600000',223,340,181],['700000',274,396,222],['800000',302,469,266],['900000',358,514,303],['1000000',395,579,337],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - 128 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_17');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The order remains the same but the difference between the list and the other is decreasing.&lt;/p&gt;
&lt;p&gt;With a 1KB data type:&lt;/p&gt;
&lt;div id="graph_18" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_18" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_18(){var graph=new google.visualization.LineChart(document.getElementById('graph_18'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',176,39,168],['200000',389,94,376],['300000',620,168,595],['400000',859,228,823],['500000',1100,285,1059],['600000',1355,392,1301],['700000',1609,452,1555],['800000',1844,539,1797],['900000',2111,597,2054],['1000000',2397,670,2278],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - 1024 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_18');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The list is almost five times faster than the vector and the deque which are both performing the same (with a very slight advantage for vector).&lt;/p&gt;
&lt;p&gt;If we use the non-trivial data type:&lt;/p&gt;
&lt;div id="graph_19" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_19" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_19(){var graph=new google.visualization.LineChart(document.getElementById('graph_19'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',92,26,89],['200000',195,70,188],['300000',301,135,296],['400000',410,195,399],['500000',519,255,510],['600000',638,350,623],['700000',763,410,729],['800000',858,492,846],['900000',971,552,954],['1000000',1090,628,1072],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"sort - 16 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_19');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Again, the cost of the operators of this type have a strong impact on the vector and deque.&lt;/p&gt;
&lt;h3&gt;Destruction&lt;/h3&gt;

&lt;p&gt;The next test is to calculate the time necessary to the destruction of a container. The containers are dynamically allocated, are filled with n numbers and then their destruction time (via delete) is computed.&lt;/p&gt;
&lt;div id="graph_20" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_20" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_20(){var graph=new google.visualization.LineChart(document.getElementById('graph_20'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',34,1489,0],['200000',70,2838,0],['300000',102,4677,0],['400000',142,6072,0],['500000',173,7737,0],['600000',215,8828,0],['700000',353,10599,1],['800000',321,12115,0],['900000',355,13932,1],['1000000',410,15345,0],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - 8 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_20');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are already interesting. The vector is almost free to destroy, which is logical because that incurs only freeing one array and the vector itself. The deque is slower due to the freeing of each segments. But the list is much more costly than the other two, more than an order of magnitude slower. This is expected because the list have to free the dynamic memory of each node and also has to iterate through all the elements which we saw was slow.&lt;/p&gt;
&lt;p&gt;If we increase the data type:&lt;/p&gt;
&lt;div id="graph_21" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_21" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_21(){var graph=new google.visualization.LineChart(document.getElementById('graph_21'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',898,4403,1],['200000',2488,8499,1],['300000',4091,12499,1430],['400000',5461,16379,1909],['500000',6729,21128,2459],['600000',8164,25719,2729],['700000',9517,31046,3227],['800000',10871,34550,3756],['900000',12392,37176,4163],['1000000',13762,40119,4523],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - 128 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_21');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time we can see that the deque is three times slower than a vector and that the list is still an order of magnitude slower than a vector ! However, the is less difference than before.&lt;/p&gt;
&lt;p&gt;With our biggest data type, now:&lt;/p&gt;
&lt;div id="graph_22" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_22" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_22(){var graph=new google.visualization.LineChart(document.getElementById('graph_22'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['100000',20575,22434,15499],['200000',44234,47254,29848],['300000',67196,69374,39818],['400000',89253,91128,54229],['500000',108689,112557,68090],['600000',131751,135764,75063],['700000',150801,155610,90761],['800000',172365,176957,102830],['900000',192575,193897,112728],['1000000',211507,215274,126348],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"destruction - 4096 bytes",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"us",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_22');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;There is no more difference between list and deque. The vector is still twice faster than them.&lt;/p&gt;
&lt;p&gt;Even if the vector is always faster than the list and deque, keep in mind that the graphs for destruction are in microseconds and so the operations are not very costly. It could make a difference is very time-sensitive application but unlikely in most applications. Moreover, destruction is made only once per data structure, generally, it is not a very important operation.&lt;/p&gt;
&lt;h3&gt;Number Crunching&lt;/h3&gt;

&lt;p&gt;Finally, we can also test a number crunching operation. Here, random elements are inserted into the container that is kept sorted. It means, that the position where the element has to be inserted is first searched by iterating through elements and the inserted. As we talk about number crunching, only 8 bytes elements are tested.&lt;/p&gt;
&lt;div id="graph_23" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_23" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_23(){var graph=new google.visualization.LineChart(document.getElementById('graph_23'));var data=google.visualization.arrayToDataTable([['x','deque','list','vector'],['10000',39,187,33],['20000',150,1247,134],['30000',339,3380,310],['40000',623,6513,547],['50000',958,10757,864],['60000',1394,16098,1257],['70000',1894,22623,1713],['80000',2479,30656,2249],['90000',3162,39451,2858],['100000',3932,49906,3576],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"number_crunching",width:'600px',height:'400px',hAxis:{title:"Number of elements",slantedText:true},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_23');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Even if there is only 100'000 elements, the list is already an order of magnitude slower than the other two data structures. If we look a the curves of the results, it is easy to see that this will be only worse with higher collection sizes. The list is absolutely not adapted for number crunching operations due to its poor spatial locality.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;To conclude, we can get some facts about each data structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;std::list is very very slow to iterate through the collection due to its very poor spatial locality.&lt;/li&gt;
&lt;li&gt;std::vector and std::deque perform always faster than std::list with very small data&lt;/li&gt;
&lt;li&gt;std::list handles very well large elements&lt;/li&gt;
&lt;li&gt;std::deque performs better than a std::vector for inserting at random positions (especially at the front, which is constant time)&lt;/li&gt;
&lt;li&gt;std::deque and std::vector do not support very well data types with high cost of copy/assignment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This draw simple conclusions on usage of each data structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Number crunching: use std::vector or std::deque&lt;/li&gt;
&lt;li&gt;Linear search: use std::vector or std::deque&lt;/li&gt;
&lt;li&gt;Random Insert/Remove:&lt;ul&gt;
&lt;li&gt;Small data size: use std::vector&lt;/li&gt;
&lt;li&gt;Large element size: use std::list (unless if intended principally for searching)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Non-trivial data type: use std::list unless you need the container especially for searching. But for multiple modifications of the container, it will be very slow.&lt;/li&gt;
&lt;li&gt;Push to front: use std::deque or std::list&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have to say that before writing this new version of the benchmark I did not know std::deque a lot. This is a very good data structure that is very good at inserting at both ends and even in the middle while exposing a very good spatial locality. Even if sometimes slower than a vector, when the operations involves both searching and inserting in the middle, I would say that this structure should be preferred over vectors, especially for data types of medium sizes.&lt;/p&gt;
&lt;p&gt;If you have the time, in practice, the best way to decide is always to benchmark each version, or even to try another data structures. Two operations with the same Big O complexity can perform quite differently in practice.&lt;/p&gt;
&lt;p&gt;I hope that you found this article interesting. If you have any comment or have an idea about an other workload that you would like to test, don't hesitate to post a comment ;) If you have a question on results, don't hesitate as well.&lt;/p&gt;
&lt;p&gt;The code source of the benchmark is available online: &lt;a href="https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp" title="Source code of the benchmark"&gt;https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The older version of the article is still available: &lt;a href="http://www.baptiste-wicht.com/2012/11/cpp-benchmark-vector-vs-list/" title="C++ benchmark – std::vector VS std::list"&gt;C++ benchmark – std::vector VS std::list&lt;/a&gt;&lt;/p&gt;
&lt;script type="text/javascript"&gt;function draw_visualization(){draw_graph_0();draw_graph_1();draw_graph_2();draw_graph_3();draw_graph_4();draw_graph_5();draw_graph_6();draw_graph_7();draw_graph_8();draw_graph_9();draw_graph_10();draw_graph_11();draw_graph_12();draw_graph_13();draw_graph_14();draw_graph_15();draw_graph_16();draw_graph_17();draw_graph_18();draw_graph_19();draw_graph_20();draw_graph_21();draw_graph_22();draw_graph_23();}google.setOnLoadCallback(draw_visualization);&lt;/script&gt;</description><category>Benchmarks</category><category>C++</category><category>C++11</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html</guid><pubDate>Mon, 03 Dec 2012 07:58:29 GMT</pubDate></item><item><title>C++ benchmark - std::vector VS std::list</title><link>https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vector-vs-list.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;&lt;strong&gt;A updated version of this article is available: &lt;a title="C++ benchmark – std::vector VS std::list VS std::deque" href="http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/"&gt;C++ benchmark – std::vector VS std::list VS std::deque&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In C++, the two most used data structures are the std::vector and the std::list. In this article, we will compare the performance in practice of these two data structures on several different workloads. In this article, when I talk about a list it is the std::list implementation and vector refers to the std::vector implementation.&lt;/p&gt;
&lt;p&gt;It is generally said that a list should be used when random insert and remove will be performed (performed in O(1) versus O(n) for a vector). If we look only at the complexity, search in both data structures should be roughly equivalent, complexity being in O(n). When random insert/replace operations are performed on a vector, all the subsequent data needs to be moved and so each element will be copied. That is why the size of the data type is an important factor when comparing those two data structures.&lt;/p&gt;
&lt;p&gt;However, in practice, there is a huge difference, the usage of the memory caches. All the data in a vector is contiguous where the std::list allocates separately memory for each element. How does that change the results in practice ?&lt;/p&gt;
&lt;p&gt;Keep in mind that all the tests performed are made on vector and list even if other data structures could be better suited to the given workload.&lt;/p&gt;
&lt;p&gt;In the graphs and in the text, &lt;em&gt;n&lt;/em&gt; is used to refer to the number of elements of the collection.&lt;/p&gt;
&lt;p&gt;All the tests performed have been performed on an Intel Core i7 Q 820  @ 1.73GHz. The code has been compiled in 64 bits with GCC 4.7.2 with -02 and -march=native. The code has been compiled with C++11 support (-std=c++11).&lt;/p&gt;
&lt;h3&gt;Fill&lt;/h3&gt;

&lt;p&gt;The first test that is performed is to fill the data structures by adding elements to the back of the container. Two variations of vector are used, vector_pre being a std::vector with the size passed in parameters to the constructor, resulting in only one allocation of memory.&lt;/p&gt;
&lt;div id="graph_0" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_0" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_0(){var graph=new google.visualization.LineChart(document.getElementById('graph_0'));var data=google.visualization.arrayToDataTable([['x','vector_pre','vector','list'],['1000',0,0,1],['10000',0,1,10],['100000',4,11,100],['1000000',7,234,1023]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Fill (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_0');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;&lt;/p&gt;
&lt;div id="graph_1" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_1" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_1(){var graph=new google.visualization.LineChart(document.getElementById('graph_1'));var data=google.visualization.arrayToDataTable([['x','vector_pre','vector','list'],['1000',0,9,1],['10000',12,245,18],['100000',949,2635,1153],['1000000',9138,23654,11270]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Fill (1024 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_1');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;All data structures are impacted the same way when the data size increases, because there will be more memory to allocate. The vector_pre is clearly the winner of this test, being one order of magnitude faster than a list and about twice faster than a vector without pre-allocation. The result are directly linked to the allocations that have to be performed, allocation being slow. Whatever the data size is, push_back to a vector will always be faster than to a list. This is logical becomes vector allocates more memory than necessary and so does not need to allocate memory for each element.&lt;/p&gt;
&lt;p&gt;But this test is not very interesting, generally building the data structure is not critical. What is critical is the operations that are performed on the data structure. That will be tested in the next sections.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Random Find&lt;/h3&gt;

&lt;p&gt;The first operation is that is tested is the search. The container is filled with all the numbers in [0, N] and shuffled. Then, each number in [0,N] is searched in the container with std::find that performs a simple linear search.&lt;/p&gt;
&lt;div id="graph_2" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_2" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_2(){var graph=new google.visualization.LineChart(document.getElementById('graph_2'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['100',0,11],['1000',0,1545],['5000',0,35886],['10000',0,150865],['20000',0,614496]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Find (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Microseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_2');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Yes, vector is present in the graph, its line is the same as the x line ! Performing a &lt;strong&gt;linear search in a vector is several orders of magnitude faster than in a list&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The only reason is the usage of the cache line. When a data is accessed, the data is fetched from the main memory to the cache. Not only the accessed data is accessed, but a whole cacheline is fetched. As the elements in a vector are contiguous, when you access an element, the next element is automatically in the cache. As the main memory is orders of magnitude slower than the cache, this makes a huge difference. In the list case, the processor spends its whole time waiting for data being fetched from memory to the cache.&lt;/p&gt;
&lt;p&gt;If we augment the size of the data type to 1KB, the results remain the same, but slower:&lt;/p&gt;
&lt;div id="graph_3" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_3" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_3(){var graph=new google.visualization.LineChart(document.getElementById('graph_3'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['100',0,11],['1000',0,3551],['5000',0,195429],['10000',0,829631],['20000',0,3356432]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Find (1024 bytes)",width:'600px',height:'400px',vAxis:{title:"Microseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_3');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Random Insert&lt;/h3&gt;

&lt;p&gt;In the case of random insert, in theory, the list should be much faster, its insert operation being in O(1) versus O(n) for a vector.&lt;/p&gt;
&lt;p&gt;The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are inserted at a random position in the container. The random position is found by linear search. In both cases, the complexity of the search is O(n), the only difference comes from the insert that follow the search.&lt;/p&gt;
&lt;div id="graph_4" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_4" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_4(){var graph=new google.visualization.LineChart(document.getElementById('graph_4'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',9,85],['2000',9,85],['4000',10,94],['6000',12,98],['8000',13,106],['10000',14,106]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Insert (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_4');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;When, the vector should be slower than the list, it is almost an order of magnitude faster. Again, this is because finding the position in a list is much slower than copying a lot of small elements.&lt;/p&gt;
&lt;p&gt;If we increase the size:&lt;/p&gt;
&lt;div id="graph_5" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_5" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_5(){var graph=new google.visualization.LineChart(document.getElementById('graph_5'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',27,120],['2000',30,113],['4000',34,122],['6000',37,140],['8000',42,145],['10000',47,155]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Insert (32 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_5');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The two lines are getting closer, but vector is still faster.&lt;/p&gt;
&lt;p&gt;Increase it to 1KB:&lt;/p&gt;
&lt;div id="graph_6" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_6" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_6(){var graph=new google.visualization.LineChart(document.getElementById('graph_6'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',1821,167],['2000',1941,163],['4000',2383,191],['6000',2679,207],['8000',2960,214],['10000',3308,228]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Insert (1024 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_6');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time, list outperforms vector by an order of magnitude ! The performance of random insert in a list are not impacted much by the size of the data type, where vector suffers a lot when big sizes are used. We can also see that list doesn't seem to care about the size of the collection. It is because the size of the collection only impact the search and not the insertion and as few search are performed, it does not change the results a lot.&lt;/p&gt;
&lt;p&gt;If the iterator was already known (no need for linear search), it would be faster to insert into a list than into the vector.&lt;/p&gt;
&lt;h3&gt;Random Remove&lt;/h3&gt;

&lt;p&gt;In theory, random remove is the same case than random insert. Now that we've seen the results with random insert, we could expect the same behavior for random remove.&lt;/p&gt;
&lt;p&gt;The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are removed from a random position in the container.&lt;/p&gt;
&lt;div id="graph_7" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_7" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_7(){var graph=new google.visualization.LineChart(document.getElementById('graph_7'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['100',0,0],['1000',0,0],['10000',40,0],['50000',949,2],['100000',3937,4],['200000',16003,9],['300000',42393,12]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Push front (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_7');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Again, vector is several times faster and looks to scale better. Again, this is because it is very cheap to copy small elements.&lt;/p&gt;
&lt;p&gt;Let's increase it directly to 1KB element.&lt;/p&gt;
&lt;div id="graph_8" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_8" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_8(){var graph=new google.visualization.LineChart(document.getElementById('graph_8'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',0,0],['10000',2,26],['100000',163,684],['1000000',2147,15950],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Sort (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_8');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The two lines have been reversed !&lt;/p&gt;
&lt;p&gt;The behavior of random remove is the same as the behavior of random insert, for the same reasons.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Push Front&lt;/h3&gt;

&lt;p&gt;The next operation that we will compare is inserting elements in front of the collection. This is the worst case for vector, because after each insertion, all the previously inserted will be moved and copied. For a list, it does not make a difference compared to pushing to the back.&lt;/p&gt;
&lt;div id="graph_9" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_9" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_9(){var graph=new google.visualization.LineChart(document.getElementById('graph_9'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['100',0,0],['1000',0,0],['10000',40,0],['50000',949,2],['100000',3937,4],['200000',16003,9],['300000',42393,12]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Push front (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_9');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The results are crystal-clear and as expected. vector is very bad at inserting elements to the front. This does not need further explanations. There is no need to change the data size, it will only make vector much slower.&lt;/p&gt;
&lt;h3&gt;Sort&lt;/h3&gt;

&lt;p&gt;The next operation that is tested is the performance of sorting a vector or a list. For a vector std::sort is used and for a list the member function sort is used.&lt;/p&gt;
&lt;div id="graph_10" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_10" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_10(){var graph=new google.visualization.LineChart(document.getElementById('graph_10'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',0,0],['10000',2,26],['100000',163,684],['1000000',2147,15950],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Sort (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_10');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;We can see that sorting a list is several times slower. It comes from the poor usage of the cache.&lt;/p&gt;
&lt;p&gt;If we increase the size of the element to 1KB:&lt;/p&gt;
&lt;div id="graph_11" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_11" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_11(){var graph=new google.visualization.LineChart(document.getElementById('graph_11'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',2,0],['10000',224,50],['100000',4289,1083],['1000000',50973,17975],]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Sort (1024 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_11');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;This time the list is faster than the vector. It is not very clear on the graph, but the values for the list are almost the same as for the previous results. That is because std::list::sort() does not perform any copy, only pointers to the elements are changed. On the other hand, swapping two elements in a vector involves at least three copies, so the cost of sorting will increase as the cost of copying increases.&lt;/p&gt;
&lt;!--nextpage--&gt;

&lt;h3&gt;Number Crunching&lt;/h3&gt;

&lt;p&gt;Finally, we can also test a number crunching operation. Here, random elements are inserted into the container that is kept sorted. It means, that the position where the element has to be inserted is first searched by iterating through elements and the inserted. As we talk about number crunching, only 8 bytes elements are tested.&lt;/p&gt;
&lt;div id="graph_12" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_12" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_12(){var graph=new google.visualization.LineChart(document.getElementById('graph_12'));var data=google.visualization.arrayToDataTable([['x','vector','list'],['1000',0,0],['10000',45,166],['50000',928,10665],['100000',3753,50766],['200000',15185,231480],['300000',34293,715892]]);var options={curveType:"function",animation:{duration:1200,easing:"in"},title:"Random Sorted Insert (8 bytes)",width:'600px',height:'400px',vAxis:{title:"Milliseconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_12');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;We can clearly see that vector is more than an order of magnitude faster than list and this will only be more as the size of the collection increase. This is because traversing the list is much more expensive than copying the elements of the vector.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;To conclude, we can get some facts about each data structure:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;std::vector is insanely faster than std::list to find an element&lt;/li&gt;
    &lt;li&gt;std::vector performs always faster than std::list with very small data&lt;/li&gt;
    &lt;li&gt;std::vector is always faster to push elements at the back than std::list&lt;/li&gt;
    &lt;li&gt;std::list handles very well large elements, especially for sorting or inserting in the front&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This draw simple conclusions on usage of each data structure:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;Number crunching: use std::vector&lt;/li&gt;
    &lt;li&gt;Linear search: use std::vector&lt;/li&gt;
    &lt;li&gt;Random Insert/Remove: use std::list (if data size very small (&amp;lt; 64B on my computer), use std::vector)&lt;/li&gt;
    &lt;li&gt;Big data size: use std::list (not if intended for searching)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have the time, in practice, the best way to decide is always to benchmark both versions, or even to try another data structures.&lt;/p&gt;
&lt;p&gt;I hope that you found this article interesting. If you have any comment or have an idea about an other workload that you would like to test, don't hesitate to post a comment ;) If you have a question on results, don't hesitate as well.&lt;/p&gt;
&lt;p&gt;The code source of the benchmark is available online: &lt;a title="Source code of the benchmark" href="https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp"&gt;https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp&lt;/a&gt;&lt;/p&gt;
&lt;script type="text/javascript"&gt;function draw_visualization(){draw_graph_0();draw_graph_1();draw_graph_2();draw_graph_3();draw_graph_4();draw_graph_5();draw_graph_6();draw_graph_7();draw_graph_8();draw_graph_9();draw_graph_10();draw_graph_11();draw_graph_12();}google.setOnLoadCallback(draw_visualization);&lt;/script&gt;</description><category>Benchmarks</category><category>C++</category><category>C++11</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vector-vs-list.html</guid><pubDate>Mon, 26 Nov 2012 07:47:35 GMT</pubDate></item><item><title>GCC 4.7 vs CLang 3.1 on eddic</title><link>https://baptiste-wicht.com/posts/2012/11/gcc-4-7-clang-3-1-eddic.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;&lt;a href="http://www.baptiste-wicht.com/2012/11/eddic-compiles-with-clang-3-1/" title="eddic compiles with CLang 3.1"&gt;Now that eddic can be compiled with CLang&lt;/a&gt;, I wanted to compare the differences in compilation time and in performance of the generated executable between those two compilers. The tests are done using GCC 4.7.2 and CLang 3.1 on Gentoo.&lt;/p&gt;
&lt;h3&gt;Compilation Time&lt;/h3&gt;

&lt;p&gt;The first thing that I tested has been the compilation time of the two compilers to compile eddic with different flags. I tested the compilation in debug mode and with -O2 and -O3.&lt;/p&gt;
&lt;div id="graph_0" style="width: 400px; height: 300px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_0" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_0(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_0'));var data=google.visualization.arrayToDataTable([['Options','GCC','CLang'],['-g',234.59,119.59],['-O2',273.02,178.22],['-O3',276.87,183.78],]);var options={title:"Compilation Time - Less is better",animation:{duration:1200,easing:"in"},width:'400px',height:'300px',hAxis:{title:"Options"},vAxis:{title:"Seconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_0');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The most interesting fact in these results is that CLang is much faster than GCC. It takes twice less times to compile eddic with CLang in debug mode than with GCC. The impact on optimizations on CLang's compilation is also more important than on GCC. For both compilers, -O3 does not seems to add a lot of overhead.&lt;/p&gt;
&lt;h3&gt;Runtime performance&lt;/h3&gt;

&lt;p&gt;Then, I tested the performance of the generated executable. I tested it on three things, the whole test suite and two test cases that I know are the slowest for the EDDI Compiler. For each case, I took the slowest value of 5 consecutive executions.&lt;/p&gt;
&lt;div id="graph_1" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_1" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_1(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_1'));var data=google.visualization.arrayToDataTable([['Compiler','GCC -O2','GCC -O3','CLang -O2','CLang -O3'],['testsuite',6.58,6.59,6.74,6.58],['assembly',1.2,1.2,1.2,1.2],['linked_list',0.51,0.5,0.49,0.49],]);var options={title:"Runtime Performance - Less is better",animation:{duration:1200,easing:"in"},width:'600px',height:'400px',hAxis:{title:"Options"},vAxis:{title:"Seconds",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_1');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The difference are very small. In -02, GCC performs a bit better, but in -O3, the performance are equivalent. I was a bit disappointed by the results, because I thought that there would be higher differences. It seems that CLang is not as far from GCC that some people would like to say. It also certainly depends on the program being compiled.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;It is clear that CLang is much faster than GCC to compile eddic. Moreover, the performance of the generated executable are almost similar.&lt;/p&gt;
&lt;p&gt;I will continue to use CLang as my development compiler and switches between the two when I'm doing performance benchmarking. I will try to update the benchmark once new versions of GCC / CLang are available.&lt;/p&gt;
&lt;script type="text/javascript"&gt;function draw_visualization(){draw_graph_0();draw_graph_1();}google.setOnLoadCallback(draw_visualization);&lt;/script&gt;</description><category>Benchmarks</category><category>clang</category><category>Compilers</category><category>EDDI</category><category>gcc</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/11/gcc-4-7-clang-3-1-eddic.html</guid><pubDate>Mon, 12 Nov 2012 08:28:44 GMT</pubDate></item><item><title>Integer Linear Time Sorting Algorithms</title><link>https://baptiste-wicht.com/posts/2012/11/integer-linear-time-sorting-algorithms.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: The code is now more C++&lt;/p&gt;
&lt;p&gt;Most of the sorting algorithms that are used are generally comparison sort. It means that each element of the collection being sorted will be compared to see which one is the first one. A comparison must have a lower bound of Ω(n log n) comparisons. That is why there are no comparison-based sorting algorithm better than O(n log n).&lt;/p&gt;
&lt;p&gt;On the other hand, there are also sorting algorithms that are performing better. This is the family of the integer sorting algorithms. These algorithms are using properties of integer to sort them without comparing them. They can be only be used to sort integers. Nevertheless, a hash function can be used to assign a unique integer to any value and so sort any value. All these algorithms are using extra space. There are several of these algorithms. In this article, we will see three of them and I will present an implementation in C++. At the end of the article, I will compare them to &lt;em&gt;std::sort&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In the article, I will use &lt;em&gt;n&lt;/em&gt; as the size of the array to sort and &lt;em&gt;m&lt;/em&gt; as the max number that is permitted in the array.&lt;/p&gt;
&lt;h3&gt;Bin Sort&lt;/h3&gt;

&lt;p&gt;Bin Sort, or Bucket Sort, is a very simple algorithm that partition all the input numbers into a number of buckets. Then, all the buckets are outputted in order in the array, resulting in a sorting array. I decided to implement the simplest case of Bin Sort where each number goes in its own bucket, so there are &lt;em&gt;m&lt;/em&gt; buckets.&lt;/p&gt;
&lt;p&gt;The implementation is pretty straightforward:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;binsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;B is the array of buckets. Each bucket is implemented as a std::vector. The algorithm starts by filling each buckets with the numbers from the input array. Then, it outputs them in order in the array.&lt;/p&gt;
&lt;p&gt;This algorithm works in &lt;em&gt;O(n + m)&lt;/em&gt; and requires &lt;em&gt;O(m)&lt;/em&gt; extra memory. With these properties, it makes a very limited algorithm, because if you don't know the maximum number and you have to use the maximum number of the array type, you will have to allocate for instance 2^32 buckets. That won't be possible.&lt;/p&gt;
&lt;h3&gt;Couting Sort&lt;/h3&gt;

&lt;p&gt;An interesting fact about binsort is that each bucket contains only the same numbers. The size of the bucket would be enough. That is exactly what Counting Sort. It counts the number of times an element is present instead of the elements themselves. I will present two versions. The first one is a version using a secondary array and then copying again into the input array and the second one is an in-place sort.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;counting_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The algorithm is also simple. It starts by counting the number of elements in each bucket. Then, it aggregates the number by summing them to obtain the position of the element in the final sorted array. Then, all the elements are copied in the temporary array. Finally, the temporary array is copied in the final array. This algorithms works in &lt;em&gt;O(m + n)&lt;/em&gt; and requires &lt;em&gt;O(m + n)&lt;/em&gt;. This version is presented only because it is present in the literature. We can do much better by avoiding the temporary array and optimizing it a bit:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;in_place_counting_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The temporary array is removed and the elements are directly written in the sorted array. The counts are not used directly as position, so there is no need to sum them. This version still works in &lt;em&gt;O(m + n)&lt;/em&gt; but requires only &lt;em&gt;O(m)&lt;/em&gt; extra memory. It is much faster than the previous version.&lt;/p&gt;
&lt;h3&gt;Radix Sort&lt;/h3&gt;

&lt;p&gt;The last version that I will discuss here is a Radix Sort. This algorithm sorts the number digit after digit in a specific radix. It is a form of bucket sort, where there is a bucket by digit. Like Counting Sort, only the counts are necessary. For example, if you use radix sort in base 10. It will first sort all the numbers by their first digit, then the second, .... It can work in any base and that is its force. With a well chosen base, it can be very powerful. Here, we will focus on radix that are in the form 2^r. These radix have good properties, we can use shifts and mask to perform division and modulo, making the algorithm much faster.&lt;/p&gt;
&lt;p&gt;The implementation is a bit more complex than the other implementations:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;//Digits&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                 &lt;/span&gt;&lt;span class="c1"&gt;//Bits&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;radix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;//Bins&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;radix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;radix_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;radix&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shift&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shift&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;radix&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;radix&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;r&lt;/em&gt; indicates the power of two used as the radix (2^r). The mask is used to compute modulo faster. The algorithm repeats the steps for each digit. Here &lt;em&gt;digits&lt;/em&gt; equals 2. It means that we support 2^32 values. A 32 bits value is sorted in two pass. The steps are very similar to counting sort. Each value of the digit is counted and then the counts are summed to give the position of the number. Finally, the numbers are put in order in the temporary array and copied into A.&lt;/p&gt;
&lt;p&gt;This algorithm works in &lt;em&gt;O(digits (m + radix))&lt;/em&gt; and requires &lt;em&gt;O(n + radix)&lt;/em&gt; extra memory. A very good thing is that the algorithm does not require space based on the maximum value, only based on the radix.&lt;/p&gt;
&lt;h3&gt;Results&lt;/h3&gt;

&lt;p&gt;It's time to compare the different implementations in terms of runtime. For each size, each version is tested 25 times on different random arrays. The arrays are the same for each algorithm. The number is the time necessary to sort the 25 arrays. The benchmark has been compiler with GCC 4.7.&lt;/p&gt;
&lt;p&gt;The first test is made with very few duplicates (m = 10n).&lt;/p&gt;
&lt;div id="graph_0" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_0" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_0(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_0'));var data=google.visualization.arrayToDataTable([['x','std::sort','counting_sort','in_place_counting_sort','bin_sort','radix_sort'],['100000',171,182,105,945,89],['500000',993,2229,970,6435,461],['1000000',2175,4812,2046,14096,1068],['5000000',11791,27050,10202,81255,6148],]);var options={title:"m = 10n",animation:{duration:1200,easing:"in"},width:'600px',height:'400px',hAxis:{title:"n"},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_0');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Radix Sort comes to be the fastest in this case, &lt;strong&gt;twice faster as &lt;em&gt;std::sort&lt;/em&gt;&lt;/strong&gt;. In place counting sort has almost the same performance as &lt;em&gt;std::sort&lt;/em&gt;. The other are performing worse.&lt;/p&gt;
&lt;p&gt;The second test is made with few duplicates (m ~= n).&lt;/p&gt;
&lt;div id="graph_1" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_1" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_1(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_1'));var data=google.visualization.arrayToDataTable([['x','std::sort','counting_sort','in_place_counting_sort','bin_sort','radix_sort'],['100000',186,73,37,309,90],['500000',991,611,189,3126,455],['1000000',2235,2171,547,7978,1038],['5000000',12184,18470,4516,49056,5791],]);var options={title:"m ~= n",animation:{duration:1200,easing:"in"},width:'600px',height:'400px',hAxis:{title:"n"},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_1');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;The numbers are impressive. In place &lt;strong&gt;counting sort is between 3-4 times faster than &lt;em&gt;std::sort&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;radix sort is twice faster than &lt;em&gt;std::sort&lt;/em&gt;&lt;/strong&gt; ! Bin Sort does not performs very well and counting sort even if generally faster than &lt;em&gt;std::sort&lt;/em&gt; does not scale very well.&lt;/p&gt;
&lt;p&gt;Let's test with more duplicates (m = n / 2).&lt;/p&gt;
&lt;div id="graph_2" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_2" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_2(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_2'));var data=google.visualization.arrayToDataTable([['x','std::sort','counting_sort','in_place_counting_sort','bin_sort','radix_sort'],['100000',178,65,25,262,90],['500000',979,450,143,2332,461],['1000000',2171,1480,321,6240,1041],['5000000',11978,16205,3453,41709,5890],]);var options={title:"m = n / 2",animation:{duration:1200,easing:"in"},width:'600px',height:'400px',hAxis:{title:"n"},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_2');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;std::sort&lt;/em&gt; and radix sort performance does not change a lot but the other sort are performing better. In-place counting sort is still the leader with a higher margin.&lt;/p&gt;
&lt;p&gt;Finally, with a lot of duplicates (m = n / 10).&lt;/p&gt;
&lt;div id="graph_3" style="width: 600px; height: 400px;"&gt;&lt;/div&gt;
&lt;p&gt;&lt;input id="button_graph_3" type="button" value="Logarithmic scale"&gt;
&lt;script type="text/javascript"&gt;function draw_graph_3(){var graph=new google.visualization.ColumnChart(document.getElementById('graph_3'));var data=google.visualization.arrayToDataTable([['x','std::sort','counting_sort','in_place_counting_sort','bin_sort','radix_sort'],['100000',161,46,12,144,74],['500000',918,322,76,1023,449],['1000000',2062,824,167,2721,1041],['5000000',10789,8534,1030,24026,5686],]);var options={title:"m = n / 10n",animation:{duration:1200,easing:"in"},width:'600px',height:'400px',hAxis:{title:"n"},vAxis:{title:"ms",viewWindow:{min:0}}};graph.draw(data,options);var button=document.getElementById('button_graph_3');button.onclick=function(){if(options.vAxis.logScale){button.value="Logarithmic Scale";}else{button.value="Normal scale";}options.vAxis.logScale=!options.vAxis.logScale;graph.draw(data,options);};}&lt;/script&gt;
&lt;/p&gt;
&lt;p&gt;Again, &lt;em&gt;std::sort&lt;/em&gt; and radix sort performance are stable, but in-place counting is now &lt;strong&gt;ten times faster than &lt;em&gt;std::sort&lt;/em&gt;&lt;/strong&gt; !&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;To conclude, we have seen that these algorithms can outperforms &lt;em&gt;std::sort&lt;/em&gt; by a high factor (10 times for In place Counting Sort when there m &amp;lt;&amp;lt; n). If you have to sort integers, you should consider these two cases:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;m &amp;gt; n or m is unknown : Use radix sort that is about twice faster than &lt;em&gt;std::sort&lt;/em&gt;.&lt;/li&gt;
    &lt;li&gt;m &amp;lt;&amp;lt; n : Use in place counting sort that can be much faster than &lt;em&gt;std::sort&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope you found this article interesting. The implementation can be found on Github: https://github.com/wichtounet/articles/tree/master/src/linear_sorting&lt;/p&gt;
&lt;script type="text/javascript"&gt;function draw_visualization(){draw_graph_0();draw_graph_1();draw_graph_2();draw_graph_3();}google.setOnLoadCallback(draw_visualization);&lt;/script&gt;</description><category>Algorithm</category><category>Benchmarks</category><category>C++</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/11/integer-linear-time-sorting-algorithms.html</guid><pubDate>Wed, 07 Nov 2012 08:02:46 GMT</pubDate></item><item><title>Run your Boost Tests in parallel with CMake</title><link>https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;I was looking for a Test Library to run eddic tests in parallel to replace Boost Test Library. I posted my question on StackOverflow and an awesome solution has been posted. With CMake and a little CMake additional file, it is possible to run the tests written with Boost Test Library in parallel without changing anything in the tests code !&lt;/p&gt;
&lt;p&gt;CTest is the test runner that is shipped with CMake. This runner can run tests in parallel using the -j X option (X is the numbers of threads). However, it can only run the tests that are declared in the CMakeLists.txt file. In my case, this means only one (the executable with Boost Test Library). If you have T tests, a solution would be create T executable files. Then, they can be run in parallel by ctest. However, this is not very practical. The solution proposed in this article is better. &lt;/p&gt;
&lt;h3&gt;Integrate Boost Test Library in CMake&lt;/h3&gt;

&lt;p&gt;Ryan Pavlik provides a series of CMake modules in its Github repository. One of this module is named BoostTestTargets. It automatically generates the CTest commands to run all the tests that you have. The small drawback is that you to list all the tests. &lt;/p&gt;
&lt;p&gt;To start, you have to download these files: &lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/raw/master/BoostTestTargets.cmake" title="BoostTestTargets.cmake"&gt;BoostTestTargets.cmake&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/raw/master/GetForceIncludeDefinitions.cmake" title="GetForceIncludeDefinitions.cmake"&gt;GetForceIncludeDefinitions.cmake&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/raw/master/CopyResourcesToBuildTree.cmake" title="CopyResourcesToBuildTree.cmake"&gt;CopyResourcesToBuildTree.cmake&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/blob/master/BoostTestTargetsStatic.h" title="BoostTestTargetsStatic.h"&gt;BoostTestTargetsStatic.h&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/blob/master/BoostTestTargetsDynamic.h" title="BoostTestTargetsDynamic.h"&gt;BoostTestTargetsDynamic.h&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://github.com/rpavlik/cmake-modules/blob/master/BoostTestTargetsIncluded.h" title="BoostTestTargetsIncluded.h"&gt;BoostTestTargetsIncluded.h&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These files must be placed next to your CMakeLists.txt file. Then, you have to modify your CMakeLists.txt file to enable testing and enable the new module. For example, if you have two test suites and five tests in each:  &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;INCLUDE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;CTest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;ENABLE_TESTING&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;GLOB_RECURSE&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;test_files&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;test/*&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;BoostTestTargets.cmake&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;add_boost_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;eddic_boost_test&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;SOURCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;test_files&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TESTS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteA/test_1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteA/test_2&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteA/test_3&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteA/test_4&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteA/test_5&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteB/test_1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteB/test_2&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteB/test_3&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteB/test_4&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;TestSuiteB/test_5&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All the test files are searched in the test directory and used in the SOURCES variable. Then all the tests are declared. &lt;/p&gt;
&lt;p&gt;The main test file has to include a specific header file:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#define BOOST_TEST_MODULE eddic_test_suite&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;BoostTestTargetConfig.h&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This file will be automatically detected by BoostTestTargets and configured correctly. And that's it !&lt;/p&gt;
&lt;p&gt;You can run CMake again in your build directory to use the new test system: &lt;/p&gt;
&lt;p&gt;[bash]cmake .[/bash]&lt;/p&gt;
&lt;p&gt;If the configuration has been successful, you will see a message indicating that. For example, I see that: &lt;/p&gt;
&lt;pre&gt;-- Test 'eddic_boost_test' uses the CMake-configurable form of the boost test framework - congrats! (Including File: /home/wichtounet/dev/eddi/eddic/test/IntegrationTests.cpp)
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/ramdrive/dev/eddic&lt;/pre&gt;

&lt;h3&gt;Run tests in parallel&lt;/h3&gt;

&lt;p&gt;You can then run your tests in parallel with ctest. For instance, with 9 threads: &lt;/p&gt;
&lt;pre&gt;ctest -j 8&lt;/pre&gt;

&lt;p&gt;In my case, my tests are completed 6x faster ! This is very valuable when you often run your tests. &lt;/p&gt;
&lt;p&gt;For more information on how to integrate your Boost Test Library tests with CMake, you can consult the &lt;a href="https://github.com/rpavlik/cmake-modules/" title="cmake-modules Github repository"&gt;The cmake-modules repository&lt;/a&gt;&lt;/p&gt;</description><category>Boost</category><category>C++</category><category>cmake</category><category>Concurrency</category><category>EDDI</category><category>Performances</category><category>Tests</category><guid>https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html</guid><pubDate>Mon, 15 Oct 2012 06:57:43 GMT</pubDate></item><item><title>Algorithms books Reviews</title><link>https://baptiste-wicht.com/posts/2012/08/algorithms-books-reviews.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;To be sure to be well prepared for an interview, I decided to read several &lt;strong&gt;Algorithms book&lt;/strong&gt;. I also chosen books in order to have information about data structures. I chose these books to read:&lt;/p&gt;
&lt;ol&gt;
    &lt;li&gt;Data Structures &amp;amp; Algorithm Analysis in C++, Third Edition, by Clifford A. Shaffer&lt;/li&gt;
    &lt;li&gt;Algorithms in a Nutshell, by George T. Heineman, Gary Pollice and Stanley Selkow&lt;/li&gt;
    &lt;li&gt;Algorithms, Fourth Edition, by Robert Sedgewick and Kevin Wayne&lt;/li&gt;
    &lt;li&gt;Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. I have to say that I have only read most of it, not completely, because some chapters were not interesting for me at the current time, but I will certainly read them later.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As some of my comments are about the presentation of the books, it has to be noted that I have read the three first books on my Kindle.&lt;/p&gt;
&lt;p&gt;In this post, you will find my point of view about all these books.&lt;/p&gt;
&lt;h4&gt;Data Structures &amp;amp; Algorithm Analysis in C++&lt;/h4&gt;

&lt;p&gt;This book is really great. It contains a lot of data structures and algorithms. Each of them is very clearly presented. It is not hard to understand the data structures and the algorithms.&lt;/p&gt;
&lt;p&gt;Each data structure is first presented as an ADT (Abstract Data Structure) and then several possible implementations are presented. Each implementation is precisely defined and analyzed to find its sweet pots and worst cases.  Other implementations are also presented with enough references to know where to start with them.&lt;/p&gt;
&lt;p&gt;I have found that some other books about algorithms are writing too much stuff for a single thing. This is not the case with this book. Indeed, each interesting thing is clearly and succinctly explained.&lt;/p&gt;
&lt;p&gt;About the presentation, the code is well presented and the content of the book is very well written. A good think would have been to add a summary of the most important facts about each algorithm and data structure. If you want to know these facts, you have to read several pages (but the facts are always here).&lt;/p&gt;
&lt;p&gt;The book contains very good explanation about the complexity analysis of algorihtms. It also contains a very interesting chapter about limits to computation where it treats P, NP, NP-Complete and NP-Hard complexity classes.&lt;/p&gt;
&lt;p&gt;This book contains a large number of exercises and projects that can be used to improve even more your algorithmic skills. Moreover, there are very good references at the end of each chapters if you want more documentation about a specific subject.&lt;/p&gt;
&lt;p&gt;I had some difficulty reading it on my Kindle. Indeed, it's impossible to switch chapters directly with the Kindle button. If you want quick access to the next chapter, you have to use the table of contents.&lt;/p&gt;
&lt;h4&gt;Algorithms in a Nutshell&lt;/h4&gt;

&lt;p&gt;This book is much shorter than the previous one. Even if it could be a good book for beginners, I didn't liked this book a lot. The explanations are a bit messy sometimes and it could contain more data structures (even if I know that this is not the subject of the book). The analysis of the different algorithms are a bit short too. Even if it looks normal for a book that short, it has to be known that this book has no exercise.&lt;/p&gt;
&lt;p&gt;However, this book has also several good points. Each algorithm is very well presented in a single panel. The complexity of each algorithm is directly given alongside its code. It helps finding quickly an algorithm and its main properties.&lt;/p&gt;
&lt;p&gt;Another thing that I found good is that the author included empiric benchmarks as well as complexity analysis. The chapters about Path Finding in AI and computational geometry were very interesting, especially because it is not widely dealt with in other books.&lt;/p&gt;
&lt;p&gt;It also has very good references for each chapter.&lt;/p&gt;
&lt;p&gt;This book was perfect to read with Kindle, the navigation was very easy.&lt;/p&gt;
&lt;h4&gt;Algorithms&lt;/h4&gt;

&lt;p&gt;This book is a good book, but suffers from several drawbacks regarding to other books. First, the book covers a lot of data structures and algorithms. Then, it also has very good explanations about complexity classes. It also has a lot of exercises. I also liked a lot the chapter about string algorithms that was lacking in previous books.&lt;/p&gt;
&lt;p&gt;Most of the time, the explanations are good, but sometimes, I found them quite hard to understand. Moreover, some parts of code are also hard to follow. The author included Java runs of some of programs. In my opinion, this is quite useless, empiric benchmarks could have been useful, but not single runs of the program. Some of the diagrams were also hard to read, but that's perhaps a consequence of the Kindle.&lt;/p&gt;
&lt;p&gt;A think that disappointed me a bit is that the author doesn't use big Oh notation. Even, if we have enough information to easily get the Big Oh equivalent, I don't understand why a book about algorithms doesn't use this notation.&lt;/p&gt;
&lt;p&gt;Just like the first book, there is no simple view of a given algorithm that contains all the information about an algorithm. Another think that disturbed me is that the author takes time to describe an API around the algorithms and data structures and about the Java API. Again, in my opinion only, it takes a too large portion of the book.&lt;/p&gt;
&lt;p&gt;Again, this book was perfect to read with Kindle, the navigation was very easy.&lt;/p&gt;
&lt;h4&gt;Introduction to Algorithms&lt;/h4&gt;

&lt;p&gt;This book is the most complete I read about algorithms and data structures by a large factor. It has very complete explanations about complexity analysis: big Oh, Big Theta, Small O. For each data structure and algorithm, the complexity analysis is very detailed and very well explained. The pieces of code are written in a very good pseudo code manner.&lt;/p&gt;
&lt;p&gt;As I said before, the complexity analysis are very complete and sometimes very complex. This can be either an advantage or a disadvantage, depending of what you awaits from the book. For example, the analysis is made using several notations Big Oh, Big Theta or even small Oh. Sometimes, it is a bit hard to follow, but it provides very good basis for complexity analysis in general.&lt;/p&gt;
&lt;p&gt;The book  was also the one with the best explanations about linear time sorting algorithms. In the other books, I found difficult to understand sorts like counting sort or bucket sort, but in this book, the explanations are very clear. It also includes multithreaded algorithm analysis, number theoretic algorithms, polynomials and a very complete chapter about linear programming.&lt;/p&gt;
&lt;p&gt;The book contains a huge number of exercises for each chapters and sub chapters.&lt;/p&gt;
&lt;p&gt;This book will not only help you find the best suited algorithm for a given problem, it will also help you understand how to write your own algorithm for a problem or how to analyze deeply an existing solution.&lt;/p&gt;
&lt;h4&gt;Algorithms Book Wrap-up&lt;/h4&gt;

&lt;p&gt;As I read all these Algorithms books in order, it's possible that my review is a bit subjective regarding to comparisons to other books.&lt;/p&gt;
&lt;p&gt;If you plan to work in C++ and need more knowledge in algorithms and C++, I advice you to read &lt;strong&gt;Data Structures &amp;amp; Algorithm Analysis in C++&lt;/strong&gt;, that is really awesome. If you want a very deep knowledge about algorithm analysis and algorithms in general and have good mathematical basis, you should really take a deep look at &lt;strong&gt;Introduction to Algorithms&lt;/strong&gt;. If you want short introduction about algorithms and don't care about the implementation language, you can read &lt;strong&gt;Algorithms in a Nutshell&lt;/strong&gt;. &lt;strong&gt;Algorithms&lt;/strong&gt; is like a master key, it will gives you good starting knowledge about algorithm analysis and a broad range of algorithms and data structures.&lt;/p&gt;</description><category>Algorithm</category><category>Books</category><category>C++</category><category>Conception</category><category>Java</category><category>Performances</category><category>Programming</category><guid>https://baptiste-wicht.com/posts/2012/08/algorithms-books-reviews.html</guid><pubDate>Fri, 24 Aug 2012 06:52:04 GMT</pubDate></item><item><title>C++11 Synchronization Benchmark</title><link>https://baptiste-wicht.com/posts/2012/07/c11-synchronization-benchmark.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;In the previous parts of this serie, we saw some C++11 Synchronization techniques: locks, lock guards and atomic references.&lt;/p&gt;
&lt;p&gt;In this small post, I will present the results of a little benchmark I did run to compare the different techniques. In this benchmark, the critical section is a single increment to an integer. The critical section is protected using three techniques:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;A single std::mutex with calls to lock() and unlock()&lt;/li&gt;
    &lt;li&gt;A single std::mutex locked with std::lock_guard&lt;/li&gt;
    &lt;li&gt;An atomic reference on the integer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tests have been made with 1, 2, 4, 8, 16, 32, 64 and 128 threads. Each test is repeated 5 times.&lt;/p&gt;
&lt;p&gt;The results are presented in the following figure:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.baptiste-wicht.com/2012/07/c11-synchronization-benchmark/synchronization_cpp_benchmarks/" rel="attachment wp-att-2071"&gt;&lt;img class=" wp-image-2071  " title="C++11 Synchronization Benchmark Result" src="https://baptiste-wicht.com/wp-content/uploads/2012/07/synchronization_cpp_benchmarks-300x230.png" alt="C++11 Synchronization Benchmark Result" width="300" height="230"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As expected, the mutex versions are much slower than the atomic one. An interesting point is that the the atomic version has not a very good scalability. I would have expected that the impact of adding one thread would not be that high.&lt;/p&gt;
&lt;p&gt;I'm also surprised that the lock guard version has a non-negligible overhead when there are few threads.&lt;/p&gt;
&lt;p&gt;In conclusion, do not locks when all you need is modifying integral types. For that, std::atomic is much faster. Good Lock-Free algorithms are almost always faster than the algorithms with lock.&lt;/p&gt;
&lt;p&gt;The sources of the benchmark are available on Github: &lt;a href="https://github.com/wichtounet/articles/tree/master/src/threads/benchmark"&gt;https://github.com/wichtounet/articles/tree/master/src/threads/benchmark&lt;/a&gt;&lt;/p&gt;</description><category>Benchmarks</category><category>C++</category><category>C++11 Concurrency Tutorial</category><category>Concurrency</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/07/c11-synchronization-benchmark.html</guid><pubDate>Thu, 26 Jul 2012 06:47:59 GMT</pubDate></item><item><title>C++11 Concurrency Tutorial - Part 4: Atomic Types</title><link>https://baptiste-wicht.com/posts/2012/07/c11-concurrency-tutorial-part-4-atomic-type.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;In the previous article, we saw advanced techniques about mutexes. In this post, we will continue to work on mutexes with more advanced techniques. We will also study another concurrency technique of the C++11 Concurrency Library: Atomic Types&lt;/p&gt;
&lt;h4&gt;Atomic Types&lt;/h4&gt;

&lt;p&gt;We will take, the example of a Counter:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We already saw that this class was not safe at all to use in multithreaded environment. We also saw how to make if safe using mutexes. This time, we will see how to make it safe using atomic types. The main advantage of this technique is its performance. Indeed, in most cases, the std::atomic operations are implemented with lock-free operations that are much faster than locks. &lt;/p&gt;
&lt;p&gt;The C++11 Concurrency Library introduces Atomic Types as a template class: std::atomic&lt;type&gt;. You can use any Type you want with that template and the operations on that variable will be atomic and so thread-safe. It has to be taken into account that it is up to the library implementation to choose which syncronization mechanism is used to make the operations on that type atomic. On standard platforms for integral types like int, long, float, ... it will be some lock-free technique. If you want to make a big type (let's saw 2MB storage), you can use std::atomic as well, but mutexes will be used. In this case, there will be no performance advantage. &lt;/type&gt;&lt;/p&gt;
&lt;p&gt;The main functions that std::atomic provide are the store and load functions that atomically set and get the contents of the std::atomic. Another interesting function is exchange, that sets the atomic to a new value and returns the value held previously. Finally, there are also two functions compare_exchange_weak and compare_exchance_strong that performs atomic exchange only if the value is equal to the provided expected value. These two last functions can be used to implement lock-free algorithms. &lt;/p&gt;
&lt;p&gt;std::atomic is specialized for all integral types to provide member functions specific to integral (like operators ++, --, fetch_add, fetch_sub, ...). &lt;/p&gt;
&lt;p&gt;It is fairly easy to make  the counter safe with std::atomic: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;atomic&amp;gt;&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AtomicCounter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you test this counter, you will see that the value is always the expected one. &lt;/p&gt;
&lt;h4&gt;Wrap-Up&lt;/h4&gt;

&lt;p&gt;In this article we saw a very elegant technique to perform atomic operations on any type. I advice you to use std::atomic any time you need to make atomic operations on a type, especially integral types. &lt;/p&gt;
&lt;p&gt;The source code for this article can be found &lt;a href="https://github.com/wichtounet/articles/tree/master/src/threads/part4" title="Source of this article"&gt;on Github&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Next&lt;/h4&gt;

&lt;p&gt;In the next post of this series, we will see how to use the Futures facility to perform asynchronous task.&lt;/p&gt;</description><category>C++</category><category>C++11 Concurrency Tutorial</category><category>Concurrency</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/07/c11-concurrency-tutorial-part-4-atomic-type.html</guid><pubDate>Mon, 16 Jul 2012 07:22:04 GMT</pubDate></item><item><title>C++11 Concurrency Tutorial - Part 2 : Protect shared data</title><link>https://baptiste-wicht.com/posts/2012/03/cp11-concurrency-tutorial-part-2-protect-shared-data.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;In the previous article, we saw how to start threads to execute some code in parallel. All the code executed in the threads were independant. In the general case, you often use shared objects between the threads. And when you do it, you will face another problem: synchronization. &lt;/p&gt;
&lt;p&gt;We will see what is this problem in a simple code. &lt;/p&gt;
&lt;h4&gt;Synchronization issues&lt;/h4&gt;

&lt;p&gt;As an example, we will take a simple Counter structure. This structure has a value and methods to increment or decrement the value. Here is the structure:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;){}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There is nothing new here. Now, let's start some threads and make some increments: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;](){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Again, nothing new there. We launch 5 threads and each one increment the counter hundred times. After all thread have finished their work, we print the value of the counter. &lt;/p&gt;
&lt;p&gt;If we launch this program, we should expect that it will print 500. But this is not the case. No one can say what this program will print. Here are some results I obtained on my computer: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;442&lt;/span&gt;
&lt;span class="mf"&gt;500&lt;/span&gt;
&lt;span class="mf"&gt;477&lt;/span&gt;
&lt;span class="mf"&gt;400&lt;/span&gt;
&lt;span class="mf"&gt;422&lt;/span&gt;
&lt;span class="mf"&gt;487&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The problem is that the incrementation is not an atomic operation. As a matter of fact, an incrementation is made of three operations: &lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;Read the current value of &lt;em&gt;value&lt;/em&gt;&lt;/li&gt;
    &lt;li&gt;Add one to the current value&lt;/li&gt;
    &lt;li&gt;Write that new value to &lt;em&gt;value&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you run that code using a single thread, there are no problems. It will execute each part of the  operation one after another. But when you have several threads, you can start having troubles. Imagine this situation:&lt;/p&gt;
&lt;ol&gt;
    &lt;li&gt;Thread 1 : read the value, get 0, add 1, so value = 1&lt;/li&gt;
    &lt;li&gt;Thread 2 : read the value, get 0, add 1, so value = 1&lt;/li&gt;
    &lt;li&gt;Thread 1 : write 1 to the field value and return 1&lt;/li&gt;
    &lt;li&gt;Thread 2 : write 1 to the field value and return 1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These situations come from what we call interleaving. Interleaving describe the possible situations of several threads executing some statements. Even for three operations and two threads, there is a lot of possible interleavings. When you have more threads and more operations, it is almost impossible to enumerate the possibles interleavings. The problem can also occurs when a thread gets preempted between instructions of the operation. &lt;/p&gt;
&lt;p&gt;There are several solutions to fix this problem: &lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;Semaphores&lt;/li&gt;
    &lt;li&gt;Atomic references&lt;/li&gt;
    &lt;li&gt;Monitors&lt;/li&gt;
    &lt;li&gt;Condition codes&lt;/li&gt;
    &lt;li&gt;Compare and swap&lt;/li&gt;
    &lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this blog post we will learn how to use semaphores to fix this problem. As a matter of fact, we will a special kind of semaphores called mutexes. A mutex is a very simple object. Only one thread can obtain the lock on a mutex at the same time. This simple (and powerful) property of a mutex allow us to use it to fix synchronization problems. &lt;/p&gt;
&lt;h4&gt;Use a mutex to make our Counter thread-safe&lt;/h4&gt;

&lt;p&gt;In the C++11 threading library, the mutexes are in the mutex header and the class representing a mutex is the std::mutex class. There are two important methods on a mutex: lock() and unlock(). As their names indicate, the first one enable a thread to obtain the lock and the second releases the lock. The lock() method is blocking. The thread will only return from the lock() method when the lock has been obtained. &lt;/p&gt;
&lt;p&gt;To make our Counter struct thread-safe, we have to add a std::mutex member to it and then to lock()/unlock() the mutex in every function of the object: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If we now test this implementation with the same code as before for starting the threads, the program will always display 500. &lt;/p&gt;
&lt;h4&gt;Exceptions and locks&lt;/h4&gt;

&lt;p&gt;Now, let's see what happens in another case. Imagine that the Counter has a decrement operation that throws an exception if the value is 0: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;throw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Value cannot be less than 0"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You want to access this structure concurrently without modifying the class. So you create a wrapper with locks for this class: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ConcurrentCounter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This wrapper works well in most of the cases, but when an exception occurs in the decrement method, you have a big problem. Indeed, if an exception occurs, the unlock() function is not called and so the lock is left in a blocked state. Then, you program is completely blocked. To fix this problem, you have to use a try/catch structure to unlock the lock before throwing again the exception:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;throw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The code is not difficult but starts looking ugly. Now imagine you are in a function with 10 different exit points. You will have to call unlock() from each of these points and the probability that you will forget one is big. Even bigger is the risk that you won't add a call to unlock when you add a new exit point to a function. &lt;/p&gt;
&lt;p&gt;The next section gives a very nice solution to this problem.&lt;/p&gt;
&lt;h4&gt;Automatic management of locks&lt;/h4&gt;

&lt;p&gt;When you want to protect a whole block of code (a function in our case, but can be inside a loop or another control structure), it exists a good solution to avoid forgetting to release the lock: std::lock_guard. &lt;/p&gt;
&lt;p&gt;This class is a simple smart manager for a lock. When the std::lock_guard is created, it automatically calls lock() on the mutex. When the guard gets destructed, it also releases the lock. You can use it like this: &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ConcurrentSafeCounter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;lock_guard&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;lock_guard&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Much nicer, isn't it :)&lt;/p&gt;
&lt;p&gt;With that solution, you do not have to handle all the cases of exit of the function, they are all handled by the destructor of the std::lock_guard instance. &lt;/p&gt;
&lt;h4&gt;Conclusion&lt;/h4&gt;

&lt;p&gt;We are now done with semaphores. In this article, you learned how to protect shared data using mutexes from the C++ Threads Library. &lt;/p&gt;
&lt;p&gt;Keep in mind that locks are slow. Indeed, when you use locks you make sections of the code sequential. If you want an highly parallel application, there are other solutions than locks that are performing much better but this is out of the scope of this article. &lt;/p&gt;
&lt;h4&gt;Next&lt;/h4&gt;

&lt;p&gt;In the next blog post of this serie, I will talk about advanced concepts for mutexes and how to use condition variables to fix little concurrent programming problem. &lt;/p&gt;
&lt;p&gt;The source code for each sample is available &lt;a title="Source code for this blog post" href="https://github.com/wichtounet/articles/tree/master/src/threads/part2/"&gt;on Github&lt;/a&gt;.&lt;/p&gt;</description><category>C++</category><category>C++11 Concurrency Tutorial</category><category>Concurrency</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/03/cp11-concurrency-tutorial-part-2-protect-shared-data.html</guid><pubDate>Mon, 26 Mar 2012 07:04:28 GMT</pubDate></item><item><title>COJAC, A Numerical Problem Sniffer</title><link>https://baptiste-wicht.com/posts/2012/02/cojac-a-numerical-problem-sniffer.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;During my bachelor at the HES-SO University of applied sciences in Fribourg, I worked on a Java project, &lt;strong&gt;COJAC&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;COJAC is a tool performing &lt;strong&gt;On-the-fly code instrumentation&lt;/strong&gt; to help uncover numerical problems with integer (overflows) and floating point (smearing, cancellation, infinity, NaN) arithmetic.&lt;/p&gt;
&lt;p&gt;Yesterday, Dr Dobbs published an article by one of my professor Frédéric Bapst and myself.&lt;/p&gt;
&lt;p&gt;This article discusses the question of numerical problems in programming, and focuses on the approach of using on-the-fly code instrumentation to uncover them at runtime. Two realizations are presented: a complete and stable solution for Java applications, and a proof-of-concept Valgrind add-on for Linux executables. Both tools require no intervention on the source code and no recompilation, and should be helpful as a diagnostic tool for developers, as well as for education purposes for undergraduate programmers.&lt;/p&gt;
&lt;p&gt;If you are interested by this project, I invite you to read the article on Dr Dobbs : &lt;a title="Project of the Month: Cojac, A Numerical Problem Sniffer" href="http://drdobbs.com/testing/232601564" target="_blank"&gt;Project of the Month: Cojac, A Numerical Problem Sniffer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can also test the tool or browse the source code by downloading it on the &lt;a title="COJAC - The numerical problem sniffer for Java" href="https://code.google.com/p/cojac/" target="_blank"&gt;COJAC website&lt;/a&gt;.&lt;/p&gt;</description><category>Compilers</category><category>Java</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2012/02/cojac-a-numerical-problem-sniffer.html</guid><pubDate>Tue, 28 Feb 2012 12:24:28 GMT</pubDate></item><item><title>Boost intrusive_ptr : faster shared pointer</title><link>https://baptiste-wicht.com/posts/2011/11/boost-intrusive_ptr.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;This post will present the Boost intrusive_ptr and its usage in C++ programming. &lt;/p&gt;
&lt;p&gt;Recently, I took some time to optimize the parsing performances of the EDDI Compiler. The parsing phase creates a lot of nodes to fill the Abstract Syntax Tree. &lt;/p&gt;
&lt;p&gt;One of the way I found was to replace some shared_ptr by some intrusive_ptr of the Boost library. &lt;/p&gt;
&lt;p&gt;It's a faster alternative of shared_ptr. Like its name indicates, it's intrusive. The reference counter is included directely in the managed class, in the contrary of the shared_ptr where the reference counter has to be dynamically allocated to live aside the object. This leads to some performances improvement. Considering memory, the footprint of an intrusive_ptr is the same as the footprint of a raw pointer. This is not the case for the shared_ptr that have a pointer to the object, a pointer to the counter and the counter itself. &lt;/p&gt;
&lt;p&gt;For example, if you have a class X:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;X&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And you use it in your code using a shared_ptr : &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;shared_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;and you want to use an intrusive_ptr, you have first to add a reference counter inside the X class : &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;X&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And you have to indicate to the intrusive_ptr where the reference counter can be found for this class : &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;intrusive_ptr_add_ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;intrusive_ptr_release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And finally you can use the intrusive_ptr to replace your shared_ptr : &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;boost&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;intrusive_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The smart pointer itself can be used exactly the same way as a shared_ptr. If you have several classes that are managed using an intrusive_ptr, you can use a function template to tell Boost that all the reference counter are at the same place : &lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;intrusive_ptr_add_ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;intrusive_ptr_release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the pointer is very intrusive and needs some boilerplate code added to your application, but it can leads to some interesting improvements for classes very often dynamically allocated. &lt;/p&gt;
&lt;p&gt;There is another advantage in using intrusive_ptr. As the reference counter is stored into the object itself, you can create several intrusive_ptr to the same object without any problem. This is not the case when you use a shared_ptr. Indeed, if you create two shared_ptr to the same dynamically allocated object, they will both have a different references counter and at the end, you will end up with an object being deleted twice. &lt;/p&gt;
&lt;p&gt;Of course, there are not only advantages. First of all, you have to declare a field in every classes that you want to manage using an intrusive_ptr and you have to declare functions to manage the reference. Then there are some disadvantages when using this pointer type compared to a shared_ptr : &lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;It's impossible to create a weak_ptr from a intrusive_ptr&lt;/li&gt;
    &lt;li&gt;Code redundancy, you have to copy the reference counter in every class that you want to use an intrusive_ptr with&lt;/li&gt;
    &lt;li&gt;You have to provide a function for every types that has to be used with intrusive_ptr (only two functions if you use the template versions of the two functions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To conclude, the boost::intrusive_ptr can be a good replacement of std::shared_ptr in a performance critical application, but if you have no performances problem, do not use it because it makes your code less clear. If you are concerned by performances when using std::shared_ptr, consider also using std::make_shared to create your pointers, so that the reference counter and the object itself will be allocated at the same place and at the same time, resulting in better performances. Another case where it's interesting to use an intrusive_ptr is when dealing with libraries using a lot of raw pointers, because you can create several intrusive_ptr to the same raw pointer without any problem.&lt;/p&gt;
&lt;p&gt;For more information, you can consult &lt;a href="http://www.boost.org/doc/libs/1_47_0/libs/smart_ptr/intrusive_ptr.html" title="Boost intrusive_ptr official documentation"&gt;the official documentation&lt;/a&gt;.&lt;/p&gt;</description><category>Boost</category><category>C++</category><category>Performances</category><guid>https://baptiste-wicht.com/posts/2011/11/boost-intrusive_ptr.html</guid><pubDate>Mon, 14 Nov 2011 07:40:41 GMT</pubDate></item><item><title>eddic 0.5.1 : Better assembly generation and faster parsing</title><link>https://baptiste-wicht.com/posts/2011/11/eddic-0-5-1-better-assembly-generation-and-faster-parsing.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;p&gt;I'm very pleased to release the version 0.5.1 of the EDDI Compiler.&lt;/p&gt;
&lt;p&gt;It makes now a long time since the last version of eddic, but I started again working frequently on it. This version doesn't add any new feature to the language, but there are a lot of improvements in the compiler itself. &lt;/p&gt;
&lt;p&gt;First of all, the generated assembly has been improved a lot. I use now a intermediate representation of assembly and then the compiler is able to make more optimizations in its choice. This optimization is especially visible for integer computations. Before this, all the computations used stack operations and then we use almost only registers when it's possible. It's still not perfect, but it uses way less instructions. Moreover, this can enable me to write a 64 assembly code instead of 32 and provide both versions in the compiler. &lt;/p&gt;
&lt;p&gt;Another improvement is the speed of the parser. I now use Boost Spirit to parse the source file and construct an Abstract Syntax Tree. This parsing is very fast now (with some optimizations). Moreover, it will be easier to add new constructs later. &lt;/p&gt;
&lt;p&gt;I also improved the general performances at some places. I also use Boost Program Options to parse the command line options. &lt;/p&gt;
&lt;p&gt;In the next version (0.6.0), I will introduce arrays of int and strings and the foreach construct for array. I will also remove the dependency to malloc writing a memory allocation manager in assembly. I will also introduce warnings in the compiler. &lt;/p&gt;
&lt;p&gt;You can find the compiler sources on the Github repository : &lt;a title="EDDI COmpiler Repository" href="http://github.com/wichtounet/eddic"&gt;https://github.com/wichtounet/eddic&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;The compiler now needs Boost 1.47.0 to build. &lt;/p&gt;
&lt;p&gt;The exact version I refer to is the v0.5.1 available in the github tags or directly as the release branch.&lt;/p&gt;</description><category>Assembly</category><category>Boost</category><category>C++</category><category>EDDI</category><category>Performances</category><category>Releases</category><category>templates</category><guid>https://baptiste-wicht.com/posts/2011/11/eddic-0-5-1-better-assembly-generation-and-faster-parsing.html</guid><pubDate>Thu, 10 Nov 2011 03:33:32 GMT</pubDate></item><item><title>Diploma Thesis : Inlining Assistance for large-scale object-oriented applications</title><link>https://baptiste-wicht.com/posts/2011/10/diploma-thesis-inlining-assistance-for-large-scale-object-oriented-applications.html</link><dc:creator>Baptiste Wicht</dc:creator><description>&lt;div&gt;&lt;p&gt;One month ago, my diploma thesis has been accepted and I got my Bachelor of Science in Computer Science.&lt;/p&gt;
&lt;p&gt;I made my diploma thesis at Lawrence Berkeley National Laboratory, Berkeley, California. I was in the team responsible of the developmenet of the ATLAS Software for the LHC in Cern. The title of my thesis is &lt;strong&gt;Inlining Assistance for large-scale object-oriented applications&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The goal of this project was to create a C++ analyzer to find the best functions and call sites to inline. The input of the analyzer is a call graph generated by CallGrind of the Valgrind project.&lt;/p&gt;
&lt;p&gt;The functions and call sites to inline are computed using a heuristic, called the temperature. This heuristic is based on the cost of calling the given function, the frequency of calls and the size of the function. The cost of calling a function is based on the number of parameters, the virtuality of the function and the shared object the function is located in.&lt;/p&gt;
&lt;p&gt;The analyzer is also able to find clusters of call sites. A cluster is a set of hot call sites related to each other. It can also finds the functions that should be moved from one library to the other or the function that should not be virtual by testing the use of each function in a class hierarchy.&lt;/p&gt;
&lt;p&gt;To achieve this project, it has been necessary to study in details how a function is called on the Linux platform. The inlining optimization has also been studied to know what were the advantages and the problems of this technique.&lt;/p&gt;
&lt;p&gt;To retrieve the information about the sizes and the virtuality of the function, it has been necessary to read the shared libraries and executables files. For that, we used &lt;em&gt;libelf&lt;/em&gt;. The virtuality of a function is calculated by reading each virtual table and searching for the function in the virtual tables content.&lt;/p&gt;
&lt;p&gt;The graph manipulation is made by the &lt;em&gt;Boost Graph Library&lt;/em&gt;. As it was an advanced library, it has helped me improving my skills in specific topics like templates, traits or Template Metaprogramming.&lt;/p&gt;
&lt;p&gt;The analyzer is able to run on the Linux platform on any program that has been compiled using gcc.&lt;/p&gt;
&lt;p class="more"&gt;&lt;a href="https://baptiste-wicht.com/posts/2011/10/diploma-thesis-inlining-assistance-for-large-scale-object-oriented-applications.html"&gt;Read more…&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><category>Boost</category><category>C++</category><category>Compilers</category><category>gcc</category><category>Linux</category><category>Optimization</category><category>Performances</category><category>Personal</category><guid>https://baptiste-wicht.com/posts/2011/10/diploma-thesis-inlining-assistance-for-large-scale-object-oriented-applications.html</guid><pubDate>Mon, 03 Oct 2011 06:44:17 GMT</pubDate></item></channel></rss>