C++11 Performance tip: When to use std::pow ?
Update: I've added a new section for larger values of n
.
Recently, I've been wondering about the performance of std::pow(x, n)
.
I'm talking here about the case when n
is an integer. In the case when
n
is not an integer, I believe, you should always use std::pow
or use another specialized library.
In case when n is an integer, you can actually replace it with the direct
equivalent (for instance std::pow(x, 3) = x * x x
). If n is very large,
you'd rather write a loop of course ;) In practice, we generally use powers of
two and three much more often than power of 29, although that could happen. Of
course, it especially make sense to wonder about this if the pow is used inside
a loop. If you only use it once outside a loop, that won't be any difference on
the overall performance.
Since I'm mostly interested in single precision performance (neural networks are
only about single precision), the first benchmarks will be using float
.
std::pow performances
So let's see what are the differences between std::pow(x, 2)
and
x * x
. All the code will be compiled in C++11. For the sake of it,
I'll also check the performance of the C pow
function.
Let's start with GCC-6.4 and -O2:
First, there is no difference between C pow function and std::pow, which is
expected. On the other hand, it's interesting to see that there is a definite
difference in performance between x * x
and std::pow(x, 2)
. This
is not a huge difference, but still around 2.5 times slower for
std::pow
.
Let's see if the difference is the same for bigger exponent.
This time the difference is very significant. x * x * x
is two orders
of magnitude faster than std::pow(x, n)
. It seems that the algorithm
used for bigger power is much less efficient. In any case, we can see that this
is not optimized for integer values of exponent values.
Let's see if this changes for a power of 4:
The numbers are almost the same. It seems the algorithm used for approximation of the power does not depend on the exponent itself (at least between 3 and 4).
fast math
One compiler option is very important to consider here: -ffast-math. This compiler option will make some math operations much faster, but will also violate a lot of IEEE compliance. In most cases, it does not matter since this will mostly cover edge cases resulting in infinities and Not-A-Numbers. It also will reduce the accuracy of some operations. If you really care about precise computation, you should not use -ffast-math, but in most case, I think it's fine. Any way, let's see if that changes anything:
For second power, it does not change anything. Let's see about the third power:
For the third power, for std::pow(x, 3)
, it is now much faster than
before. Even though it's still faster to use x * x * x
than
std::pow(x, 3)
, the difference is only around 2.5 times slower.
Clang
I've tested several version of G++ (4.9.4, 5.4.0 and 6.4.0) and I've not seen any significant difference in performance. Let's see if there are differences between clang-3.9 and GCC 6.4.0. Since we are using a function from the library, let's see if using libc++ makes any difference.
As it turns out, there is not much difference between the two compilers, but clang is actually around 15% slower here. Moreover, there is no difference between the two libraries. Let's see if this makes a difference for third power:
This time, the difference between the two compilers is more significant, clang is about 50% slower than GCC on this. Again, there is no significant difference between the two C++ libraries. I was expecting more of a difference between the two, but it seems they are using a similar implementations, if not the same.
double precision
As said earlier, all the tests were run in single precision (float
).
Let's see now if it's any different with double precision (double
).
Again, I'll use G++ 5.4.0 to start with.
Here are the results first without -ffast-math:
This is very interesting! Here there is no overhead of using std::pow
compared to direct multiplication (x * x
). It seems that most of the
overhead of this function for single precision was in fact in conversion to
double since it seems that the algorithm itself is only implemented for double
precision. Let's see about third power now:
As seen before, with third power, the overhead is actually huge. Although this
is slightly faster than when using single precision, it is still 2 orders of
magnitude slower than direct multiplication x * x * x
. Let's see what
happens with -ffast-math:
With -ffast-math, there is absolutely no overhead anymore for std::pow(x, n)
even for third power. The results are the same for clang. I've checked for
higher values of the exponent and the result is also the same.
Bigger exponents
Now, let's try to test for which n
is code:std::pow(x, n)
becoming faster than multiplying in a loop. Since std::pow is using a special
algorithm to perform the computation rather than be simply loop-based
multiplications, there may be a point after which it's more interesting to use
the algorithm rather than a loop.
First, our pow function:
And now, let's see the performance. I've compiled my benchmark with GCC 4.9.3 and running on my old Sandy Bridge processor. Here are the results for 1000 calls to each functions:
We can see that between n=100
and n=110
, std::pow(x, n)
starts to be faster than my_pow(x, n)
. At this point, you should only
use std::pow(x, n)
. Interestingly too, the time for std::pow(x,
n)
is decreasing. Let's see how is the performance with higher range of
n
:
We can see that the pow function time still remains stable while our loop-based
pow function still increases linearly. At n=1000
, std::pow
is
one order of magnitude faster than my_pow
.
Overall, if you do not care much about extreme accuracy, you may consider using
you own pow function for small-ish (integer) n
values. After
n=100
, it becomes more interesting to use std::pow
.
Conclusion
If you are using double precision (double
), std::pow(x, n)
will
be slower than the handcrafted equivalent unless you use -ffast-math, in which
case, there is absolutely no overhead. The overhead without using the compiler
option is quite large, around 2 orders of magnitude, starting from the third
power. With or without -ffast-math, std::pow(x, 2) has no overhead compared to
x * x
.
For single precision, it's another story! For the two compilers that have been
tested and for small integer values of n (but I think it's stays the same for
large integer values of n), it's always faster to use direct multiplication
rather than exponentiation via std::pow(x, n)
. Indeed, it seems that
there is no optimization for the case when n is an integer. When -ffast-math is
used, the difference it not very big, around 2.5 times slower for GCC and around
3.5 times slower for clang. I'm a bit disappointed by the lack of
single-precision performance for std::pow
. Basically, you should not use
std::pow
if you want single-precision powers.
I hope you found this benchmark interesting :)
For those interested in the code of the benchmark, it's available on Github.
Comments
Comments powered by Disqus