Update: I've added a new section for larger values of :code:`n`.
Recently, I've been wondering about the performance of :code:`std::pow(x, n)`.
I'm talking here about the case when :code:`n` is an integer. When :code:`n` is
not an integer, I believe you should always use :code:`std::pow` or another
specialized library.
When :code:`n` is an integer, you can actually replace the call with the direct
equivalent (for instance :code:`std::pow(x, 3) = x * x * x`). If :code:`n` is
very large, you'd rather write a loop, of course ;) In practice, we generally
use powers of two and three much more often than a power of 29, although that
could happen. Of course, it especially makes sense to wonder about this if the
pow is used inside a loop. If you only use it once, outside a loop, it won't
make any difference to the overall performance.
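For instance, the replacement for the third power is just two multiplications.
A minimal sketch (the :code:`cube` helper name is mine, not from the benchmark):

.. code:: c++

    #include <cmath>

    // Hand-written equivalent of std::pow(x, 3): two multiplications
    // instead of a library call. (The helper name is mine.)
    inline float cube(float x) {
        return x * x * x;
    }

    int main() {
        float direct = cube(2.0f);       // two multiplications
        double lib = std::pow(2.0f, 3);  // same value via the library
        return (direct == lib) ? 0 : 1;
    }
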
Since I'm mostly interested in single precision performance (neural networks are
only about single precision), the first benchmarks will be using :code:`float`.
.. TEASER_END
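One single-precision detail worth keeping in mind: since C++11, calling
:code:`std::pow` with a :code:`float` base and an :code:`int` exponent promotes
the whole computation to :code:`double`; only an all-:code:`float` call stays in
single precision. A small check of the overload rules (assuming a conforming
C++11 library):

.. code:: c++

    #include <cmath>
    #include <type_traits>

    int main() {
        float x = 2.0f;
        // A float base with an int exponent is promoted: the whole
        // computation happens in double.
        static_assert(std::is_same<decltype(std::pow(x, 2)), double>::value,
                      "float base + int exponent is computed as double");
        // Only an all-float call selects the float overload.
        static_assert(std::is_same<decltype(std::pow(x, 2.0f)), float>::value,
                      "float base + float exponent stays in single precision");
        return 0;
    }
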
std::pow performance
####################
So let's see what the differences are between :code:`std::pow(x, 2)` and
:code:`x * x`. All the code will be compiled with C++11 enabled. For the sake of
it, I'll also check the performance of the C :code:`pow` function.
Let's start with GCC-6.4 and -O2:
.. raw:: html
First, there is no difference between the C pow function and :code:`std::pow`,
which is expected. On the other hand, it's interesting to see that there is a
definite difference in performance between :code:`x * x` and
:code:`std::pow(x, 2)`. It is not a huge difference, but :code:`std::pow` is
still around 2.5 times slower.
Let's see if the difference is the same for a bigger exponent.
.. raw:: html
This time the difference is very significant. :code:`x * x * x` is two orders
of magnitude faster than :code:`std::pow(x, n)`. It seems that the algorithm
used for bigger powers is much less efficient. In any case, we can see that this
is not optimized for integer exponents.
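If you want a guaranteed multiplication chain whatever the library does for
integer exponents, a tiny compile-time helper is enough. This is my own sketch,
not something used in the benchmark:

.. code:: c++

    // Compile-time integer power: the recursion unrolls into a plain
    // multiplication chain, so no runtime pow is ever called.
    // (My own sketch, not part of the post's benchmark.)
    template <unsigned N>
    inline float int_pow(float x) {
        return x * int_pow<N - 1>(x);
    }

    template <>
    inline float int_pow<0>(float) {
        return 1.0f;
    }

For example, :code:`int_pow<3>(x)` compiles down to the same code as
:code:`x * x * x`.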
Let's see if this changes for a power of 4:
.. raw:: html
The numbers are almost the same. It seems the algorithm used for approximation
of the power does not depend on the exponent itself (at least between 3 and 4).
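That would be consistent with a generic implementation computing
:code:`pow(x, y)` through logarithm and exponential for any :code:`y`. The
sketch below shows the idea only; it is my own illustration, not the actual
libm code:

.. code:: c++

    #include <cmath>

    // A general pow(x, y) can be computed as exp(y * log(x)).
    // This is only a sketch of the idea, not the actual libm code:
    // real implementations use extra internal precision to bound the
    // rounding error of the log/exp round trip.
    double naive_pow(double x, double y) {
        return std::exp(y * std::log(x));
    }
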
fast math
#########
One compiler option is very important to consider here: -ffast-math. This
compiler option makes some math operations much faster, but also breaks IEEE
compliance in several ways. In most cases, this does not matter, since it mostly
affects edge cases producing infinities and Not-a-Numbers. It will also reduce
the accuracy of some operations. If you really care about precise computation,
you should not use -ffast-math, but in most cases, I think it's fine.
Anyway, let's see if that changes anything:
.. raw:: html
For second power, it does not change anything. Let's see about the third power:
.. raw:: html
For the third power, :code:`std::pow(x, 3)` is now much faster than before.
Even though it's still faster to use :code:`x * x * x` than
:code:`std::pow(x, 3)`, the latter is *only* around 2.5 times slower.
Clang
#####
I've tested several versions of G++ (4.9.4, 5.4.0 and 6.4.0) and I've not seen
any significant difference in performance. Let's see if there are differences
between clang-3.9 and GCC 6.4.0. Since we are using a function from the standard
library, let's also see if using libc++ makes any difference.
.. raw:: html
As it turns out, there is not much difference between the two compilers, but
clang is actually around 15% slower here. Moreover, there is no difference
between the two libraries. Let's see if this makes a difference for the third
power:
.. raw:: html
This time, the difference between the two compilers is more significant: clang
is about 50% slower than GCC on this. Again, there is no significant difference
between the two C++ libraries. I was expecting more of a difference between the
two, but it seems they use a similar implementation, if not the same one.
double precision
################
As said earlier, all the tests were run in single precision (:code:`float`).
Let's see now if it's any different with double precision (:code:`double`).
Again, I'll use G++ 5.4.0 to start with.
Here are the results first without -ffast-math:
.. raw:: html
This is very interesting! Here there is no overhead of using :code:`std::pow`
compared to direct multiplication (:code:`x * x`). It seems that most of the
overhead of this function in single precision was in fact in the conversion to
double, since the algorithm itself seems to be implemented only for double
precision. Let's see about the third power now:
.. raw:: html
As seen before, with the third power, the overhead is actually huge. Although
this is slightly faster than with single precision, it is still two orders of
magnitude slower than direct multiplication (:code:`x * x * x`). Let's see what
happens with -ffast-math:
.. raw:: html
With -ffast-math, there is absolutely no overhead anymore for :code:`std::pow(x, n)`
even for third power. The results are the same for clang. I've checked for
higher values of the exponent and the result is also the same.
Bigger exponents
################
Now, let's try to find for which :code:`n` :code:`std::pow(x, n)` becomes
faster than multiplying in a loop. Since :code:`std::pow` is using a special
algorithm to perform the computation rather than simple loop-based
multiplication, there may be a point after which it's more interesting to use
the algorithm rather than a loop.
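The special algorithm in question is presumably something like exponentiation
by squaring, which needs O(log n) multiplications instead of n. A minimal
sketch of that technique (my own, not the library's implementation):

.. code:: c++

    #include <cstddef>

    // Exponentiation by squaring: O(log n) multiplications instead of n.
    // A sketch of the kind of algorithm a library pow may use for large
    // exponents -- not the actual implementation.
    double pow_by_squaring(double x, std::size_t n) {
        double result = 1.0;
        while (n > 0) {
            if (n & 1) {    // current low bit set: fold x into the result
                result *= x;
            }
            x *= x;         // square the base
            n >>= 1;        // shift to the next bit of the exponent
        }
        return result;
    }
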
First, our pow function:
.. code:: c++

    double my_pow(double x, size_t n){
        double r = 1.0;
        while(n > 0){
            r *= x;
            --n;
        }
        return r;
    }
And now, let's see the performance. I've compiled my benchmark with GCC 4.9.3
and ran it on my old Sandy Bridge processor. Here are the results for 1000
calls to each function:
.. raw:: html
We can see that between :code:`n=100` and :code:`n=110`, :code:`std::pow(x, n)`
starts to be faster than :code:`my_pow(x, n)`. From that point on, you should
only use :code:`std::pow(x, n)`. Interestingly, the time for
:code:`std::pow(x, n)` is also decreasing. Let's see how the performance evolves
over a higher range of :code:`n`:
.. raw:: html
We can see that the :code:`std::pow` time remains stable while our loop-based
pow function keeps increasing linearly. At :code:`n=1000`, :code:`std::pow` is
one order of magnitude faster than :code:`my_pow`.
Overall, if you do not care much about extreme accuracy, you may consider using
your own pow function for small-ish integer :code:`n` values. After
:code:`n=100`, it becomes more interesting to use :code:`std::pow`.
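If you wanted to act on that crossover, a trivial dispatch would do. The cutoff
of 100 comes from this benchmark and will vary with machine and compiler, and
the helper names are mine:

.. code:: c++

    #include <cmath>
    #include <cstddef>

    // The loop-based pow from the post.
    double my_pow(double x, std::size_t n) {
        double r = 1.0;
        while (n > 0) {
            r *= x;
            --n;
        }
        return r;
    }

    // Dispatch on the crossover measured above (~100 on this machine);
    // the exact cutoff is machine- and compiler-dependent.
    double int_pow_dispatch(double x, std::size_t n) {
        return n < 100 ? my_pow(x, n)
                       : std::pow(x, static_cast<double>(n));
    }
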
Conclusion
##########
If you are using double precision (:code:`double`), :code:`std::pow(x, n)` will
be slower than the handcrafted equivalent unless you use -ffast-math, in which
case there is absolutely no overhead. Without the compiler option, the overhead
is quite large, around two orders of magnitude, starting from the third power.
With or without -ffast-math, :code:`std::pow(x, 2)` has no overhead compared to
:code:`x * x`.
For single precision, it's another story! For the two compilers that have been
tested, and for small integer values of :code:`n` (though I think it stays the
same for large integer values of :code:`n`), it's always faster to use direct
multiplication rather than exponentiation via :code:`std::pow(x, n)`. Indeed, it
seems that there is no optimization for the case when :code:`n` is an integer.
When -ffast-math is used, the difference is not very big: around 2.5 times
slower for GCC and around 3.5 times slower for clang. I'm a bit disappointed by
the lack of single-precision performance of :code:`std::pow`. Basically, you
should not use :code:`std::pow` if you want single-precision powers.
I hope you found this benchmark interesting :)
For those interested in the code of the benchmark, it's available
`on Github `_.
.. raw:: html