Update: I've added a new section for larger values of :code:`n`.

Recently, I've been wondering about the performance of :code:`std::pow(x, n)`. I'm talking here about the case when :code:`n` is an integer. When :code:`n` is not an integer, I believe you should always use :code:`std::pow` or another specialized library. When :code:`n` is an integer, you can actually replace the call with the direct equivalent (for instance, :code:`std::pow(x, 3)` becomes :code:`x * x * x`). If :code:`n` is very large, you'd rather write a loop, of course ;) In practice, we use powers of two and three much more often than a power of 29, although that can happen. Of course, it especially makes sense to wonder about this when the :code:`pow` call is inside a loop; a single call outside a loop won't make any measurable difference to the overall performance. Since I'm mostly interested in single-precision performance (neural networks are almost exclusively about single precision), the first benchmarks will use :code:`float`.

.. TEASER_END

std::pow performances
#####################

So let's see what the differences are between :code:`std::pow(x, 2)` and :code:`x * x`. All the code is compiled as C++11. For good measure, I'll also check the performance of the C :code:`pow` function. Let's start with GCC-6.4 and -O2:

.. raw:: html
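As a minimal illustration of the replacement described above (the helper name :code:`cube` is hypothetical, not something from the benchmark code):

.. code:: c++

    #include <cassert>
    #include <cmath>

    // Hypothetical helper: the direct equivalent of std::pow(x, 3).
    inline float cube(float x) {
        return x * x * x;
    }

    int main() {
        // Both forms compute the same value for exactly representable inputs.
        assert(cube(2.0f) == 8.0f);
        assert(cube(2.0f) == std::pow(2.0f, 3));
        return 0;
    }
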
First, there is no difference between the C :code:`pow` function and :code:`std::pow`, which is expected. On the other hand, it's interesting to see that there is a definite difference in performance between :code:`x * x` and :code:`std::pow(x, 2)`. It is not a huge difference, but :code:`std::pow` is still around 2.5 times slower. Let's see if the difference is the same for a bigger exponent.

.. raw:: html
This time the difference is very significant: :code:`x * x * x` is two orders of magnitude faster than :code:`std::pow(x, 3)`. It seems that the algorithm used for bigger powers is much less efficient. In any case, we can see that this path is not optimized for integer exponents. Let's see if this changes for a power of 4:

.. raw:: html
The numbers are almost the same. It seems the algorithm used to approximate the power does not depend on the exponent itself (at least between 3 and 4).

fast math
#########

One compiler option is very important to consider here: -ffast-math. This option makes some math operations much faster, but it also violates a lot of IEEE compliance. In most cases, that does not matter, since the violations mostly concern edge cases resulting in infinities and Not-a-Numbers, but it also reduces the accuracy of some operations. If you really care about precise computation, you should not use -ffast-math, but in most cases I think it's fine. Anyway, let's see if it changes anything:

.. raw:: html
For the second power, it does not change anything. Let's see about the third power:

.. raw:: html
For the third power, :code:`std::pow(x, 3)` is now much faster than before. It's still faster to use :code:`x * x * x`, but :code:`std::pow(x, 3)` is now *only* around 2.5 times slower.

Clang
#####

I've tested several versions of G++ (4.9.4, 5.4.0 and 6.4.0) and I've not seen any significant difference in performance. Let's see if there are differences between clang-3.9 and GCC 6.4.0. Since we are using a function from the standard library, let's also see if using libc++ instead of libstdc++ makes any difference.

.. raw:: html
As it turns out, there is not much difference between the two compilers, although clang is around 15% slower here. Moreover, there is no difference between the two standard libraries. Let's see if this changes for the third power:

.. raw:: html
This time, the difference between the two compilers is more significant: clang is about 50% slower than GCC here. Again, there is no significant difference between the two C++ libraries. I was expecting more of a difference between the two, but it seems they are using similar implementations, if not the same one.

double precision
################

As said earlier, all the tests so far were run in single precision (:code:`float`). Let's see now if it's any different with double precision (:code:`double`). Again, I'll use G++ 5.4.0 to start with. Here are the results, first without -ffast-math:

.. raw:: html
This is very interesting! Here there is no overhead in using :code:`std::pow` compared to direct multiplication (:code:`x * x`). It seems that most of the single-precision overhead was in fact caused by conversions to double: the algorithm itself appears to be implemented only for double precision. Let's see about the third power now:

.. raw:: html
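The conversion-to-double effect is consistent with the C++11 :code:`<cmath>` overload rules: if any argument of :code:`std::pow` has integral type, it is converted to :code:`double`, so a :code:`float` base with an :code:`int` exponent is computed (and returned) in double precision. This can be checked at compile time:

.. code:: c++

    #include <cmath>
    #include <type_traits>

    // A float base with an int exponent is promoted to a double computation.
    static_assert(std::is_same<decltype(std::pow(1.0f, 2)), double>::value,
                  "float base + int exponent -> double pow");

    // Only when both arguments are float does it stay in single precision.
    static_assert(std::is_same<decltype(std::pow(1.0f, 2.0f)), float>::value,
                  "float base + float exponent -> float pow");

This suggests that writing :code:`std::pow(x, 2.0f)` rather than :code:`std::pow(x, 2)` avoids the promotion, at the cost of the non-integer code path.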
As seen before, with the third power the overhead is huge. Although this is slightly faster than in single precision, it is still two orders of magnitude slower than direct multiplication (:code:`x * x * x`). Let's see what happens with -ffast-math:

.. raw:: html
With -ffast-math, there is absolutely no overhead anymore for :code:`std::pow(x, n)`, even for the third power. The results are the same for clang. I've checked higher values of the exponent and the result is also the same.

Bigger exponents
################

Now, let's try to find for which :code:`n` :code:`std::pow(x, n)` becomes faster than multiplying in a loop. Since :code:`std::pow` uses a special algorithm to perform the computation rather than simple loop-based multiplication, there may be a point after which it's more interesting to use the algorithm rather than a loop. First, our pow function:

.. code:: c++

    double my_pow(double x, size_t n) {
        double r = 1.0;

        while (n > 0) {
            r *= x;
            --n;
        }

        return r;
    }

And now, let's see the performance. I've compiled this benchmark with GCC 4.9.3, running on my old Sandy Bridge processor. Here are the results for 1000 calls to each function:

.. raw:: html
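To give an idea of what a smarter algorithm can look like, here is a sketch of exponentiation by squaring, which needs only O(log n) multiplications instead of the O(n) of :code:`my_pow`. Note that this is only an illustration, not the actual :code:`std::pow` implementation; given that its running time is essentially constant in :code:`n`, the libm version more likely goes through something like exp(n * log(x)).

.. code:: c++

    #include <cstddef>

    // Exponentiation by squaring: O(log n) multiplications instead of
    // the O(n) of my_pow. NOT the actual std::pow implementation, just
    // a sketch of why a specialized algorithm can beat the naive loop
    // for large n.
    double pow_by_squaring(double x, std::size_t n) {
        double r = 1.0;

        while (n > 0) {
            if (n & 1) {
                r *= x;  // odd exponent: fold the current base into the result
            }

            x *= x;      // square the base
            n >>= 1;     // halve the exponent
        }

        return r;
    }

Note that this variant accumulates slightly different rounding errors than the left-to-right loop, which matters if you care about bit-exact results.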
We can see that between :code:`n=100` and :code:`n=110`, :code:`std::pow(x, n)` starts to be faster than :code:`my_pow(x, n)`. From this point on, you should only use :code:`std::pow(x, n)`. Interestingly, the time for :code:`std::pow(x, n)` is even decreasing. Let's see how the performance evolves over a higher range of :code:`n`:

.. raw:: html
We can see that the :code:`std::pow` time remains stable while our loop-based pow function keeps increasing linearly. At :code:`n=1000`, :code:`std::pow` is one order of magnitude faster than :code:`my_pow`. Overall, if you do not care much about extreme accuracy, you may consider using your own pow function for small-ish integer values of :code:`n`. After :code:`n=100`, it becomes more interesting to use :code:`std::pow`.

Conclusion
##########

If you are using double precision (:code:`double`), :code:`std::pow(x, n)` will be slower than the handcrafted equivalent unless you use -ffast-math, in which case there is absolutely no overhead. Without the compiler option, the overhead is quite large, around two orders of magnitude, starting from the third power. With or without -ffast-math, :code:`std::pow(x, 2)` has no overhead compared to :code:`x * x`.

For single precision, it's another story! For the two compilers that have been tested, and for small integer values of :code:`n` (though I think it stays the same for large integer values), it's always faster to use direct multiplication rather than exponentiation via :code:`std::pow(x, n)`. Indeed, it seems that there is no optimization for the case when :code:`n` is an integer. When -ffast-math is used, the difference is not very big: around 2.5 times slower for GCC and around 3.5 times slower for clang. I'm a bit disappointed by the poor single-precision performance of :code:`std::pow`. Basically, you should not use :code:`std::pow` if you want single-precision powers.

I hope you found this benchmark interesting :) For those interested in the code of the benchmark, it's available `on Github `_.

.. raw:: html