# C++11 Performance tip: When to use std::pow?

Update: I've added a new section for larger values of `n`.

Recently, I've been wondering about the performance of `std::pow(x, n)`.
I'm talking here about the case when `n` is an integer. When `n` is not an
integer, I believe you should always use `std::pow` or another specialized
library.

When `n` is an integer, you can actually replace the call with the direct
equivalent (for instance, `std::pow(x, 3) = x * x * x`). If `n` is very large,
you'd rather write a loop of course ;) In practice, we generally use powers of
two and three much more often than a power of 29, although that could happen. Of
course, it especially makes sense to wonder about this if the `pow` is used inside
a loop. If you only call it once outside a loop, it won't make any difference to
the overall performance.

Since I'm mostly interested in single-precision performance (neural networks are
only about single precision), the first benchmarks will use `float`.

## std::pow performances

So let's see what the differences are between `std::pow(x, 2)` and `x * x`.
All the code is compiled as C++11. For the sake of it, I'll also check the
performance of the C `pow` function.
Let's start with GCC-6.4 and -O2:

First, there is no difference between the C `pow` function and `std::pow`, which
is expected. On the other hand, it's interesting to see that there is a definite
difference in performance between `x * x` and `std::pow(x, 2)`. This is not a
huge difference, but `std::pow` is still around 2.5 times slower.

Let's see if the difference is the same for a bigger exponent.

This time the difference is very significant: `x * x * x` is two orders
of magnitude faster than `std::pow(x, n)`. It seems that the algorithm
used for bigger powers is much less efficient. In any case, we can see that it
is not optimized for integer values of the exponent.

Let's see if this changes for a power of 4:

The numbers are almost the same. It seems the algorithm used for approximation of the power does not depend on the exponent itself (at least between 3 and 4).

## fast math

One compiler option is very important to consider here: -ffast-math. This option
makes some math operations much faster, but also violates a lot of IEEE
compliance. In most cases this does not matter, since it mostly covers edge
cases resulting in infinities and Not-a-Numbers, but it also reduces the
accuracy of some operations. If you really care about precise computation, you
should not use -ffast-math, but in most cases I think it's fine. Anyway, let's
see if it changes anything:

For the second power, it does not change anything. Let's see about the third power:

For the third power, `std::pow(x, 3)` is now much faster than before. Even
though it's still faster to use `x * x * x` than `std::pow(x, 3)`, the latter is
now *only* around 2.5 times slower.

## Clang

I've tested several versions of G++ (4.9.4, 5.4.0 and 6.4.0) and have not seen any significant difference in performance. Let's see if there are differences between clang-3.9 and GCC 6.4.0. Since we are using a function from the standard library, let's also see if using libc++ makes any difference.

As it turns out, there is not much difference between the two compilers, though clang is actually around 15% slower here. Moreover, there is no difference between the two libraries. Let's see if this changes for the third power:

This time, the difference between the two compilers is more significant: clang is about 50% slower than GCC on this. Again, there is no significant difference between the two C++ libraries. I was expecting more of a difference between the two, but it seems they are using similar implementations, if not the same.

## double precision

As said earlier, all the tests so far were run in single precision (`float`).
Let's see now if it's any different in double precision (`double`).
Again, I'll use G++ 5.4.0 to start with.

Here are the results first without -ffast-math:

This is very interesting! Here there is no overhead to using `std::pow`
compared to direct multiplication (`x * x`). It seems that most of the overhead
of this function in single precision was in fact the conversion to double, since
the algorithm itself appears to be implemented only for double precision. Let's
see about the third power now:

As seen before, with the third power the overhead is actually huge. Although
this is slightly faster than in single precision, it is still two orders of
magnitude slower than direct multiplication (`x * x * x`). Let's see what
happens with -ffast-math:

With -ffast-math, there is absolutely no overhead anymore for `std::pow(x, n)`,
even for the third power. The results are the same for clang. I've checked
higher values of the exponent and the result is also the same.

## Bigger exponents

Now, let's try to find the `n` at which `std::pow(x, n)` becomes faster than
multiplying in a loop. Since `std::pow` uses a special algorithm to perform the
computation rather than simple loop-based multiplication, there may be a point
after which it's more interesting to use the algorithm rather than a loop.

First, our pow function:

```
double my_pow(double x, size_t n) {
    double r = 1.0;
    while (n > 0) {
        r *= x;
        --n;
    }
    return r;
}
```

And now, let's see the performance. I've compiled the benchmark with GCC 4.9.3, running on my old Sandy Bridge processor. Here are the results for 1000 calls to each function:

We can see that between `n=100` and `n=110`, `std::pow(x, n)` starts to be
faster than `my_pow(x, n)`. Past this point, you should only use
`std::pow(x, n)`. Interestingly, the time for `std::pow(x, n)` is also
decreasing. Let's see how the performance looks for a higher range of `n`:

We can see that the `std::pow` time remains stable while our loop-based pow
function still increases linearly. At `n=1000`, `std::pow` is one order of
magnitude faster than `my_pow`.

Overall, if you do not care much about extreme accuracy, you may consider using
your own pow function for small-ish integer values of `n`. After `n=100`, it
becomes more interesting to use `std::pow`.

## Conclusion

If you are using double precision (`double`), `std::pow(x, n)` will be slower
than the handcrafted equivalent unless you use -ffast-math, in which case there
is absolutely no overhead. Without the compiler option, the overhead is quite
large, around two orders of magnitude, starting from the third power. With or
without -ffast-math, `std::pow(x, 2)` has no overhead compared to `x * x`.

For single precision, it's another story! For the two compilers tested, and for
small integer values of `n` (though I think it stays the same for large integer
values of `n`), it's always faster to use direct multiplication rather than
exponentiation via `std::pow(x, n)`. Indeed, it seems that there is no
optimization for the case when `n` is an integer. When -ffast-math is used, the
difference is not very big: around 2.5 times slower for GCC and around 3.5 times
slower for clang. I'm a bit disappointed by the lack of single-precision
performance of `std::pow`. Basically, you should not use `std::pow` if you want
single-precision powers.

I hope you found this benchmark interesting :)

For those interested, the code of the benchmark is available on GitHub.
