How I made my Deep Learning Library 38% faster to compile (Optimization and C++17 if constexpr)

My Deep Learning Library (DLL) project is a C++ library for training and using artificial neural networks (you can take a look at this post about DLL if you want more information).

Although I've put a lot of effort into making it as fast as possible to train and run neural networks, the compilation time has been steadily going up and is becoming quite annoying. The library is heavily templated, and all the matrix operations are done using my Expression Templates Library (ETL), which is itself more than template-heavy.

In this post, I'll present two techniques with which I've been able to reduce the total compilation time of the DLL unit tests by up to 38%.

Reduce overhead of Expression Templates Library

One of the features I use a lot in ETL is that the implementation of each algorithm can be chosen directly in the code.

For example, for a matrix multiplication, to force the vectorized implementation:

C = selected_helper(etl::gemm_impl::VEC, A * B);

This works great and greatly simplifies testing and benchmarking. However, it significantly complicates the selection of the algorithm and incurs quite some overhead on compilation time. I've come to realize that for most uses of the library this is not necessary: it should only be used by the tests and benchmarks. Therefore, I disabled this behaviour by default and added a macro (ETL_MANUAL_SELECT) to enable it. When the macro is not defined, the selection code is much simpler.

This also greatly reduces the compilation time wherever manual selection is not enabled, which is basically all ETL usages. In one of my ETL examples, the compilation time went down from about 30 seconds to about 15 seconds. I was actually quite surprised by the impact of the change.

I then updated DLL to see the difference with this new version of ETL. I tested with GCC 7.1 and clang 3.9, with all the possible options from ETL. Nothing changed in DLL; only the ETL library was updated. Here are the results with DLL's unit tests:

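| Compiler         | GCC Debug | GCC Release | Clang Debug | Clang Release |
|------------------|-----------|-------------|-------------|---------------|
| Base version     | 560s      | 1188s       | 861s        | 1179s         |
| Manual selection | 490s      | 866s        | 704s        | 813s          |
| Improvement      | 12.5%     | 27.5%       | 18%         | 31%           |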

As you can see, the compilation time improvements are quite substantial :)

I'm really happy with these results. With relatively few changes in ETL, the compilation time of DLL has been nicely reduced.
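To make the idea concrete, here is a minimal sketch of what macro-gated selection can look like. It is not ETL's actual code: `MYLIB_MANUAL_SELECT`, `default_gemm_impl`, `select_gemm_impl`, `forced_gemm` and the size threshold are all hypothetical names and values chosen for illustration (the real macro is ETL_MANUAL_SELECT).

```cpp
#include <cstddef>

// Hypothetical enum; it only mirrors etl::gemm_impl by name.
enum class gemm_impl { STD, VEC };

// Default heuristic: a pure function of information known at compile time.
// The threshold of 64 elements is an arbitrary value for this sketch.
constexpr gemm_impl default_gemm_impl(std::size_t elements) {
    return elements >= 64 ? gemm_impl::VEC : gemm_impl::STD;
}

#ifdef MYLIB_MANUAL_SELECT
// Opt-in path for tests and benchmarks: a runtime override is consulted
// first, so the selection can no longer be a constant expression.
struct forced_impl {
    bool active;
    gemm_impl value;
};

inline forced_impl& forced_gemm() {
    static forced_impl f{false, gemm_impl::STD};
    return f;
}

inline gemm_impl select_gemm_impl(std::size_t elements) {
    const forced_impl& f = forced_gemm();
    return f.active ? f.value : default_gemm_impl(elements);
}
#else
// Default path: no override machinery at all, the selection stays constexpr.
constexpr gemm_impl select_gemm_impl(std::size_t elements) {
    return default_gemm_impl(elements);
}
#endif
```

Without the macro, the compiler never sees the override machinery, and the result of the selection is available as a constant expression.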

Use if constexpr for algorithm selection

Since C++17 has been supported in compilers, I've been wanting to play around with if constexpr. Until now, when necessary, I've been emulating if constexpr with SFINAE and a lambda, but it's really not nice code. It has already helped me in the past, though: a single static_if reduced the compilation time of DLL by about 30%. I try not to abuse it, and I've reserved it for very few instances in DLL's code. A real if constexpr is much nicer than the emulated version. The only problem is that you need a really recent compiler. With GCC in particular, you need GCC 7.1, which has only been available since May this year. I'd rather not impose such strong constraints on DLL's requirements.
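For readers who have not seen such an emulation, here is a rough sketch of the difference. It is not the static_if used in DLL: this version uses tag dispatch with generic lambdas instead of SFINAE, and `static_if`, `to_double_14` and `to_double_17` are hypothetical names for this example only.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace detail {
// Tag dispatch: only the overload matching the condition calls its callable,
// so the body of the other generic lambda is never instantiated.
template <typename T, typename Then, typename Else>
auto static_if_impl(std::true_type, T&& value, Then then_fn, Else) {
    return then_fn(std::forward<T>(value));
}

template <typename T, typename Then, typename Else>
auto static_if_impl(std::false_type, T&& value, Then, Else else_fn) {
    return else_fn(std::forward<T>(value));
}
} // namespace detail

template <bool Cond, typename T, typename Then, typename Else>
auto static_if(T&& value, Then then_fn, Else else_fn) {
    return detail::static_if_impl(std::integral_constant<bool, Cond>{},
                                  std::forward<T>(value), then_fn, else_fn);
}

// C++14 style: the two branches must be wrapped in generic lambdas.
template <typename T>
double to_double_14(const T& x) {
    return static_if<std::is_arithmetic<T>::value>(
        x,
        [](const auto& v) { return static_cast<double>(v); },
        [](const auto& s) { return std::stod(s); });
}

// C++17 style: the same logic reads like an ordinary if statement.
template <typename T>
double to_double_17(const T& x) {
    if constexpr (std::is_arithmetic_v<T>) {
        return static_cast<double>(x);
    } else {
        return std::stod(x);
    }
}
```

In both versions, the branch that is not taken is never instantiated for the given type, which is why `std::stod` is never applied to an `int`.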

Nevertheless, I've been annotating a lot of my if statements with constexpr in the form of comments (if /*constexpr*/) so that I can switch back and forth to see the impact of C++17 if constexpr on my compilation time. Interestingly, only enabling C++17 as a compiler option made compilation slower by about 2%. Moreover, I've also found a few places where my code was breaking under C++17. For instance, C++17 adds a function std::size(container) that is ambiguous with etl::size(matrix) in some cases where ADL is concerned.
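The std::size clash can be reproduced in a few lines. The sketch below uses a hypothetical namespace `mylib` as a stand-in for etl, and shows one common fix, qualifying the call. I'm not claiming this is how ETL resolved it.

```cpp
#include <array>
#include <cstddef>

namespace mylib { // hypothetical stand-in for a library namespace such as etl

// The library's own free size() function.
template <typename T>
constexpr std::size_t size(const T& container) {
    return container.size();
}

template <typename T>
constexpr std::size_t twice_size(const T& container) {
    // An unqualified call `size(container)` would also consider std::size
    // through ADL whenever T is associated with namespace std. Under C++17
    // both overloads match equally well, so that call would be ambiguous.
    // Qualifying the call disables ADL and compiles in both language modes.
    return 2 * mylib::size(container);
}

} // namespace mylib
```

For example, `mylib::twice_size(std::array<int, 3>{{1, 2, 3}})` evaluates to 6. Replacing `mylib::size(container)` with `size(container)` should trigger the ambiguity when compiling as C++17.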

Now that selection is much simpler in ETL, the complete selection can be made constexpr, so all the branches can be resolved at compile time. Hopefully, this should avoid instantiating a lot of algorithm implementations.
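Here is a small sketch of why this can save instantiations. All names (`gemm_impl` as redefined here, `select_impl`, `dot`, `dot_std`, `dot_vec`) are hypothetical and far simpler than ETL's real kernels. The "vectorized" kernel is deliberately made invalid for integers to show that the discarded branch is not instantiated.

```cpp
#include <cstddef>
#include <type_traits>

enum class gemm_impl { STD, VEC }; // hypothetical, mirrors etl::gemm_impl only by name

// Stand-in for a kernel that is expensive to compile. It is only valid for
// floating-point types: instantiating it for int would be a hard error.
template <typename T>
constexpr T dot_vec(const T* a, const T* b, std::size_t n) {
    static_assert(std::is_floating_point_v<T>, "vectorized kernel needs floating point");
    T sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Generic fallback kernel.
template <typename T>
constexpr T dot_std(const T* a, const T* b, std::size_t n) {
    T sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// The selection is a constexpr function of the element type only.
template <typename T>
constexpr gemm_impl select_impl() {
    return std::is_floating_point_v<T> ? gemm_impl::VEC : gemm_impl::STD;
}

template <typename T>
constexpr T dot(const T* a, const T* b, std::size_t n) {
    if constexpr (select_impl<T>() == gemm_impl::VEC) {
        return dot_vec(a, b, n); // discarded, hence not instantiated, for int
    } else {
        return dot_std(a, b, n); // discarded for floating-point types
    }
}
```

With a plain `if` (or `if /*constexpr*/`), calling `dot` on `int` data would instantiate `dot_vec<int>` and fail on the static_assert. With `if constexpr`, only the selected kernel is ever instantiated.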

Here are the results I've obtained:

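| Compiler                 | GCC Debug | GCC Release | Clang Debug | Clang Release |
|--------------------------|-----------|-------------|-------------|---------------|
| Base version             | 560s      | 1188s       | 861s        | 1179s         |
| Manual selection         | 490s      | 866s        | 704s        | 813s          |
| C++17 if constexpr (ETL) | 444s      | 767s        | 663s        | 731s          |
| Improvement              | 10%       | 11.5%       | 6%          | 10%           |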

By enabling C++17 and transforming some if statements into if constexpr, compilation time was reduced by up to 11.5%. It's pretty good, but I would have expected a bit more. Nevertheless, it's still an upgrade :) We can see that the impact is larger for release compilation. This makes sense, since it should remove quite a lot of code that is hard to optimize.

I've also started experimenting with if constexpr in DLL itself. Unfortunately, this didn't have as much effect as I wanted. Here are some preliminary results:

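| Compiler                 | GCC Debug | GCC Release | Clang Debug | Clang Release |
|--------------------------|-----------|-------------|-------------|---------------|
| Base version             | 560s      | 1188s       | 861s        | 1179s         |
| Manual selection         | 490s      | 866s        | 704s        | 813s          |
| C++17 if constexpr (ETL) | 444s      | 767s        | 663s        | 731s          |
| C++17 if constexpr (DLL) | 434s      | 765s        | 658s        | 725s          |
| Improvement              | 2.3%      | 0.3%        | 0.75%       | 1%            |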

As you can see, the improvements are pretty small, almost not significant in some cases. Nevertheless, I believe some further improvements are possible. I've got a few other cases that I plan to refactor to allow for more if constexpr action. I'll see about that in the coming months.

Conclusion

By disabling manual selection by default in my Expression Templates Library (ETL) and by making use of C++17's if constexpr feature, I've been able to reduce the compilation time of DLL's unit test suite by up to 38% in the best case.
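As a quick sanity check on that figure, the cumulative reduction can be recomputed from the first and last rows of the results table. The helper name `reduction_percent` is just for this sketch.

```cpp
// Relative reduction, in percent, going from `before` to `after` seconds.
constexpr double reduction_percent(double before, double after) {
    return 100.0 * (before - after) / before;
}

// Base version versus the last row (C++17 if constexpr in DLL):
constexpr double gcc_debug     = reduction_percent(560.0, 434.0);   // about 22.5
constexpr double gcc_release   = reduction_percent(1188.0, 765.0);  // about 35.6
constexpr double clang_debug   = reduction_percent(861.0, 658.0);   // about 23.6
constexpr double clang_release = reduction_percent(1179.0, 725.0);  // about 38.5
```

The best case is the Clang Release build, which is where the 38% comes from. The per-table "Improvement" rows, in contrast, are each relative to the row directly above them.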

So far, only the improvements to ETL are online; the C++17 improvements are only present in the form of comments that I switch on and off when I do tests like this. I think I'm still gonna wait a bit until I enable C++17 for ETL, but it's probably gonna be enabled for ETL 1.3. And, by the way, version 1.2 of ETL should soon be released.

You can take a look at my libraries on Github:

As ever, if you have any question or comment on the matter, don't hesitate to post a comment on this blog or on Github :)
