Decrease DLL neural network compilation time with C++17

Baptiste Wicht

Just last week, I've migrated my Expression Templates Library (ETL) library to C++17, it is now also done in my Deep Learning Library (DLL) library. In ETL, this resulted in a much nicer code overall, but no real improvement in compilation time.

The objective of the migration of DLL was two-fold. First, I also wanted to simplify some code, especially with if constexpr. But I also especially wanted to try to reduce the compilation time. In the past, I've already tried a few changes with C++17, with good results on the compilation of the entire test suite. While this is very good, this is not very representative of users of the library. Indeed, normally you'll have only one network in your source file not several. The new changes will especially help in the case of many networks, but less in the case of a single network per source file.

This time, I decided to test the compilation on the examples. I've tested the eight official examples from the DLL library:

mnist_dbn: A fully-connected Deep Belief Network (DBN) on the MNIST data set with three layers
char_cnn: A special CNN with embeddings and merge and group layers for text recognition
imagenet_cnn: A 12 layers Convolutional Neural Network (CNN) for Imagenet
mnist_ae: A simple two-layers auto-encoder for MNIST
mnist_cnn: A simple 6 layers CNN for MNIST
mnist_deep_ae: A deep auto-encoder for MNIST, only fully-connected
mnist_lstm: A Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM) cells
mnist_mlp: A simple fully-connected network for MNIST, with dropout
mnist_rnn: A simple RNN with simple cells for MNIST

This is really representative of what users can do with the library and I think it's a much better for compilation time.

For reference, you can find the source code of all the examples online.

Results

Let's start with the results. I've tested this at different stages of the migration with clang 5 and GCC 7.2. I tested the following steps:

The original C++14 version
Simply compiling in c++17 mode (-std=c++17)
Using the C++17 version of the ETL library
Upgrading DLL to C++17 (without ETL)
ETL and DLL in C++17 versions

I've compiled each example independently in release_debug mode. Here are the results for G++ 7.2:

Example	0	1	2	3	4	5	6	7	8
C++14	37.818	32.944	33.511	15.403	29.998	16.911	24.745	18.974	19.006
-std=c++17	38.358	32.409	32.707	15.810	30.042	16.896	24.635	19.134	19.027
ETL C++17	36.045	31.000	30.942	15.322	28.840	16.747	24.151	18.208	18.939
DLL C++17	35.251	32.577	32.854	15.653	29.758	16.851	24.606	19.098	19.146
Final C++17	32.289	31.133	30.939	15.232	28.753	16.526	24.326	18.116	17.819
Final Improvement	14.62%	5.49%	7.67%	1.11%	4.15%	2.27%	1.69%	4.52%	6.24%

The difference by just enabling c++17 is not significant. On the other hand, some significant gain can be obtained by using the C++17 version of ETL, especially for the DBN version and for the CNN versions. Except for the DBN case, the migration of DLL to C++17 did not bring any significant advantage. When everything is combined, the gains are more important :) In the best case, the example is 14.6% faster to compile.

Let's see if it's the same with clang++ 5.0:

Example	0	1	2	3	4	5	6	7	8
C++14	40.690	34.753	35.488	16.146	31.926	17.708	29.806	19.207	20.858
-std=c++17	40.502	34.664	34.990	16.027	31.510	17.630	29.465	19.161	20.860
ETL C++17	37.386	33.008	33.896	15.519	30.269	16.995	28.897	18.383	19.809
DLL C++17	37.252	34.592	35.250	16.131	31.782	17.606	29.595	19.126	20.782
Final C++17	34.470	33.154	33.881	15.415	30.279	17.078	28.808	18.497	19.761
Final Improvement	15.28%	4.60%	4.52%	4.52%	5.15%	3.55%	3.34%	3.69%	5.25%

First of all, as I have seen time after time, clang is still slower than GCC. It's a not a big difference, but still significant. Overall, the gains are a bit higher on clang than on GCC, but not by much. Interestingly, the migration of DLL to C++17 is less interesting in terms of compilation time for clang. It seems even to slow down compilation on some examples. On the other hand, the migration of ETL is more important than on GCC.

Overall, every example is faster to compile using both libraries in C++17, but we don't have spectacular speed-ups. With clang, we have speedups from 3.3% to 15.3%. With GCC, we have speedup from 1.1% to 14.6%. It's not very high, but I'm already satisfied with these results.

C++17 in DLL

Overall, the migration of DLL to C++17 was quite similar to that of ETL. You can take a look at my previous article if you want more details on C++17 features I've used.

I've replaced a lot of SFINAE functions with if constexpr. I've also replaced a lot of statif_if with if constexpr. There was a large number of these in DLL's code. I also enabled all the constexpr that were commented for this exact time :)

I was also thinking that I could replace a lot of meta-programming stuff with fold expressions. While I was able to replace a few of them, most of them were harder to replace with fold expressions. Indeed, the variadic pack is often hidden behind another class and therefore the pack is not directly usable from the network class or the group and merge layers classes. I didn't want to start a big refactoring just to use a C++17 feature, the current state of this code is fine.

I made some use of structured bindings as well, but again not as much as I was thinking. In fact, a lot of time, I'm assigning the elements of a pair or tuple to existing variables not declaring new variables and unfortunately, you can only use structured bindings with auto declaration.

Overall, the code is significantly better now, but there was less impact than there was on ETL. It's also a smaller code base, so maybe this is normal and my expectations were too high ;)

Conclusion

The trunk of DLL is now a C++17 library :) I think this improve the quality of the code by a nice margin! Even though, there is still some work to be done to improve the code, especially for the DBN pretraining code, the quality is quite good now. Moreover, the switch to C++17 made the compilation of neural networks using the DLL library faster to compile, from 1.1% in the worst case to 15.3% in the best case! I don't know when I will release the next version of DLL, but it will take some time. I'll especially have to polish the RNN support and add a sequence to sequence loss before I will release the 1.1 version of DLL.

I'm quite satisfied with C++17 even if I would have liked a bit more features to play with! I'm already a big fan of if constexpr, this can make the code much nicer and fold expressions are much more intuitive than their previous recursive template counterpart.

I may also consider migrating some parts of the cpp-utils library, but if I do, it will only be through the use of conditionals in order not to break the other projects that are based on the library.

Compiler benchmark GCC and Clang on C++ library (ETL)

How I made my Deep Learning Library 38% faster to compile (Optimization and C++17 if constexpr)

Partial type erasing in Deep Learning Library (DLL) to improve compilation time

zapcc - a faster C++ compiler

zapcc C++ compilation speed against gcc 5.4 and clang 3.9

Release of zapcc 1.0 - Fast C++ compiler

Results

C++17 in DLL

Conclusion

Related articles

Comments