Decrease DLL neural network compilation time with C++17

Just last week, I migrated my Expression Templates Library (ETL) to C++17. The same is now done for my Deep Learning Library (DLL). In ETL, this resulted in much nicer code overall, but no real improvement in compilation time.

The objective of migrating DLL was two-fold. First, I wanted to simplify some code, especially with if constexpr. Second, and more importantly, I wanted to try to reduce the compilation time. In the past, I've already tried a few changes with C++17, with good results on the compilation of the entire test suite. While this is very good, it is not very representative of how the library is actually used. Indeed, you'll normally have only one network in your source file, not several. The new changes will especially help in the case of many networks, but less in the case of a single network per source file.

This time, I decided to test the compilation on the examples. I've tested the nine official examples from the DLL library:

  1. mnist_dbn: A fully-connected Deep Belief Network (DBN) on the MNIST data set with three layers

  2. char_cnn: A special CNN with embeddings and merge and group layers for text recognition

  3. imagenet_cnn: A 12-layer Convolutional Neural Network (CNN) for Imagenet

  4. mnist_ae: A simple two-layer auto-encoder for MNIST

  5. mnist_cnn: A simple 6-layer CNN for MNIST

  6. mnist_deep_ae: A deep auto-encoder for MNIST, only fully-connected

  7. mnist_lstm: A Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells

  8. mnist_mlp: A simple fully-connected network for MNIST, with dropout

  9. mnist_rnn: A simple RNN with simple cells for MNIST

These examples are really representative of what users can do with the library, and I think they are a much better benchmark for compilation time.

For reference, you can find the source code of all the examples online.

Results

Let's start with the results. I've tested this at different stages of the migration with clang 5 and GCC 7.2. I tested the following steps:

  1. The original C++14 version

  2. Simply compiling in C++17 mode (-std=c++17)

  3. Using the C++17 version of the ETL library

  4. Upgrading DLL to C++17 (without ETL)

  5. ETL and DLL in C++17 versions

I've compiled each example independently in release_debug mode. Here are the compilation times for G++ 7.2, where columns 0 to 8 are the examples in the order listed above:

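| Example           | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| C++14             | 37.818 | 32.944 | 33.511 | 15.403 | 29.998 | 16.911 | 24.745 | 18.974 | 19.006 |
| -std=c++17        | 38.358 | 32.409 | 32.707 | 15.810 | 30.042 | 16.896 | 24.635 | 19.134 | 19.027 |
| ETL C++17         | 36.045 | 31.000 | 30.942 | 15.322 | 28.840 | 16.747 | 24.151 | 18.208 | 18.939 |
| DLL C++17         | 35.251 | 32.577 | 32.854 | 15.653 | 29.758 | 16.851 | 24.606 | 19.098 | 19.146 |
| Final C++17       | 32.289 | 31.133 | 30.939 | 15.232 | 28.753 | 16.526 | 24.326 | 18.116 | 17.819 |
| Final Improvement | 14.62% | 5.49%  | 7.67%  | 1.11%  | 4.15%  | 2.27%  | 1.69%  | 4.52%  | 6.24%  |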

The difference from just enabling C++17 is not significant. On the other hand, some significant gains can be obtained by using the C++17 version of ETL, especially for the DBN example and for the CNN examples. Except for the DBN case, the migration of DLL to C++17 did not bring any significant advantage. When everything is combined, the gains are larger :) In the best case, the example is 14.6% faster to compile.
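For reference, the "Final Improvement" row is the relative reduction in compilation time between the C++14 baseline and the final C++17 version. Here is a minimal sketch of that arithmetic; the function name improvement_percent is only an illustration and is not part of DLL.

```cpp
// Relative reduction, in percent, between a baseline measurement and an
// improved one (both in the same unit).
// Hypothetical helper, for illustration only.
constexpr double improvement_percent(double baseline, double improved) {
    return (baseline - improved) / baseline * 100.0;
}

// Example 0 with G++ 7.2: 37.818 (C++14) against 32.289 (final C++17).
static_assert(improvement_percent(37.818, 32.289) > 14.61 &&
              improvement_percent(37.818, 32.289) < 14.63,
              "matches the 14.62% reported in the table");
```

The same formula applied to any other column reproduces the corresponding entry of the last row, up to rounding.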

Let's see if it's the same with clang++ 5.0:

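| Example           | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| C++14             | 40.690 | 34.753 | 35.488 | 16.146 | 31.926 | 17.708 | 29.806 | 19.207 | 20.858 |
| -std=c++17        | 40.502 | 34.664 | 34.990 | 16.027 | 31.510 | 17.630 | 29.465 | 19.161 | 20.860 |
| ETL C++17         | 37.386 | 33.008 | 33.896 | 15.519 | 30.269 | 16.995 | 28.897 | 18.383 | 19.809 |
| DLL C++17         | 37.252 | 34.592 | 35.250 | 16.131 | 31.782 | 17.606 | 29.595 | 19.126 | 20.782 |
| Final C++17       | 34.470 | 33.154 | 33.881 | 15.415 | 30.279 | 17.078 | 28.808 | 18.497 | 19.761 |
| Final Improvement | 15.28% | 4.60%  | 4.52%  | 4.52%  | 5.15%  | 3.55%  | 3.34%  | 3.69%  | 5.25%  |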

First of all, as I have seen time after time, clang is still slower than GCC. It's not a big difference, but it is still noticeable. Overall, the gains are a bit higher on clang than on GCC, but not by much. Interestingly, the migration of DLL to C++17 pays off less in terms of compilation time for clang. It even seems to slow down compilation on some examples. On the other hand, the migration of ETL matters more than on GCC.

Overall, every example is faster to compile using both libraries in C++17, but we don't have spectacular speed-ups. With clang, we have speedups from 3.3% to 15.3%. With GCC, we have speedups from 1.1% to 14.6%. It's not very high, but I'm already satisfied with these results.

C++17 in DLL

Overall, the migration of DLL to C++17 was quite similar to that of ETL. You can take a look at my previous article if you want more details on the C++17 features I've used.

I've replaced a lot of SFINAE functions with if constexpr. I've also replaced a lot of static_if with if constexpr. There was a large number of these in DLL's code. I also enabled all the constexpr that had been commented out, waiting for exactly this moment :)
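To give an idea of what such a replacement looks like, here is a small sketch. The layer types, the has_weights flag and the function names are made up for the illustration; they are not DLL's real classes.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical layer types, for illustration only.
struct dense_layer {
    static constexpr bool has_weights = true;
    std::size_t inputs;
    std::size_t outputs;
    constexpr std::size_t weight_count() const { return inputs * outputs + outputs; }
};

struct pooling_layer {
    static constexpr bool has_weights = false;  // no weight_count() at all
};

// C++14 style: two overloads selected by SFINAE.
template <typename Layer, std::enable_if_t<Layer::has_weights, int> = 0>
constexpr std::size_t parameters_cpp14(const Layer& layer) {
    return layer.weight_count();
}

template <typename Layer, std::enable_if_t<!Layer::has_weights, int> = 0>
constexpr std::size_t parameters_cpp14(const Layer&) {
    return 0;
}

// C++17 style: one function, the branch that does not apply is discarded
// and never instantiated, so weight_count() is not required on every layer.
template <typename Layer>
constexpr std::size_t parameters([[maybe_unused]] const Layer& layer) {
    if constexpr (Layer::has_weights) {
        return layer.weight_count();
    } else {
        return 0;
    }
}

static_assert(parameters(dense_layer{784, 10}) == 7850, "784 * 10 weights + 10 biases");
static_assert(parameters(pooling_layer{}) == 0, "no parameters in a pooling layer");
```

Both versions give the same result, but the if constexpr version keeps the whole logic in a single function and needs no enable_if machinery.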

I was also thinking that I could replace a lot of meta-programming stuff with fold expressions. While I was able to replace some of it, most of it was harder to convert. Indeed, the variadic pack is often hidden behind another class, and therefore the pack is not directly usable from the network class or from the group and merge layer classes. I didn't want to start a big refactoring just to use a C++17 feature; the current state of this code is fine.
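The following sketch shows both situations: a recursive template replaced by a fold expression, and a pack hidden behind another class that has to be exposed again before a fold can be used. All the names (layer, layer_list, total_params and so on) are invented for the example and do not come from DLL.

```cpp
#include <cstddef>

// Hypothetical layer descriptor carrying a compile-time parameter count.
template <std::size_t N>
struct layer {
    static constexpr std::size_t params = N;
};

// C++14: recursive template to sum over the pack.
template <typename... Layers>
struct total_params_cpp14;

template <>
struct total_params_cpp14<> {
    static constexpr std::size_t value = 0;
};

template <typename L, typename... Ls>
struct total_params_cpp14<L, Ls...> {
    static constexpr std::size_t value = L::params + total_params_cpp14<Ls...>::value;
};

// C++17: a single fold expression when the pack is directly available.
template <typename... Layers>
constexpr std::size_t total_params = (std::size_t{0} + ... + Layers::params);

// When the pack is hidden behind another class, it is not directly usable:
// a partial specialization is needed just to get the pack back.
template <typename... Layers>
struct layer_list {};

template <typename List>
struct total_params_of;

template <typename... Layers>
struct total_params_of<layer_list<Layers...>> {
    static constexpr std::size_t value = total_params<Layers...>;
};

static_assert(total_params<layer<3>, layer<4>, layer<5>> == 12, "3 + 4 + 5");
static_assert(total_params_cpp14<layer<3>, layer<4>, layer<5>>::value == 12, "same result");
static_assert(total_params_of<layer_list<layer<3>, layer<4>>>::value == 7, "3 + 4");
```

The last part is where the gain shrinks: once an extra specialization is required to unwrap the pack, the fold expression only removes a small piece of the boilerplate.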

I made some use of structured bindings as well, but again not as much as I was thinking. In fact, a lot of the time, I'm assigning the elements of a pair or tuple to existing variables, not declaring new ones, and unfortunately you can only use structured bindings with an auto declaration.
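Here is a small sketch of that limitation. The helper min_max and the range_tracker struct are invented for the example; the point is only the difference between declaring new variables and assigning to existing ones.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper standing in for any function returning two results.
std::pair<int, int> min_max(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
}

// Declaring new variables: structured bindings fit nicely.
int range_width(int a, int b) {
    auto [lo, hi] = min_max(a, b);
    return hi - lo;
}

// Assigning to variables that already exist (here, data members):
// structured bindings cannot be used, std::tie is still needed.
struct range_tracker {
    int lo = 0;
    int hi = 0;

    void update(int a, int b) {
        std::tie(lo, hi) = min_max(a, b);
    }
};

// Exercises both forms; the asserts document the expected values.
void demo() {
    assert(range_width(7, 3) == 4);

    range_tracker tracker;
    tracker.update(9, 2);
    assert(tracker.lo == 2 && tracker.hi == 9);
}
```

In code that mostly updates existing state, as in the second case, structured bindings simply have nothing to replace.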

Overall, the code is significantly better now, but there was less impact than there was on ETL. It's also a smaller code base, so maybe this is normal and my expectations were too high ;)

Conclusion

The trunk of DLL is now a C++17 library :) I think this improves the quality of the code by a nice margin! Although there is still some work to be done to improve the code, especially the DBN pretraining code, the quality is quite good now. Moreover, the switch to C++17 made neural networks using the DLL library faster to compile, from 1.1% in the worst case to 15.3% in the best case! I don't know when I will release the next version of DLL, but it will take some time. In particular, I'll have to polish the RNN support and add a sequence-to-sequence loss before I release version 1.1 of DLL.

I'm quite satisfied with C++17 even if I would have liked a few more features to play with! I'm already a big fan of if constexpr, which can make the code much nicer, and fold expressions are much more intuitive than their previous recursive template counterparts.

I may also consider migrating some parts of the cpp-utils library, but if I do, it will only be through the use of conditionals in order not to break the other projects that are based on the library.
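Assuming these conditionals are preprocessor checks on the language version, such a migration could look like the sketch below. The macro MYLIB_HAS_CPP17 and the function all_integral are hypothetical and are not taken from cpp-utils.

```cpp
#include <type_traits>

// Hypothetical configuration macro: detect C++17 from the standard
// __cplusplus value.
#if __cplusplus >= 201703L
#define MYLIB_HAS_CPP17 1
#else
#define MYLIB_HAS_CPP17 0
#endif

// True when every type of the pack is an integral type.
// The same function compiles in both C++14 and C++17 mode.
template <typename... Ts>
constexpr bool all_integral() {
#if MYLIB_HAS_CPP17
    // C++17: fold expression (an empty pack yields true)
    return (std::is_integral<Ts>::value && ...);
#else
    // C++14 fallback: expand the pack into an array and loop over it
    bool values[] = {true, std::is_integral<Ts>::value...};
    bool result = true;
    for (bool v : values) {
        result = result && v;
    }
    return result;
#endif
}

static_assert(all_integral<int, long, char>(), "all integral");
static_assert(!all_integral<int, double>(), "double is not integral");
```

Projects still compiled as C++14 keep the fallback path, so nothing breaks for them, while projects compiled as C++17 get the simpler implementation.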

Related articles

  • Compiler benchmark GCC and Clang on C++ library (ETL)
  • How I made my Deep Learning Library 38% faster to compile (Optimization and C++17 if constexpr)
  • Partial type erasing in Deep Learning Library (DLL) to improve compilation time
  • zapcc - a faster C++ compiler
  • zapcc C++ compilation speed against gcc 5.4 and clang 3.9
  • Release of zapcc 1.0 - Fast C++ compiler