Back to Linux

Baptiste Wicht

2026-02-15 07:32

Again, it has been too long since I posted on this blog. I am still very busy between family, work and my other blog.

I wanted to post a short update on my transition back to Linux. For the last 5 or 6 years or so, I have been running Windows on my desktop computers. My servers are still all on Gentoo. But it took me too much time to handle issues on the Linux desktop and time was exactly what I did not have. So, I opted for Windows. I was also playing multiple games and it was much easier for me to do it on Windows.

I was still doing development and this is something that I do on Linux. With Windows Subsystem for Linux (WSL), I felt I had found the best of both worlds. I did all my development on WSL.

This is not the first time I am transitioning. I used to have Linux on my computers, then Windows again for a while. In 2013, I transitioned back to Linux and then back to Windows for simplicity.

So, what changed? I got fed up with some projects and changes by Microsoft:

The idea of Windows Recall is so bad to me that it should never have become a project. If you are not aware, Windows Recall is a feature that takes a screenshot of your desktop every few seconds. The stated goal is to help us remember what we do on our computer and help us troubleshoot. I cannot believe they would implement such a stupid thing on a computer. This is a disaster for privacy even if implemented properly (and the first implementation was far from proper). I would never run a computer where Windows Recall is enabled. This was never enabled for me because my PC was apparently not Copilot+ rated.
Microsoft keeps pushing AI features into Windows. Over the years, Windows 11 has become steadily worse in my opinion. And now Microsoft is trying to push AI into the OS directly, but nobody wants that. Microsoft should push features the user wants, not force AI on everybody just because this fuels their share price.
Recently, it was disclosed that Microsoft is storing BitLocker recovery keys on their servers. BitLocker is Microsoft's solution for full-disk encryption. But if the keys are stored in their cloud, this kind of ruins the value of full-disk encryption.

There are other small reasons, but this boils down to the fact that I lost trust in Microsoft.

So, I moved back to Linux. I pondered going back to Gentoo again, but I simply do not have enough time to tinker with it on a desktop computer where any upgrade takes forever. So, after some research, I decided to go with CachyOS, a distribution based on Arch Linux. I already have experience with Arch Linux. The focus of CachyOS on performance appeals to me. And the reviews I have read have convinced me.

So far, I have converted my laptop and desktop to CachyOS and everything has been very smooth. I am also discovering btrfs which I had never used before, switched to a new bootloader (Limine) and a new desktop environment (KDE Plasma). So, there is a lot to be learned.

Overall, I have found alternatives to every piece of software I was using on Windows, but one. The only thing I could not do is replace Aida64, which was configuring the LCD display on my ASUS ROG AIO. But this is a small price to pay. Once I upgrade this computer, I will use an AIO that can be configured on Linux.

And this time, I also tried playing games. I rarely have time for gaming, but when I do, I like to have it ready. And I was pleasantly surprised to find out that the two games I am playing (Path of Exile and Satisfactory) are both working perfectly fine on Steam on Linux.

What about you? What OS do you use?

Trying out Rust and C++26

Baptiste Wicht

2025-05-16 07:20

It has been too long again since I wrote here. But with family, my other blog and work, I did not get a chance to write here for a while.

Today is going to be a short update. I recently updated my main projects to use C++26. There is not much I can use at this point, but I have upgraded my compilers to GCC-15 and Clang-20 so that I can access most things. I will probably do some minor updates of the code in the coming weeks/months to see what I can add. What I am really waiting for is contracts and the meta-programming, which are not yet in the base compilers. These two features are really going to make a huge difference in how we write C++ code.

In the last two months, I have started learning some Rust. Since I believe the best way to learn a language is by writing code, I quickly got into coding. I have ported a (small) part of my ETL and DLL libraries to Rust. I came to the point where I can train a simple Dense Neural Network on MNIST and it works. Of course, I also tried to make it as fast as possible. Currently, it is relatively close, at about 30% slower than ETL on a CPU and without MKL.

Overall, Rust is an interesting language. Its borrow checker and strong compiler contracts are definitely making the code safer. On the other hand, it also makes the code very strict. And some things that are very easy to write in C++ become very difficult to write in Rust.

Here are some of the issues I found with Rust:

The generics are very limited. Compared to the level of meta-programming we can do in C++ with templates, the comparison does not even start. Everything must be tied to Traits (a bit like concepts in this case) and we can't specialize code based on the type.
The SIMD library is also quite limited. I am currently using the portable-simd library from nightly Rust and it gets the job done. But we are again limited to the common interface between integers and floating points since we cannot specialize and unfortunately, some of the things are not defined for both. For instance, I could not get FMA in my code and I cannot easily tune the size of the vectors to the types.
Because of the borrow-checker, we are also not allowed to write some expressions where we mix mutable and immutable references, like x = b * x + d (x is once mutable and once immutable), so we either need to do two operations (slow) or use some inplace wrappers instead of relying on expressions (ugly).
Currently, there is a lot of overhead to the parallelism I used (through rayon). Compared to using a simple thread pool, this quickly becomes much more complicated and apparently has much more overhead. I am using much higher thresholds for parallelizing operations in my Rust library than I am in C++.

I should still mention that this was probably not the best project to start in Rust since C++ excels at templates while Rust does not. But it was fun. And it was not that difficult for me to quickly get into Rust. But it will take a longer time to become an expert.

While I enjoy writing Rust, I do enjoy writing C++ much more, so I really hope I can keep on writing C++ for a long time.

What about you? Anybody tried Rust?

C++ Refresh and new technos: FP16 and AVX512

Baptiste Wicht

2023-12-17 09:20

In the last few months, I have been working on refreshing my Expression Templates Library (ETL) project with modern C++. I am happy to report that I have now finished the refresh. It took me longer than I expected and I also had less time than I expected. But I am very happy about the result.

The main change in etl is the use of concepts. Expression Templates are making heavy use of SFINAE. And I was able to replace every single usage of SFINAE with concepts. I was also able to replace many assertions with concepts instead.

In most cases, I am using concepts instead of the typename declaration. For instance:

template <typename A, typename M, cpp_enable_iff(is_4d<A>)>
static void apply(A&& in, M&& m, size_t c1, size_t c2, size_t c3) {

becomes

template <etl_4d A, typename M>
static void apply(A&& in, M&& m, size_t c1, size_t c2, size_t c3) {

This makes the declaration much simpler to read. In some cases, I had to use requires. For instance, here is the old definition of sub_view:

template <typename T, bool Aligned>
struct sub_view<T, Aligned, std::enable_if_t<!fast_sub_view_able<T>>> final {

and here is the new one:

template <typename T, bool Aligned>
requires(!fast_sub_view_able<T>)
struct sub_view<T, Aligned> final {

This also explains the requirements much better than using SFINAE. In most cases, concepts should be faster to compile than the old enable-if stuff. However, I have not yet had time to measure that.

I also made many small cleanups to the code, but they are probably not worth discussing.

AVX-512

What is worth discussing is that I finally added support for AVX-512 into etl. Before, I was waiting until Intel would give AVX-512F support to desktop CPUs, but this is still not the case unfortunately. So I rented a VPS with AVX-512F support.

AVX-512F operates on 512-bit vectors, twice the width of AVX2. This makes it twice as fast in theory. I have completed the support in etl and did some extra testing as well. I wish I had a machine on which to test that on a regular basis (the VPS is pretty expensive to keep running). If I start working a lot on this project again, I will consider having a Xeon CPU at home.
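
As a quick back-of-the-envelope check of the vector widths, the following sketch computes how many elements fit in one vector register. The helper name lanes is hypothetical and not something from etl, and the static assertions assume the usual 4-byte float and 8-byte double.

```cpp
#include <cstddef>

// Number of elements of type T that fit in a SIMD register of the given width (in bits)
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
    return register_bits / (8 * sizeof(T));
}

static_assert(lanes<float>(256) == 8, "256-bit vector: 8 floats");
static_assert(lanes<float>(512) == 16, "512-bit vector: 16 floats");
static_assert(lanes<double>(512) == 8, "512-bit vector: 8 doubles");
```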

FP16 and BF16

Another thing I had been wanting to work on for many years is FP16 operations on GPU. FP16 is a floating point type with only 16 bits instead of the standard 32 bits for float. With my new computer and new versions of CUDA, I now have a working system to do FP16.

So, I implemented support for many FP16 operations on my etl-gpu-blas project that is used by etl to provide GPU operations. Thanks to operator overloading in CUDA, there is really nothing complicated about doing that.

Doing so, I also added support for BF16. This is another half-precision floating point type, but the mantissa and exponent part are different, apparently better tuned for machine learning. The support is more or less the same in CUDA, only a different type.
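
To illustrate the difference in layout: FP16 (IEEE half precision) has 5 exponent bits and 10 mantissa bits, while BF16 keeps the 8 exponent bits of a 32-bit float and only 7 mantissa bits. A BF16 value is therefore simply the upper half of a float. The sketch below shows this on the CPU in plain C++. The function names are hypothetical, this is not how etl-gpu-blas does it, and the conversion truncates where real conversions usually round to nearest.

```cpp
#include <cstdint>
#include <cstring>

// Convert a 32-bit float to BF16 by keeping its upper 16 bits
// (1 sign bit, 8 exponent bits, 7 mantissa bits). Truncation, no rounding.
inline std::uint16_t float_to_bf16_truncate(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Convert BF16 back to float by padding the lower 16 bits with zeros
inline float bf16_to_float(std::uint16_t half) {
    const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

With this layout, a value like 65536.0f survives the round trip although it is above the largest finite FP16 value (65504), but 3.14159f comes back as 3.140625f because only 7 mantissa bits are kept.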

Currently, it is only used in etl-gpu-blas, not yet in DLL. Indeed, the problem with FP16 and BF16 is that there is no CPU support, so it is not as easy to use. I plan to improve that support in the future so that I can use it on DLL without even going to the CPU.

Next steps

Another thing I want to explore in the future is FP8, which is a quarter-precision floating point. However, FP8 can only be used for some tensor operations, through the use of tensor cores. So, I will likely only use it through CUDNN for convolution operations.

Finally, I also want to explore INT8 for neural networks. INT8 is easy to do on both CPU and GPU, but you cannot simply replace all types in a neural network with INT8. A certain level of quantization is necessary and storage should still be done in INT16 and INT32. But, that's not for tomorrow.
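
To give an idea of what this means in practice, here is a minimal sketch of symmetric quantization with a 32-bit accumulator. It is only an illustration under my own assumptions (a single scale per tensor, values clamped to [-127, 127]); the function names are hypothetical and nothing like this exists in etl or dll yet.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric quantization: the real value is approximately scale * quantized value
inline std::int8_t quantize(float value, float scale) {
    const float scaled = std::round(value / scale);
    return static_cast<std::int8_t>(std::clamp(scaled, -127.0f, 127.0f));
}

// Dot product of two quantized vectors. The products are accumulated in
// 32 bits because a sum of INT8 products quickly overflows 8 or 16 bits.
inline std::int32_t dot_int8(const std::vector<std::int8_t>& a, const std::vector<std::int8_t>& b) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return acc;
}
```

Three products of 127 by 127 already give 48387, which does not fit in a signed 16-bit integer, so the accumulator has to be wider than the inputs.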

The next immediate project is to refresh the code of dll, with C++23. Then, I want to run some more benchmarks and see what my next steps are to make dll faster on CPU and GPU.

Switching from vim to neovim

Baptiste Wicht

2023-09-20 19:25

As mentioned in my last article, I am now using neovim instead of vim. I just wanted to comment shortly on this change.

I have been using vim as my IDE for many years now (more than 10 years at least). But at the beginning of the year, I switched to neovim instead.

neovim is a fork of vim, so it is quite similar, but it has some very important differences. It started in 2014, after a multithreading patch was rejected from vim.

Here are some of the advantages:

Very powerful built-in Language Server Protocol (LSP) client
Faster startup
Built-in Lua support (the configuration can be written entirely in Lua)
Asynchronous tasks (was later added to vim)

A lot of things are compatible between both editors. When I started with neovim, I simply copied over my vim configuration and started using neovim with only very minor changes.

But what really convinced me to keep using neovim was the LSP. This feature and the great plugins that make use of it greatly simplify how to deeply integrate C++ into neovim. I am now taking advantage of the clang tooling to analyze C++ code on the fly. And I get great autocompletion as well.

All of this is doable with vim as well, but the last time I tried with vim, it was a nightmare to configure. With neovim and the help of (quite a few) plugins, I could set up autocompletion in a great way. This allows me to jump to declarations and definitions very easily.

I am far from being a neovim expert, but I am very happy with this tool. It definitely makes my life easier for configuring complex features, compared to vim.

If you are interested, you can find my neovim configuration in my dotfiles repository <https://github.com/wichtounet/dotfiles>. It's very fresh because it was previously saved in another place and was recently copied there. I plan to improve it and little by little rewrite it in Lua. There is still some experimental stuff and it could be improved significantly. But I am really happy with the features configured.

What about you? What do you think of neovim?

C++23 Refresh and budgetwarrior 1.1.0

Baptiste Wicht

2023-09-10 06:50

I am happy to announce the release of budgetwarrior 1.1.0.

The last release of budgetwarrior was more than 5 years ago. So, once I finished my C++20/C++23 refresh of the code, I decided it was a good time to generate a new release. There have been many improvements in this new version:

Many new graphs on the web interface
Add support for tracking stock values
Significant speed improvements if you have a lot of data in the tool
Assets can be set as inactive to be hidden
Introduction of the FI Net Worth
Better support of asset classes
Many small bug fixes

If you want to use the latest version, you can now use the docker image that I am publishing frequently. This docker image is what I use, so it should be fairly up-to-date.

budgetwarrior on docker hub <https://hub.docker.com/r/wichtounet/budgetwarrior>

Otherwise, you can of course compile it from the sources (another docker image is available as a build image). For this, you will need a very recent GCC (13+) or Clang (16+) compiler.

Most of the new features have been implemented a while ago, for my personal usage. The main recent changes are improvements in the code, related to using C++20 and C++23. I plan for all my projects to be compiled with C++23 by default. The reason is mostly so I can really learn about these features, since I cannot use them all at work. On that note, I was a bit disappointed by the support in clang, especially in libc++. I had to work around a few limitations in order to support clang.

The main C++20 feature that I am using in budgetwarrior is ranges. I have been able to improve many pieces of code from using loops and multiple ifs, to using a range. I have implemented many transforms and filters for budgetwarrior. And I am quite happy about the result. For instance:

bool budget::account_exists(const std::string& name){
        for(auto& account : all_accounts()){
                if(account.name == name){
                        return true;
                }
        }

        return false;
}

became:

bool budget::account_exists(const std::string& name){
        return !ranges::empty(all_accounts() | filter_by_name(name));
}

or here is another example of using ranges:

if (accounts.data() | not_id(id) | active_today | filter_by_name(account.name)) {
    throw budget_exception("There is already an account with the name " + account.name);
}

This is likely the biggest change, but I have made other improvements based on recent versions of C++:

Use of std::format
Use of the spaceship operator
Use of template lambdas
Use of std::string_view
Use of std::filesystem
Use of std::map::contains (and other such functions)

Overall, it was a lot of fun and I could significantly improve the code by using these new features (and more).

I am also taking advantage of clang-tidy now. I have added a clang-tidy configuration to my projects so that I can quickly check everything. I have also integrated clang-tidy in neovim (yes, I switched from vim to neovim, more on that later maybe) and this shows in real time where I could improve the code.

Finally, another change is that I am now taking advantage of Github Workflows. Every time I push to the repo, everything is compiled with the two compilers I support. This allows me to keep compatibility between both. In the future, I plan to add a few more tools to the workflows for code analysis. This is also an opportunity for me to learn about these workflows, which I never used before.

I am quite glad to be working on these projects again even though I do not have much time. It was really fun to use all these new features in budgetwarrior. Next, I plan to refresh the code of ETL. And since I want to refresh my GPU skills as well, I will also work on etl_gpu_blas.

A short update

Baptiste Wicht

2023-08-13 08:55

I can't believe it's been about 5 years since the last update on this blog. For those who are wondering what is happening, here is a short update.

About 6 years ago, I finished my Ph.D. and about 5 years ago, I started to work as a software engineer at Verisign. I am now a Senior Software Engineer, still at Verisign. After I started working professionally, I did not spend much time on my personal projects anymore. Before, my personal projects were part of my Ph.D., so it made sense to post some updates on this blog and I also had time to post other articles.

But probably another bigger factor is that I started another blog in 2017, The Poor Swiss <https://thepoorswiss.com>. This is not a technical blog but a blog on personal finance, related to Switzerland. I have written more than 400 articles on this blog and I am still writing about one article a week these days. This takes a lot of time and made me scale down even more on my personal projects. The only project that continued getting some improvements was budgetwarrior since I am using it almost every day.

It's obviously worth mentioning that I got married about 5 years ago and that we now have a son, almost two years old. This obviously takes a lot of time!

So, why am I posting this short update today? I recently started missing working on personal projects. And I realized I had gotten out of touch with recent C++. It is a bit disappointing, but my C++ level has been getting worse since I started working professionally on C++. So, I learned about C++20 and C++23 in detail and decided to apply some of it to my personal projects.

I have made significant cleanups to budgetwarrior, using C++23 by default now and switching to GCC 13 and Clang 16 as the default compilers. I plan to continue working on budgetwarrior_web, the web interface for budgetwarrior, next. This project will also be converted to C++23. After that, I will probably continue with ETL and DLL. And hopefully, I will be able to dedicate some time to adding more features to ETL and DLL as well. Thor OS is currently not on my list because I don't have the kind of energy and time this project requires.

I also hope I will be posting some more updates on this blog, but I will not adhere to any posting schedule. At least, I will not let five years go by before the next update!

budgetwarrior 1.0.1: Allocation tracking, Retirement calculator and bug fixes

Baptiste Wicht

2018-04-03 10:58

I'm happy to announce the release of budgetwarrior 1.0.1. This new version contains a series of improvements over the 1.0 version and some new features.

I haven't been very active this last month. I have been working a bit on budgetwarrior for features I needed for my budget. I've also been contacted with questions on my thor operating system and since that point I've been doing some work on thor as well.

This new version of budgetwarrior has quite a few new features even though it's a minor version.

Note: The data from all the views is totally randomized and does not make sense ;)

Retirement Calculator

The biggest novelty in this version is the addition of a retirement calculator. This is still very basic, but it may give information on how close (or far) you are from early retirement. Here is what the view gives you:

Using your annual withdrawal rate and expected Rate Of Return, it can compute how many years you will need to reach your goal of Financial Independence (FI). It will also give you your FI ratio and some more information about your savings rate, income, expenses and so on. It's nothing very fancy but it can be very useful.
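
To give an idea of the kind of computation involved, here is a minimal sketch. It is not the code of budgetwarrior: the function names are hypothetical and the model is my simplification for this example (constant yearly savings, constant rate of return compounded once per year, FI reached when the net worth covers the yearly expenses at the given withdrawal rate).

```cpp
#include <cassert>

// Net worth needed to cover the yearly expenses at the given withdrawal rate
// (for instance 0.04 for a 4% withdrawal rate)
inline double fi_target(double yearly_expenses, double withdrawal_rate) {
    assert(withdrawal_rate > 0.0);
    return yearly_expenses / withdrawal_rate;
}

// FI ratio: 1.0 (or more) means financially independent
inline double fi_ratio(double net_worth, double yearly_expenses, double withdrawal_rate) {
    return net_worth / fi_target(yearly_expenses, withdrawal_rate);
}

// Number of years until the target is reached, or -1 if it is not reached
// within max_years. Each year, the net worth grows by the rate of return
// and the yearly savings are added.
inline int years_to_fi(double net_worth, double yearly_savings, double yearly_expenses,
                       double withdrawal_rate, double rate_of_return, int max_years = 100) {
    const double target = fi_target(yearly_expenses, withdrawal_rate);
    for (int year = 0; year <= max_years; ++year) {
        if (net_worth >= target) {
            return year;
        }
        net_worth = net_worth * (1.0 + rate_of_return) + yearly_savings;
    }
    return -1;
}
```

For instance, with 40000 of yearly expenses and a 4% withdrawal rate, the target is about one million, and a call such as years_to_fi(0, 150000, 40000, 0.04, 0.05) simulates the years needed to get there.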

New features

I've also added a few graphs based on the budget information. The first is the visualization of the expenses over time:

This can be pretty useful to see how your expenses are going. Even if your income is going up, expenses should not necessarily go up (you should save more!).

Another new view can show your asset allocation over time and the current asset allocation of your entire net worth or specifically for your portfolio.

This is also really useful if you want to have a global view of your asset allocation into bonds, stocks and such.

There are also two other new minor features. You can now search expenses by name. This is really useful once you start having many expenses. Another new view is the Full aggregate view. Before, you could aggregate your expenses by month or year, now they can be aggregated since the beginning of the budget. With this, you can see how much you spent on coffee since you started keeping track of your budget. For me, it's a lot! Both these features are available both in command line and in the web interface.

Improvements

There are also a few improvements with this new version. You can now set a default account (in the configuration file with default_account=X). It will be set by default in both the web view and the console view. The rebalance view has been made more clear. I've added a second batch update view with only the assets that are being used (amount > 0). And lastly, the yearly overview is now displaying correctly the previous year savings rate.

Finally, there are also a few bug fixes. That is the main reason I decided to release now. If you were using assets with different currencies, several views were not correctly using the exchange rate to display them. Moreover, the average expenses in the monthly overview were not correct. Finally, if you were editing old expenses after having archived the accounts, they could be edited with the wrong account.

Installation

If you are on Gentoo, you can install it using layman:

layman -a wichtounet
emerge -a budgetwarrior

If you are on Arch Linux, you can use this AUR repository <https://github.com/StreakyCobra/aur> (wait a few days for the new version to be updated).

For other systems, you'll have to install from sources:

git clone --recursive https://github.com/wichtounet/budgetwarrior.git
cd budgetwarrior
git checkout 1.0.1
make
sudo make install

If you want to test the server mode, the default username is admin and the default password is 1234. You can change them in the configuration file with web_user and web_password.

Conclusion

Although it's a minor version, it improves and fixes quite a few things, especially for the web view. I encourage you to try it out. Don't hesitate to leave me a comment if you fail to use it or don't understand something ;)

There are still a few things that I want to do, as I said when I introduced the web version. The website still needs to be made faster. And the communication between the console and the server can also be improved.

If you are interested in the sources, you can download them on Github: budgetwarrior.

If you have a suggestion or you found a bug, please post an issue on Github.

If you have any comment, don't hesitate to contact me, either by leaving a comment on this post or by email.

I got rid of Vivaldi browser for Google Chrome

Baptiste Wicht

2018-03-16 08:31

About a year ago, I switched from Firefox to Vivaldi. This week, I decided to get rid of Vivaldi and replace it with Google Chrome. In this post, I'm going to outline the reasons why I got rid of it.

Before, I switched to Vivaldi because Firefox was dropping support for XUL/XPCOM extensions and I was using Pentadactyl. In fact, Pentadactyl was the only reason I was using Firefox. It was slow and bloated and a bit unstable, but the extension was making it worth it. Since they are dropping support for such extensions, I did not want to use Firefox anymore. So I switched to Vivaldi with Vimium. It's not as great as Firefox plus Pentadactyl. But it's a more customizable version of Google Chrome, on which it's based.

But, in that year or so of using Vivaldi, I have had many issues. Some of them were not too bad and there were some workarounds. But they continued to pile up and they did not fix any of them, so I have now decided it's too much.

Since the beginning, it always has been slow. It's not really bad, but still noticeable compared to Chrome. Especially opening Vivaldi is pretty bad. This is something I can live with, but they should really do something to make it faster.

The thing that I had the most issues with is multimedia. For instance Youtube (but all the other platforms have the same issues).

The first problem with media is to get a video in fullscreen. Most of the time, when I press the fullscreen button on Youtube, it grays out the screen and I have to press ESC. If I do that around five to ten times, it finally goes fullscreen. It may be because of my multi-monitor setup but Google Chrome has no issues whatsoever with that. It's pretty painful to do, but again I could live with it since I don't use full screen a lot.

A second problem I had with media is that they were running too fast. I'm not kidding, really too fast, not too slow. The media was running about twice as fast as normal; you could see the seconds going fast on Youtube. I have never seen this issue in any other tool, but it was happening at every start of Vivaldi. The fix was to restart Vivaldi every time this happened and the video played normally.

Another problem I had from the beginning is to make all HTML5 videos work. You have to download the binary plugins from Chrome to let Vivaldi play all HTML5 videos. It's not a big deal, but the problem is that they are overwritten after each update of Vivaldi. So you have to do it all the time.

A new media issue I had on the last update of Vivaldi is with Flash. At the beginning it was working even if it was outdated. I just had to confirm to run it with a warning. But, since the last update, I only had the warning that it was outdated. But I could not confirm to use it, the option was not there anymore. And it was still happening after I updated Flash... The only option to run Flash was to use a private navigation window...

And finally, I had another big issue with the last version of Vivaldi as well. The browser keeps crashing on my work computer. It can stay up for a few minutes and then crash: the complete interface stops updating. I can still press the tabs and I can see the title of the window change, but the interface does not update. Again, it may come from my special window manager (I use awesome), but it's the only application not working...

With all these issues and especially the last two new problems, I decided it was time to cut the losses. So I reinstalled Google Chrome, transferred my plugins and everything worked like a charm. I still use Vimium to use vim bindings so my usage of the browser does not change. Of course, I don't have the customization that I had with Vivaldi. I would really really like to get rid of the address bar in the browser. I would also like to significantly reduce the size of the tab bar. But I prefer to live without these improvements than with so many bugs. I think Vivaldi is a good idea, but with a terrible implementation.

I also considered qutebrowser as an alternative. But for now it's still missing many features that I don't want to get rid of. So I will stay with Google Chrome for the time being.

What about you? Do you have any experience with Vivaldi?

Decrease DLL neural network compilation time with C++17

Baptiste Wicht

2018-02-07 11:39

Just last week, I migrated my Expression Templates Library (ETL) to C++17. It is now also done for my Deep Learning Library (DLL). In ETL, this resulted in much nicer code overall, but no real improvement in compilation time.

The objective of the migration of DLL was two-fold. First, I also wanted to simplify some code, especially with if constexpr. But I also especially wanted to try to reduce the compilation time. In the past, I've already tried a few changes with C++17, with good results on the compilation of the entire test suite. While this is very good, this is not very representative of users of the library. Indeed, normally you'll have only one network in your source file not several. The new changes will especially help in the case of many networks, but less in the case of a single network per source file.

This time, I decided to test the compilation on the examples. I've tested the nine official examples from the DLL library:

mnist_dbn: A fully-connected Deep Belief Network (DBN) on the MNIST data set with three layers
char_cnn: A special CNN with embeddings and merge and group layers for text recognition
imagenet_cnn: A 12 layers Convolutional Neural Network (CNN) for Imagenet
mnist_ae: A simple two-layers auto-encoder for MNIST
mnist_cnn: A simple 6 layers CNN for MNIST
mnist_deep_ae: A deep auto-encoder for MNIST, only fully-connected
mnist_lstm: A Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM) cells
mnist_mlp: A simple fully-connected network for MNIST, with dropout
mnist_rnn: A simple RNN with simple cells for MNIST

This is really representative of what users can do with the library and I think it's a much better benchmark for compilation time.

For reference, you can find the source code of all the examples online.

Results

Let's start with the results. I've tested this at different stages of the migration with clang 5 and GCC 7.2. I tested the following steps:

The original C++14 version
Simply compiling in c++17 mode (-std=c++17)
Using the C++17 version of the ETL library
Upgrading DLL to C++17 (without ETL)
ETL and DLL in C++17 versions

I've compiled each example independently in release_debug mode. Here are the results for G++ 7.2:

Example	0	1	2	3	4	5	6	7	8
C++14	37.818	32.944	33.511	15.403	29.998	16.911	24.745	18.974	19.006
-std=c++17	38.358	32.409	32.707	15.810	30.042	16.896	24.635	19.134	19.027
ETL C++17	36.045	31.000	30.942	15.322	28.840	16.747	24.151	18.208	18.939
DLL C++17	35.251	32.577	32.854	15.653	29.758	16.851	24.606	19.098	19.146
Final C++17	32.289	31.133	30.939	15.232	28.753	16.526	24.326	18.116	17.819
Final Improvement	14.62%	5.49%	7.67%	1.11%	4.15%	2.27%	1.69%	4.52%	6.24%

The difference by just enabling c++17 is not significant. On the other hand, some significant gain can be obtained by using the C++17 version of ETL, especially for the DBN version and for the CNN versions. Except for the DBN case, the migration of DLL to C++17 did not bring any significant advantage. When everything is combined, the gains are more important :) In the best case, the example is 14.6% faster to compile.

Let's see if it's the same with clang++ 5.0:

Example	0	1	2	3	4	5	6	7	8
C++14	40.690	34.753	35.488	16.146	31.926	17.708	29.806	19.207	20.858
-std=c++17	40.502	34.664	34.990	16.027	31.510	17.630	29.465	19.161	20.860
ETL C++17	37.386	33.008	33.896	15.519	30.269	16.995	28.897	18.383	19.809
DLL C++17	37.252	34.592	35.250	16.131	31.782	17.606	29.595	19.126	20.782
Final C++17	34.470	33.154	33.881	15.415	30.279	17.078	28.808	18.497	19.761
Final Improvement	15.28%	4.60%	4.52%	4.52%	5.15%	3.55%	3.34%	3.69%	5.25%

First of all, as I have seen time after time, clang is still slower than GCC. It's not a big difference, but still significant. Overall, the gains are a bit higher on clang than on GCC, but not by much. Interestingly, the migration of DLL to C++17 is less interesting in terms of compilation time for clang. It seems even to slow down compilation on some examples. On the other hand, the migration of ETL is more important than on GCC.

Overall, every example is faster to compile using both libraries in C++17, but we don't have spectacular speed-ups. With clang, we have speedups from 3.3% to 15.3%. With GCC, we have speedup from 1.1% to 14.6%. It's not very high, but I'm already satisfied with these results.

C++17 in DLL

Overall, the migration of DLL to C++17 was quite similar to that of ETL. You can take a look at my previous article if you want more details on C++17 features I've used.

I've replaced a lot of SFINAE functions with if constexpr. I've also replaced a lot of static_if with if constexpr. There was a large number of these in DLL's code. I also enabled all the constexpr that were commented out waiting for this exact time :)

I was also thinking that I could replace a lot of meta-programming stuff with fold expressions. While I was able to replace a few of them, most of them were harder to replace with fold expressions. Indeed, the variadic pack is often hidden behind another class and therefore the pack is not directly usable from the network class or the group and merge layers classes. I didn't want to start a big refactoring just to use a C++17 feature; the current state of this code is fine.

I made some use of structured bindings as well, but again not as much as I was thinking. In fact, a lot of the time, I'm assigning the elements of a pair or tuple to existing variables, not declaring new variables, and unfortunately, you can only use structured bindings with an auto declaration.
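To make this limitation concrete, here is a small sketch. find_best and the two wrappers are hypothetical names used only for illustration, they do not come from DLL. Structured bindings always introduce new variables, so filling variables that already exist still goes through std::tie:

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical function returning several values at once
std::tuple<std::size_t, bool, float> find_best() {
    return {std::size_t{3}, true, 0.5f};
}

// Structured bindings always declare new variables
float scaled_with_bindings() {
    auto [index, found, alpha] = find_best();
    return found ? alpha * static_cast<float>(index) : 0.0f;
}

// Assigning to variables that already exist still needs std::tie
float scaled_with_tie() {
    std::size_t index = 0;
    bool found = false;
    float alpha = 0.0f;

    std::tie(index, found, alpha) = find_best();
    return found ? alpha * static_cast<float>(index) : 0.0f;
}

// Both forms read the same values
inline void compare_both_forms() {
    assert(scaled_with_bindings() == scaled_with_tie());
}
```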

Overall, the code is significantly better now, but there was less impact than there was on ETL. It's also a smaller code base, so maybe this is normal and my expectations were too high ;)

Conclusion

The trunk of DLL is now a C++17 library :) I think this improves the quality of the code by a nice margin! Even though there is still some work to be done to improve the code, especially for the DBN pretraining code, the quality is quite good now. Moreover, the switch to C++17 made neural networks using the DLL library faster to compile, from 1.1% in the worst case to 15.3% in the best case! I don't know when I will release the next version of DLL, but it will take some time. In particular, I'll have to polish the RNN support and add a sequence-to-sequence loss before I release the 1.1 version of DLL.

I'm quite satisfied with C++17 even if I would have liked a bit more features to play with! I'm already a big fan of if constexpr, this can make the code much nicer and fold expressions are much more intuitive than their previous recursive template counterpart.

I may also consider migrating some parts of the cpp-utils library, but if I do, it will only be through the use of conditionals in order not to break the other projects that are based on the library.

C++17 Migration of Expression Templates Library (ETL)

Baptiste Wicht

2018-02-02 14:03


I've finally decided to migrate my Expression Templates Library (ETL) project to C++17. I've been talking about doing that for a long time and I've published several releases without making the change, but the next version will be a C++17 library. The reason why I didn't want to rush the change was that it means the library needs a very recent compiler that may not be available to everybody. Indeed, after this change, the ETL library now needs at least GCC 7.1 or Clang 4.0.

I've already done some experiments in the past. For instance, by using if constexpr, I've managed to speed up compilation by 38% and I've also written an article about the fold expressions introduced in C++17. But I hadn't migrated a full library yet. This is now done with ETL. In this article, I'll try to give some examples of improvements made possible by C++17.

This will only cover the C++17 features I'm using in the updated ETL library, I won't cover all of the new C++17 features.

if constexpr

The most exciting new thing in C++17 for me is the if constexpr statement. This is a really really great thing. In essence, it's a normal if statement, but with one very important difference. The statement that is not taken (the else if the condition is true, or the if constexpr if the condition is false) is discarded. And what is interesting is what happens to discarded statements:

1. The body of a discarded statement does not participate in return type deduction.
2. The discarded statement is not instantiated.
3. The discarded statement can odr-use a variable that is not defined.

Personally, I'm especially interested in points 1 and 2. Let's start with an example where point 1 is useful. In ETL, I have a make_temporary function. This function either forwards an ETL container or creates a temporary container from an ETL expression. This is based on a compile-time trait. The return type of the function is not the same in both cases. What you did in those cases before C++17 was to use SFINAE and write two functions:

template <typename E, cpp_enable_iff(is_dma<E>)>
decltype(auto) make_temporary(E&& expr) {
    return std::forward<E>(expr);
}

template <typename E, cpp_enable_iff(!is_dma<E>)>
decltype(auto) make_temporary(E&& expr) {
    return force_temporary(std::forward<E>(expr));
}

One version of the function will forward and the other version will force a temporary and the return type can be different since these are two different functions. This is not bad, but still requires two functions where you only want to write one. However, in C++17, we can do much better using if constexpr:

template <typename E>
decltype(auto) make_temporary(E&& expr) {
    if constexpr (is_dma<E>) {
        return std::forward<E>(expr);
    } else {
        return force_temporary(std::forward<E>(expr));
    }
}

I think this version is really superior to the previous one. We only have one function and the logic is much clearer!
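For readers who want to try this outside of ETL, here is a self-contained sketch of the same pattern. lazy_iota and has_storage are hypothetical stand-ins for an ETL expression and for the is_dma trait, they are not part of the library. The point is that the deduced return type changes with the branch that is kept:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical "expression": it has no storage until it is evaluated
struct lazy_iota {
    std::size_t n;

    std::vector<int> evaluate() const {
        std::vector<int> values(n);
        for (std::size_t i = 0; i < n; ++i) {
            values[i] = static_cast<int>(i);
        }
        return values;
    }
};

// Hypothetical trait playing the role of is_dma
template <typename E>
constexpr bool has_storage = std::is_same_v<std::decay_t<E>, std::vector<int>>;

template <typename E>
decltype(auto) make_temporary(E&& expr) {
    if constexpr (has_storage<E>) {
        return std::forward<E>(expr); // deduced as a reference: no copy
    } else {
        return expr.evaluate();       // deduced as std::vector<int>
    }
}

// Usage: the same call yields a reference or a fresh container
inline void make_temporary_example() {
    std::vector<int> v{1, 2, 3};
    assert(&make_temporary(v) == &v);
    assert(make_temporary(lazy_iota{3}).size() == 3);
}
```

Only the kept branch takes part in return type deduction, which is why a single function can return a reference in one case and a value in the other.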

Let's now see an advantage of point 2. In ETL, there are two kinds of matrices: matrices with compile-time dimensions (fast matrices) and matrices with runtime dimensions (dynamic matrices). When they are used, for instance for a matrix multiplication, I use static assertions for fast matrices and runtime assertions for dynamic matrices. Here is an example for the validation of the matrix-matrix multiplication:

template <typename C, cpp_disable_iff(all_fast<A, B, C>)>
static void check(const A& a, const B& b, const C& c) {
    static_assert(all_2d<A,B,C>, "Matrix multiplication needs matrices");
    cpp_assert(
        dim<1>(a) == dim<0>(b)         //interior dimensions
            && dim<0>(a) == dim<0>(c)  //exterior dimension 1
            && dim<1>(b) == dim<1>(c), //exterior dimension 2
        "Invalid sizes for multiplication");
    cpp_unused(a);
    cpp_unused(b);
    cpp_unused(c);
}

template <typename C, cpp_enable_iff(all_fast<A, B, C>)>
static void check(const A& a, const B& b, const C& c) {
    static_assert(all_2d<A,B,C>, "Matrix multiplication needs matrices");
    static_assert(
        dim<1, A>() == dim<0, B>()         //interior dimensions
            && dim<0, A>() == dim<0, C>()  //exterior dimension 1
            && dim<1, B>() == dim<1, C>(), //exterior dimension 2
        "Invalid sizes for multiplication");
    cpp_unused(a);
    cpp_unused(b);
    cpp_unused(c);
}

Again, we use SFINAE to distinguish the two different cases. In that case, we cannot use a normal if since the values of the dimensions cannot be obtained at compile time for dynamic matrices; more precisely, some templates cannot be instantiated for dynamic matrices. As for cpp_unused, we have to use it in the static version because the parameters are never used there, and in the dynamic version because they won't be used if the assertions are not enabled. Let's use if constexpr to avoid having two functions:

template <typename C>
static void check(const A& a, const B& b, const C& c) {
    static_assert(all_2d<A,B,C>, "Matrix multiplication needs matrices");

    if constexpr (all_fast<A, B, C>) {
        static_assert(dim<1, A>() == dim<0, B>()         //interior dimensions
                          && dim<0, A>() == dim<0, C>()  //exterior dimension 1
                          && dim<1, B>() == dim<1, C>(), //exterior dimension 2
                      "Invalid sizes for multiplication");
    } else {
        cpp_assert(dim<1>(a) == dim<0>(b)         //interior dimensions
                       && dim<0>(a) == dim<0>(c)  //exterior dimension 1
                       && dim<1>(b) == dim<1>(c), //exterior dimension 2
                   "Invalid sizes for multiplication");
    }

    cpp_unused(a);
    cpp_unused(b);
    cpp_unused(c);
}

Since the discarded statement won't be instantiated, we can now use a single function! We also avoid duplicating the first static assertion and the unused statements. Pretty great, right? But we can do better with C++17. Indeed, it added a nice new attribute, [[maybe_unused]]. Let's see what this gives us:

template <typename C>
static void check([[maybe_unused]] const A& a, [[maybe_unused]] const B& b, [[maybe_unused]] const C& c) {
    static_assert(all_2d<A,B,C>, "Matrix multiplication needs matrices");

    if constexpr (all_fast<A, B, C>) {
        static_assert(dim<1, A>() == dim<0, B>()         //interior dimensions
                          && dim<0, A>() == dim<0, C>()  //exterior dimension 1
                          && dim<1, B>() == dim<1, C>(), //exterior dimension 2
                      "Invalid sizes for multiplication");
    } else {
        cpp_assert(dim<1>(a) == dim<0>(b)         //interior dimensions
                       && dim<0>(a) == dim<0>(c)  //exterior dimension 1
                       && dim<1>(b) == dim<1>(c), //exterior dimension 2
                   "Invalid sizes for multiplication");
    }
}

No more need for the cpp_unused trick :) This attribute tells the compiler that a variable or parameter may sometimes be unused, so no warning is emitted for it. The one thing that is not great with this attribute is that it's too long, 16 characters. It almost doubles the width of my check function signature. Imagine if you have more parameters, you'll soon have to use several lines. I wish there was a way to set an attribute for all parameters together or a shortcut. I'm considering whether to use a short macro in place of it, but haven't yet decided.

Just a note: if you have else if statements, you need to mark them as constexpr as well! This was a bit weird for me at first, but you can think of it this way: if the condition is constexpr, then the if (or else if) is constexpr as well.
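Here is a minimal sketch of such a chain. length_of is a hypothetical helper made up for this illustration. If the second test were a plain else if, the last branch would be instantiated for every type and value.size() would fail to compile for an int:

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Every link of the chain carries its own constexpr
template <typename T>
constexpr std::size_t length_of([[maybe_unused]] const T& value) {
    if constexpr (std::is_arithmetic_v<T>) {
        return 1;
    } else if constexpr (std::is_array_v<T>) {
        return std::extent_v<T>;
    } else {
        return value.size(); // only instantiated for types that reach this branch
    }
}

static_assert(length_of(42) == 1);
static_assert(length_of("abc") == 4); // char[4], including the terminator
static_assert(length_of(std::string_view("hello")) == 5);
```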

Overall, I'm really satisfied with the new if constexpr! This really makes the code much nicer in many cases, especially if you abuse metaprogramming like I do.

You may remember that I coded a version of static if with C++14 in the past. This was able to solve point 2, but not point 1, and was much uglier. Now we have a good solution to it. I've replaced two of these in the current code with the new if constexpr.

Fold expressions

For me, fold expressions are the second major feature of C++17. I won't go into too much detail here, since I've already talked about fold expressions in the past. But I'll show two examples of refactorings I've been able to do with them.

Here was the size() function of a static matrix in ETL before:

static constexpr size_t size() {
   return mul_all<Dims...>;
}

The Dims parameter pack comes from the declaration of fast_matrix:

template <typename T, typename ST, order SO, size_t... Dims>
struct fast_matrix_impl;

And the mul_all is a simple helper that multiplies each value of the variadic parameter pack:

template <size_t F, size_t... Dims>
struct mul_all_impl final : std::integral_constant<size_t, F * mul_all_impl<Dims...>::value> {};

template <size_t F>
struct mul_all_impl<F> final : std::integral_constant<size_t, F> {};

template <size_t F, size_t... Dims>
constexpr size_t mul_all = mul_all_impl<F, Dims...>::value;

Before C++17, the only way to compute this result at compilation time was to use template recursion, either with types or with constexpr functions. I think this is pretty heavy just for computing a product. Now, with fold expressions, we can manipulate the parameter pack directly and rewrite our size function:

static constexpr size_t size() {
    return (Dims * ...);
}

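This is much better! This clearly states that all the values of the parameter pack should be multiplied together. For instance, 1,2,3 will become 1*(2*3), since (Dims * ...) is a right fold; the left fold (... * Dims) would give (1*2)*3. For a multiplication, both give the same result.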
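The different forms of fold can be compared with a short sketch. The variable templates below are hypothetical names for illustration, not ETL code. The grouping does not matter for a product, but it does for a non-associative operator such as subtraction, and a unary fold over * does not accept an empty pack:

```cpp
#include <cstddef>

// Unary right fold: (Dims * ...) expands to D1 * (D2 * D3)
template <std::size_t... Dims>
constexpr std::size_t product_right = (Dims * ...);

// Unary left fold: (... * Dims) expands to (D1 * D2) * D3
template <std::size_t... Dims>
constexpr std::size_t product_left = (... * Dims);

// Binary fold with an initial value, which also works for an empty pack
template <std::size_t... Dims>
constexpr std::size_t product_or_one = (std::size_t{1} * ... * Dims);

// The grouping becomes visible with a non-associative operator
template <int... Values>
constexpr int minus_right = (Values - ...); // 10 - (3 - 2)

template <int... Values>
constexpr int minus_left = (... - Values); // (10 - 3) - 2

static_assert(product_right<2, 3, 4> == 24);
static_assert(product_left<2, 3, 4> == 24);
static_assert(product_or_one<> == 1);
static_assert(minus_right<10, 3, 2> == 9);
static_assert(minus_left<10, 3, 2> == 5);
```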

Another place where I was using this was to code a trait that tests if a set of booleans are all true at compile time:

template <bool... B>
constexpr bool and_v = std::is_same<
    cpp::tmp_detail::bool_list<true, B...>,
    cpp::tmp_detail::bool_list<B..., true>>::value;

I was using a nice trick here to test if all booleans are true. I don't remember where I picked it up, but it's quite nice and very fast to compile.
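To see why the trick works, here is a self-contained sketch. The local bool_list is a stand-in for the cpp-utils helper, assumed here to be a plain empty variadic template. Adding a true at the front or at the back produces the same list only when every element is true:

```cpp
#include <type_traits>

// Local stand-in for cpp::tmp_detail::bool_list, for illustration only
template <bool...>
struct bool_list {};

// The two lists are the same type only if shifting in a `true` changes nothing
template <bool... B>
constexpr bool and_v = std::is_same<bool_list<true, B...>, bool_list<B..., true>>::value;

static_assert(and_v<>);
static_assert(and_v<true, true>);
// <true, true, false, true> versus <true, false, true, true>
static_assert(!and_v<true, false, true>);
```

Since this only compares two types, no recursive instantiation is needed, which would explain why it is cheap to compile.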

This was used for instance to test that a set of expressions are all single-precision floating points:

template <typename... E>
constexpr bool all_single_precision = and_v<(is_single_precision<E>)...>;

Now, we can get rid of the and_v traits and use the parameter pack directly:

template <typename... E>
constexpr bool all_single_precision = (is_single_precision<E> && ...);

I think using fold expressions results in much clearer syntax and better code and it's a pretty nice feature overall :)

As a note here, I'd like to mention, that you can also use this syntax to call a function on each argument that you have, which makes for much nicer syntax as well and I'll be using that in DLL once I migrate it to C++17.
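As an illustration of that last point, here is a minimal sketch of a fold over the comma operator. for_each_arg and sum_of are hypothetical helpers, not taken from ETL or DLL. The function is called once per argument, from left to right:

```cpp
#include <utility>

// Fold over the comma operator: f is called on each argument in order
template <typename F, typename... Args>
constexpr void for_each_arg(F&& f, Args&&... args) {
    (f(std::forward<Args>(args)), ...);
}

// Usage: accumulate the arguments through a lambda
constexpr int sum_of(int a, int b, int c) {
    int total = 0;
    for_each_arg([&total](int value) { total += value; }, a, b, c);
    return total;
}

static_assert(sum_of(1, 2, 3) == 6);
```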

Miscellaneous

There are also a few more C++17 features that I've used to improve ETL, but that have a bit less impact.

A very nice feature of C++17 is the support for structured bindings. Often you end up with a function that returns several pieces of information in the form of a pair or a tuple or even a fixed-size array. You can use an object for this, but if you don't, you end up with code that is not terribly nice:

size_t index;
bool result;
float alpha;
std::tie(index, result, alpha) = my_function();

It's not terribly bad, but in these cases, you should be hoping for something better. With C++17, you can do better:

auto [index, result, alpha] = my_function();

Now you can directly use auto to deduce the types of the three variables at once and you can get all the results in the variables at once as well :) I think this is really nice and can really benefit some projects. In ETL, I have almost no use for this, but I'm going to be using it a bit more in DLL.

Something really nice to clean up the code in C++17 is the ability to declare nested namespaces in one line. Before, if you had a nested namespace, etl::impl::standard for instance, you would do:

namespace etl {
namespace impl {
namespace standard {

// Something inside etl::impl::standard

} // end of namespace standard
} // end of namespace impl
} // end of namespace etl

In C++17, you can do:

namespace etl::impl::standard {

// Something inside etl::impl::standard

} // end of namespace etl::impl::standard

I think it's pretty neat :)

Another very small change is the ability to use the typename keyword in place of the class keyword when declaring template template parameters. Before, you had to declare:

template <template <typename> class X>

now you can also use:

template <template <typename> typename X>

It's just some syntactic sugar, but I think it's quite nice.

The last improvement that I want to talk about is one that probably very few know about but it's pretty neat. Since C++11, you can use the alignas(X) specifier for types and objects to specify on how many bytes you want to align these. This is pretty nice if you want to align on the stack. However, this won't always work for dynamic memory allocation. Imagine this struct:

struct alignas(128)  test_struct  { char data; };

If you declare an object of this type on the stack, you have the guarantee that it will be aligned on 128 bytes. However, if you use new to allocate it on the heap, you don't have such a guarantee. Indeed, the problem is that 128 is greater than the maximum default alignment. This is called an over-aligned type. In such cases, the result will be aligned on the max alignment of your system. Since C++17, new supports aligned dynamic memory allocation of over-aligned types. Therefore, you can use a simple alignas to allocate dynamic over-aligned types :) I need this in ETL for matrices that need to be aligned for vectorized code. Before, I was using a larger array with some padding in order to find an aligned element inside, but that is not very nice. Now the code is much better.
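This can be checked with a short sketch, assuming a compiler in C++17 mode. padded_block and is_aligned are hypothetical names used for illustration. For an over-aligned type, a plain new expression (and therefore std::make_unique) goes through the operator new overload taking a std::align_val_t:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical over-aligned type, in the spirit of test_struct
struct alignas(128) padded_block {
    float data[32];
};

// 128 is larger than what plain operator new guarantees by default
static_assert(alignof(padded_block) > __STDCPP_DEFAULT_NEW_ALIGNMENT__);

// True if the address is a multiple of the requested alignment
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Usage: the heap allocation honors alignas without manual padding
inline void check_heap_alignment() {
    auto block = std::make_unique<padded_block>();
    assert(is_aligned(block.get(), alignof(padded_block)));
}
```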

Compilation Time

I've done a few tests to see how much impact these new features have on compilation time. Here, I'm benchmarking the compilation of the entire test suite in different compilation modes. I enabled most compilation options (all GPU and BLAS options) in order to make sure almost all of the library is compiled.

Since I'm a bit short on time before going on vacation, I've only gathered the results with g++. Here are the results with G++ 7.2.0:

	debug	release	release_debug
C++14	862s	1961s	1718s
C++17	892s	2018s	1745s
Difference	+3.4%	+2.9%	+1.5%

Overall, I'm a bit disappointed by these results: the C++17 version is between 1.5% and 3.4% slower to compile than the C++14 version. I was thinking that this would at least be as fast to compile as before. It seems that currently, with G++ 7.2, if constexpr statements are slower to compile than the equivalent SFINAE functions. I didn't do individual benchmarks of all the features I've migrated, so it may not be coming from if constexpr, but since it's the greatest change by far, it's the most likely candidate. Once I have a little more time, after my vacation, I'll try to see if that is also the case with clang.

Keep in mind that we are compiling the test suite here. The ETL test suite is using the manual selection mode of the library in order to be able to test all the possible implementations of each operation. This makes a considerable difference in performance. I expect better compilation time when this is used in automatic selection mode (the default mode). In the default mode, a lot more code can be disabled with if constexpr. I will test this next with the DLL library which I will also migrate to C++17.

Conclusion

This concludes this report on the migration of my ETL library from C++14 to C++17. Overall, I'm really satisfied with the improvement of the code, it's much better. I'm a bit disappointed by the slight increase (around 3%) in compilation time, but it's not dramatic either. I'm still hoping that once it's used in DLL, I will see a decrease in compilation time, but we'll see that when I'm done with the migration of DLL to C++17, which may take some time since I'll be on vacation in China for two weeks starting Friday.

The new version is available only through the master branch. It will probably be released as version 1.3 once I integrate some new features; the migration by itself will not be released as a new version. You can take a look at the GitHub etl repository if you are interested.