
Blazing fast unit test compilation with doctest 1.1

You may remember my quest for faster compilation times. I had made several changes to the Catch test framework macros in order to save some compilation time, at the expense of my test code looking a bit less nice:

REQUIRE(a == 9); //Before
REQUIRE_EQUALS(a, 9); //After

The first line is a little bit nicer, but with several optimizations like this one, I was able to dramatically reduce the compilation time of the test cases of ETL. In the end, I don't think that the difference between the two lines justifies the high overhead in compilation times.


doctest is a framework quite similar to Catch, but one that claims to be much lighter. I tested doctest 1.0 early on, but at that point it was actually slower than Catch, and especially slower than my versions of the macros.

Today, doctest 1.1 was released with promises of being even lighter than before and providing several new ways of speeding up compilation. If you want the results directly, you can take a look at the next section.

First of all, this new version improved the basic macros to make expression decomposition faster. When you use the standard REQUIRE macro, the expression is decomposed using several template techniques and operator overloading, which is really slow to compile. Removing the need for this decomposition is what made my fast Catch macros so much faster to compile.

Moreover, doctest 1.1 also introduces CHECK_EQ, which does not do any expression decomposition. This is close to what I did in my macros, except that it is directly integrated into the framework and preserves all its features. It is also possible to bypass the expression checking code by using the FAST_CHECK_EQ macro; in that case, exceptions are not captured. Finally, a new configuration option (DOCTEST_CONFIG_SUPER_FAST_ASSERTS) removes some features related to automatic debugger breaks. Since I don't use the debugger features and I don't need to capture exceptions everywhere (it's sufficient for me that the test fails completely if an exception is thrown), I'm more than eager to use these new features.
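For illustration, here is what these new assertions look like in a doctest test case (a minimal sketch; the test values are made up):

#define DOCTEST_CONFIG_SUPER_FAST_ASSERTS // strips the debugger-break machinery
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include "doctest.h"

TEST_CASE("addition"){
    int a = 4 + 5;

    CHECK_EQ(a, 9);      // no expression decomposition
    FAST_CHECK_EQ(a, 9); // also bypasses the exception-capturing code
}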


For the evaluation, I compiled the complete test suite of ETL, with one thread, using gcc 4.9.3 in several configurations, going from Catch to doctest 1.1 with all the compilation time features enabled. Here are the results, in seconds:

Version              Time (s)   vs Catch   vs Fast Catch   vs doctest 1.0
Catch                724.22
Fast Catch           464.52     -36%
doctest 1.0          871.54     +20%       +87%
doctest 1.1          614.67     -16%       +32%            -30%
REQUIRE_EQ           493.97     -32%       +6%             -43%
FAST_REQUIRE_EQ      439.09     -39%       -6%             -50%
SUPER_FAST_ASSERTS   411.11     -43%       -12%            -53%

As you can see, doctest 1.1 is much faster to compile than doctest 1.0! This is really great news. Moreover, it is already 16% faster than Catch. When all the features are used, doctest is 12% faster than my stripped-down versions of the Catch macros (and 43% faster than the standard Catch macros). This is really cool! It means that I don't have to make any changes to the code (no need to strip the macros myself) and I can still gain a lot of compilation time compared to the bare Catch framework.

I really think the author of doctest did a great job with the new version. Although they were of less interest to me, there are also a lot of other changes in the new version. You can consult the changelog if you want more information.


Overall, doctest 1.1 is much faster to compile than doctest 1.0. Moreover, it offers very fast macros for test assertions that are much faster to compile than the Catch versions, and even faster than the versions I created myself to reduce compilation time. I really think this is a great advance for doctest. When compiling with all the optimizations, doctest 1.1 saves me 50 seconds of compilation time compared to the fast versions of the Catch macros, and more than 5 minutes compared to the standard Catch macros.

I'll probably start using doctest on my development machine. For now, I'll keep Catch as well, since I need it to generate the unit test reports in XML format for Sonarqube. Once this feature appears in doctest, I'll probably drop Catch from ETL and DLL.

If you need blazing fast compilation times for your unit tests, doctest 1.1 is probably the way to go.


Short review of Bullseye Coverage

Bullseye is a commercial code coverage analyzer. It is fully-featured, with export to HTML and XML and even a dedicated GUI to browse the results. It costs about $800, with a renewal fee of about $200 per year.

I'm currently using gcov and passing the results to Sonar. This works well, but there are several problems. First, I need to use gcovr to generate the XML file, which means two tools. Then, gcov has no way to merge coverage reports. In my tests of ETL, I have seven different profiles being tested and I need the overall coverage report. lcov has a merge feature, but it is slow as hell (it takes longer to merge the coverage files than to compile and run the complete test suite seven times...). For now, I'm using either a C++ program that I wrote or a Python script to combine the XML files, but neither is perfect and both need maintenance. Finally, it's impossible to exclude some code from the coverage report (there is code that isn't meant to be executed, such as exceptional code). For now, I'm using yet another C++ program that I wrote to do this from comments in the code.
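For reference, the XML generation step with gcovr looks roughly like this (standard gcovr flags):

gcovr -r . --xml -o coverage.xml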

Bullseye does have all these features, so I got an evaluation license online, tried the tool, and wrote this short review.


The usage is pretty simple. You put the coverage executables in your PATH variable and activate coverage globally. Then, when you compile, the compiler calls are intercepted and a coverage file is generated. When the compilation is done, you run the program and the coverage measurements are filled in.
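On Linux, that boils down to something like the following (the install path and test binary name are assumptions; cov01 is Bullseye's switch for toggling instrumentation):

export PATH=/opt/BullseyeCoverage/bin:$PATH
cov01 -1    # turn coverage instrumentation on
make        # compiler calls are intercepted here
./my_tests  # running the program fills in the measurements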

The coverage results can then be exported to HTML (or XML) or visualized using the CoverageBrowser tool:

Screenshot of Bullseye main coverage view

The main view of the Bullseye tool code coverage results

It's a pretty good view of the coverage results. You have a breakdown by folder, by file, by function and finally by condition. You can also view the source code directly:

Screenshot of Bullseye source code coverage view

The source view of the Bullseye tool code coverage results

If you want to exclude some code from your coverage reports, you can use a pragma:

switch (n) {
    case 1: one++; break;
    case 2: two++; break;
    case 3: three++; break;
#pragma BullseyeCoverage off
    default: abort();
#pragma BullseyeCoverage on
}

This way, the default branch won't be reported as uncovered.

As for merging coverage files, it's pretty straightforward. For example:

covmerge -c -ffinal.cov sse.cov avx.cov

and it's really fast. Unfortunately, the merging is only done at the function level, not at the statement or condition level. This is a bit disappointing, especially for a commercial tool. Nevertheless, it works well.


To conclude, Bullseye seems to be a pretty good tool. It has more features than standard gcov coverage and all the features are well integrated together. I have only covered the features I was interested in; there are plenty of other things you can look at on the official website.

However, if you don't need the extra features, such as the visualizer (or you use something like Sonar for this), the merging or the code exclusion, it's probably not worth paying the price for it. In my case, since the merge is not better than my C++ tool (both do almost the same thing, and my tool does some basic line coverage merging as well) and I don't need the visualizer, I won't pay the price for it. Moreover, they don't have student or open source licensing, so I'll continue with my complicated toolchain :)


Expression Templates Library (ETL) 1.0

I've just released the first official version of my Expression Templates Library (ETL for short): version 1.0.

Until now, I was using a simple rolling release model, but I think it's now time to switch to some basic versioning. The project is now in a stable state.

ETL 1.0 has the following main features:

  • Smart Expression Templates
  • Matrix and vector (runtime-sized and compile-time-sized)
  • Simple element-wise operations
  • Reductions (sum, mean, max, ...)
  • Unary operations (sigmoid, log, exp, abs, ...)
  • Matrix multiplication
  • Convolution (1D and 2D and higher variations)
  • Max Pooling
  • Fast Fourier Transform
  • Use of SSE/AVX to speed up operations
  • Use of BLAS/MKL/CUBLAS/CUFFT/CUDNN libraries to speed up operations
  • Symmetric matrix adapter (experimental)
  • Sparse matrix (experimental)


Here is an example of expressions in ETL:

etl::fast_matrix<float, 2, 2, 2> a = {1.1, 2.0, 5.0, 1.0, 1.1, 2.0, 5.0, 1.0};
etl::fast_matrix<float, 2, 2, 2> b = {2.5, -3.0, 4.0, 1.0, 2.5, -3.0, 4.0, 1.0};
etl::fast_matrix<float, 2, 2, 2> c = {2.2, 3.0, 3.5, 1.0, 2.2, 3.0, 3.5, 1.0};

etl::fast_matrix<float, 2, 2, 2> d(2.5 * ((a >> b) / (log(a) >> abs(c))) / (1.5 * scale(a, sign(b)) / c) + 2.111 / log(c));

Or another one that I'm using in my neural networks library:

h = etl::sigmoid(b + v * w)

In that case, the vector-matrix multiplication will be executed using a BLAS kernel (if ETL is configured correctly), and the assignment, the sigmoid and the addition will be automatically vectorized to use either AVX or SSE depending on the machine.

Or with a convolutional layer and a ReLU activation function:

etl::reshape<1, K, NH1, NH2>(h_a) = etl::conv_4d_valid_flipped(etl::reshape<1, NC, NV1, NV2>(v_a), w);
h = max(b_rep + h_a, 0.0);

This will automatically be computed either with NVIDIA CUDNN (if available) or with optimized SSE/AVX kernels.

For more information, you can take a look at the Reference on the wiki.

Next version

For the next version, I'll focus on several things:

  • Improve matrix-matrix multiplication kernels when BLAS is not available. There is a lot of room for improvement here
  • Complete support for symmetric matrices (currently experimental)
  • Maybe some new adapters such as Hermitian matrices
  • GPU improvements for some operations that can be done entirely on GPU
  • New convolution performance improvements
  • Perhaps more complete parallel support for some implementations
  • Drop some compiler support to use full C++14 support

Download ETL

You can download ETL on Github. If you are only interested in the 1.0 version, you can look at the Releases page or clone the tag 1.0. There are several branches:

  • master is the eternal development branch; it may not always be stable
  • stable is a branch always pointing to the last tag; no development happens there
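For instance, to get the 1.0 version (assuming the usual GitHub URL of the project):

git clone https://github.com/wichtounet/etl.git
cd etl
git checkout 1.0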

For future releases, there will always be tags pointing to the corresponding commits. I'm not following the git flow way; I'd rather have a more linear history with one eternal development branch than a useless develop branch or a load of other branches for releases.

Don't hesitate to comment on this post if you have any remarks on this library or any questions. You can also open an Issue on Github if you have a problem using the library, or propose a Pull Request if you have any contribution you'd like to make.

Hope this may be useful to some of you :)


Asgard: Home Automation project

I have updated my asgard project to make it finally useful for me, so I figured I'd present the project now.

Asgard is my home automation project based on a Raspberry Pi. I started this project after Ninja Blocks, the Kickstarter company, went down and I was left with useless sensors. So I figured why not have fun creating my own :P I know there are some other projects out there that are pretty good, but I wanted to do some more low-level stuff for once, so what the hell.

Of course, everything is written in C++, no surprise here. The project is built upon a server/drivers architecture. The drivers and the server talk via network sockets, so they can be on different machines. The server displays the data it receives on a web interface and also provides a way to trigger driver actions, either from the web interface or through the integrated rules engine. The data is stored in a database, accessed with CPPSqlite3 (probably going to be replaced by sqlpp11), and the web server is handled with mongoose (with a C++ interface).
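To give an idea of the architecture, here is a minimal sketch of a driver pushing a sensor value to the server over a plain TCP socket. The port and message format are hypothetical; asgard's actual protocol may differ:

#include <string>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(){
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    sockaddr_in server{};
    server.sin_family = AF_INET;
    server.sin_port = htons(4242);                   // hypothetical server port
    server.sin_addr.s_addr = inet_addr("127.0.0.1"); // server may be on another machine

    if(connect(sock, reinterpret_cast<sockaddr*>(&server), sizeof(server)) == 0){
        std::string message = "DATA temperature 23.5"; // hypothetical message format
        send(sock, message.c_str(), message.size(), 0);
    }

    close(sock);
}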

I must mention that most of the web part of the project was made by a student of mine, Stéphane Ly, who worked on it as part of his studies.

Here is a picture of the Raspberry Pi system (not very pretty ;) ):


I plan to try to fit at least some of it in a nicer box with nicer cables and such. Moreover, I also plan to add real antennas to the RF transmitter and receiver, but I haven't received them yet.


asgard supports several sensors:

  • DHT11 Temperature/Humidity Sensor
  • WT450 Temperature/Humidity Sensor
  • RF Button
  • IR Remote
  • CPU Temperature Sensor

You can see the sensors data displayed on the web interface:



There are currently a few actions provided by the drivers:

  • Wake a computer by its MAC address with Wake-On-LAN (see the sketch after this list)
  • Turn ITT-1500 smart plugs ON and OFF
  • Kodi actions: Pause / Play / Next / Previous
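As an illustration of the Wake-On-LAN action, here is a minimal sketch of sending the magic packet (6 bytes of 0xFF followed by the MAC address repeated 16 times). This is the standard protocol, not asgard's actual code:

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Send a Wake-On-LAN magic packet to the broadcast address
void send_wol(const std::array<uint8_t, 6>& mac){
    // The magic packet: 6 x 0xFF, then the MAC address repeated 16 times
    std::array<uint8_t, 102> packet;
    std::memset(packet.data(), 0xFF, 6);
    for(std::size_t i = 0; i < 16; ++i){
        std::memcpy(packet.data() + 6 + i * 6, mac.data(), 6);
    }

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    int broadcast = 1;
    setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof(broadcast));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9); // discard port, commonly used for WOL
    addr.sin_addr.s_addr = inet_addr("255.255.255.255");

    sendto(sock, packet.data(), packet.size(), 0, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    close(sock);
}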

Here is the rules engine:


My home automation

I'm currently using this system to monitor the temperature in my apartment. Nothing great so far, because I don't have enough sensors yet. I'm also using a wireless button to turn on my power socket, wait 2 seconds, and then power on my Kodi Home Theater with Wake-On-LAN.

It's nothing fancy so far, but it's already better than what I had with Ninja Blocks, except for the ugly hardware ;).


There is still a ton of work to do on the project and on its integration in my home.

  • I'm really dissatisfied with the WT450 sensor; I've ordered new Oregon sensors to try to do better.
  • I've ordered a few new sensors: a door intrusion detector and a motion detector
  • The rules system needs to be improved to support multiple conditions
  • I plan to add a simple state system to the asgard server
  • There are a lot of refactorings necessary in the code

However, I don't know when I'll work on this again, my work on this project is pretty episodic to say the least.


The code is, as always, available on Github. There are multiple repositories: all asgard repositories. It's not that much code for now, about 2000 lines, but some of it may be useful. If you plan to use the system, keep in mind that it has never been tested outside of my environment and that there is no documentation so far, but don't hesitate to open Issues on Github if you have questions, or post a comment here.


Update: Thor, Thesis and Publications

Since it's been quite a while since the last post I've written here, I wanted to write a short status update.

I had to serve one month in the army, which does not help at all with productivity :P Since the update to Boost Spirit X3, I haven't worked on my eddic compiler again, but I've switched back to my operating system project: thor. I'm having a lot of fun with it again and it's in a much better state than before.

We have also been very productive on the publication side, with four new publications this year in various conferences. I'll update the blog when the proceedings are published. I'll be going to ICANN 2016 and ANNPR 2016 next week, and probably to ICFHR in October. And of course, I'll go back to Meeting C++ in November :) As for my thesis, it's finally going great; I've started writing regularly and it's taking form!


My project Thor Operating System now has many more features than before:

  • 64bit operating system
  • Preemptive Multiprocessing
  • Keyboard / Mouse driver
  • Full ACPI support with ACPICA
  • Read/Write ATA driver
  • FAT32 file system support
  • HPET/RTC/PIT drivers
  • Basic PCI support
  • Multi stage booting with FAT32

Since last time, I've fixed tons of bugs in the system. Although there are still a few culprits left, it's much more stable than before. There were a lot of bugs in the scheduler, with loads of race conditions. I hope I've worked through most of them now.

I'm currently working on the network stack. I'm able to receive and send packets using the Realtek 8139 card. I have working support for Ethernet, IP and ARP, and I'm currently adding ICMP support. I've come to realize that the hardest part is not writing the code but finding a way to test it. Networking in Qemu is a huge pain in the ass to configure. And then, you need tools to generate packets, or at least answer the packets sent by the virtual machine, and it's really bad... Nevertheless, it's pretty fun overall :)
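For reference, here is roughly how a tap device and an emulated RTL8139 card can be set up in Qemu (the ISO name is an assumption):

sudo ip tuntap add dev tap0 mode tap
sudo ip link set tap0 up
qemu-system-x86_64 -cdrom thor.iso \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device rtl8139,netdev=net0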

Aside from this, I'm also working on a window manager. I'll try to post an update on this.

You can take a look at the thor sources if you're interested.


For the time being, I'll focus my efforts on the thor project. I also have some development to do on my home automation system, asgard-server, which I plan to finalize and deploy in a useful way in my apartment this weekend. You can also expect some updates on my deep learning library, where I've started working on making it more user-friendly (kind of). I'm also still waiting on the first stable version of doctest for a new comparison with Catch.

I really want to try to publish some more posts on the blog again. In particular, I'll try to publish some more updates about Thor.


eddic 1.2.4: New Boost Spirit X3 parser and minor cleanups

After almost 2 years, a new version of eddic (the compiler of the EDDI programming language) is out: eddic 1.2.4!

I haven't worked a lot on this project in the last few years; I have been busy with my Ph.D. related projects (ETL and DLL), my operating system, cpm, ... I've mostly worked on the parser, to test the new version of Boost Spirit: X3. This is described in the next section, with the other changes in a later section.

New Boost Spirit X3

Boost Spirit X3 is a completely revamped version of Boost Spirit. It's aimed at performance, both at compile time and at runtime, and uses recent features of modern C++. It's not compatible with Boost Spirit Qi, so you'll most likely have to rewrite a lot of stuff: in the parser, in the Abstract Syntax Tree (AST) and in the AST passes as well.

For reference, I'm using the Boost 1.59 version.
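To give an idea of what X3 code looks like, here is a minimal standalone rule (illustrative only, not eddic's actual grammar):

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>

namespace x3 = boost::spirit::x3;

int main(){
    std::string input = "some_identifier";
    std::string attr;

    // identifier: a letter or underscore, followed by letters, digits or underscores
    auto identifier = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];

    auto first = input.begin();
    if(x3::phrase_parse(first, input.end(), identifier, x3::ascii::space, attr)){
        std::cout << "parsed: " << attr << std::endl;
    }
}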


Let's start with the pros.

First, the runtime performance is definitely better. Parsing all my eddi test cases and samples takes 42% less time than with the previous parser. It is important to know that the old parser was heavily optimized, with moves instead of copies and with a static lexer. You can take a look at this post to see what was necessary to optimize the old Qi parser. I think it's a good result, since the new grammar does not use a lexer (X3 does not support one) and does not need these optimizations. This improvement really was my objective. I'll try to push it further in the future.

Compile-time performance is also much better. It takes 3 times less time to compile the new parser (from 1 minute down to around 20 seconds). Moreover, the new parser is now in only one file, rather than having to be split all over the place for compile-time performance. Even though it's not really important for me, it's still good to have :)

Thanks especially to the performance point, I've been able to remove some code: the lexer, the generated static lexer and the special pointer optimizations of the AST.


Unfortunately, there are some disadvantages of using the new Spirit X3.

First, the AST needs to be changed. For good parsing performance, you need to use x3::variant and x3::forward_ast. This is a major pain in the ass, since x3::variant is much less practical to use than boost::variant. Almost everything is explicit, which means uglier code than before, in my opinion. Moreover, you need to work around x3::forward_ast for boost::get, whereas boost::recursive_wrapper handled this better. I've had to create my own wrapper around boost::get in order to be able to use the new tree. In my opinion, this is clearly a regression.
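For illustration, a recursive AST node in X3 style looks something like this (a minimal sketch, not eddic's actual AST):

#include <boost/spirit/home/x3/support/ast/variant.hpp>

namespace x3 = boost::spirit::x3;

struct expression; // forward-declared recursive node

struct value : x3::variant<int, x3::forward_ast<expression>> {
    using base_type::base_type;
    using base_type::operator=;
};

struct expression {
    value left;
    char op;
    value right;
};

// Accessing the recursive alternative is where it gets awkward, something
// along the lines of:
//     auto& expr = boost::get<x3::forward_ast<expression>>(some_value.get()).get();
// whereas boost::recursive_wrapper used to be transparent to boost::get.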

Secondly, although X3 was also meant to remove the need for some hacks in the grammar, I ended up with more hacks than before. For instance, many AST nodes have a fake field just to make X3 happy. I've still had to use the horrible eps hack in one place. I've had to create a few more rules in order to fix type deduction that works differently than before (worse, for me). And for some reason, I had to remove some expectations from the grammar to make it parse correctly. This is a really important regression in my opinion, since it may make the parsing slower and will make the error messages less nice.

The previous error handling system allowed me to track the file an AST node was parsed from. Although the new error handler is a lot nicer than the old system, it does not have this feature, so I had to work around it with new annotation nodes and a new global handler. Overall, it's probably a bit worse than before, but it makes for lighter AST nodes.

Finally, for some reason, I haven't been able to use the debug option of the library (lots of compile-time errors). That complicated the debugging of the parser a bit.

Spirit X3 or Spirit Qi?

Overall, I have to say I'm a bit disappointed by Spirit X3. Even though it's faster at runtime and faster to compile, I was really expecting fewer issues with it. What I really did not like was all the changes I had to make because of x3::variant and x3::forward_ast. Overall, I really don't think it was worth the trouble of porting my parser to Spirit X3.

If you have a new project, I would still consider using Boost Spirit X3.

If you have an existing parser, I would probably not advise porting it to X3. Unless you really have issues with parsing performance (and especially if you do not already have an optimized Qi parser), it's probably not worth the trouble and all the time necessary for the changes.

Other changes

The other changes are much more minor. First of all, I've gotten rid of CMake. This project has really made me hate CMake; I have actually gotten rid of it in all my projects. I'm now using plain Makefiles and having a much better time with them. I've also replaced Boost Program Options with cxxopts. It's a much more modern approach to program option parsing. Moreover, it's much more lightweight and it's header-only. Only advantages. There have also been lots of changes to the code (still not very good quality though).
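For reference, option parsing with cxxopts looks something like this (a minimal sketch based on cxxopts' documented API, not eddic's actual options):

#include <cxxopts.hpp>
#include <iostream>
#include <string>

int main(int argc, char** argv){
    cxxopts::Options options("eddic", "Compiler for the EDDI language");

    options.add_options()
        ("quiet", "Do not produce output")
        ("i,input", "Input file", cxxopts::value<std::string>());

    auto result = options.parse(argc, argv);

    if(result.count("input")){
        std::cout << "compiling " << result["input"].as<std::string>() << std::endl;
    }
}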


eddic was my first real project in C++, and it shows in the code and the organization. The quality of the code is really bad now that I read it again. Some things are actually terrible :P It's probably normal, since I was a beginner in C++ at the time.

For the future versions of the compiler, I want to clean up the code a lot more and focus on the EDDI language by adding new features. Moreover, I'll also get rid of the Boost Test Framework in favor of Catch (or doctest if it is ready).

As for now, I'm not sure which project I'm going to focus on. Either I'll continue working on the compiler, or I'll start working again on my operating system (thor-os), where I was working on process concurrency (without too much success :P). I'll probably post updates here in the coming months.


You can find the EDDI Compiler sources on the Github repository.

The version is available in the v1.2.4 tag in the GitHub repository, in the releases page, or directly in the master branch.


Reduce Catch tests compilation time by another 16%

No, it's not the same post as two days ago! I've been able to reduce the compilation time of my test cases by another 16%!

Two days ago, I posted an article about how I reduced the compilation time of my tests by 13% by bypassing the expression deduction of Catch. I came up with the macro REQUIRE_EQUALS:

template<typename L, typename R>
void evaluate_result(Catch::ResultBuilder&& __result, L lhs, R rhs){
    __result.setResultType(lhs == rhs);
}

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #lhs " == " #rhs, Catch::ResultDisposition::Normal ), lhs, rhs);

This has the advantage that the left- and right-hand sides are set directly, not deduced with templates and operator overloading. This still has exactly the same features as the original macro, but it is a bit less nice in the test code. I was quite happy with that optimization, but it turned out I was not aggressive enough.

Even though it seems simple, the macro is still bloated. There are two constructor calls: ResultBuilder and SourceLineInfo (hidden behind CATCH_INTERNAL_LINEINFO). That means that if your test case has 100 assertions, 200 constructor calls will have to be processed by the compiler. Considering that I have some test files with around 400 assertions, this is a lot of overhead for nothing. Moreover, two parameters always have the same value; there is no need to repeat them every time.

Simplifying the macro to the minimum led me to this:

template<typename L, typename R>
void evaluate_result(const char* file, std::size_t line, const char* exp, L lhs, R rhs){
    Catch::ResultBuilder result("REQUIRE", {file, line}, exp, Catch::ResultDisposition::Flags::Normal);
    result.setResultType(lhs == rhs);
}

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(__FILE__, __LINE__, #lhs " == " #rhs, lhs, rhs);

The macro is now a simple function call. Even though the function is a template, it will only be compiled for a few types (double and float in my case), whereas the body of the macro would be compiled unconditionally for each invocation.

With this new macro and function, the compilation time went down from 664 seconds to 554 seconds! That is more than a 16% reduction in compilation time. Compared against the original compilation time (without either optimization) of 764 seconds, this is a 27% reduction! And there is absolutely no difference in features.

This is a really great result, in my opinion. I don't think this can be cut down much more. However, there is still some room for optimization regarding the includes that Catch needs; indeed, it is very bloated as well. A new test framework, doctest, follows the same philosophy but has a much smaller include overhead. Once all the necessary features are in doctest, I may consider adapting my macros for it and using it in place of Catch if it brings a substantial reduction in compilation time.

If you want to take a look at the code, you can find the adapted code on Github.


Speed up compilation by 13% by simplifying Catch unit tests

Over the previous two days, I had been working on improving the compilation time of my project Expression Templates Library (ETL). I was able to reduce the compilation time of the complete test suite from 794 seconds to 764 seconds (using only one thread). Trying to get further, I started checking what was taking the most time in a test case, and I saw that the REQUIRE calls of the test library were taking a large portion of the compilation time!

I have been using Catch as my test framework for more than two years and it's really been great overall. It is a great tool: header-only, fully-featured, XML reporting for Sonar, ... It really has everything I need from a test framework.

Contrary to some popular test frameworks that provide ASSERT_EQUALS, ASSERT_GREATER and all manner of assert macros, Catch only provides one version: REQUIRE. For instance:

REQUIRE(x == 1.0);
REQUIRE(y < 5.5);
REQUIRE((z + x) != 22.01f);

The left and right parts are detected with some smart template and operator overloading techniques, and this makes for very nice test output in case of errors, for instance:

test/src/dyn_matrix.cpp:16: FAILED:
  REQUIRE( test_matrix.rows() == 2UL )
with expansion:
  3 == 2

I think this is pretty nice and the tests are really clear. However, it comes with a cost and I underestimated this at first.
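To see where the cost comes from, here is a simplified illustration of the decomposition technique (not Catch's actual code):

template<typename L>
struct ExpressionLhs {
    L lhs;

    template<typename R>
    bool operator==(R rhs) const { return lhs == rhs; } // both operands are now known
};

struct Decomposer {
    template<typename L>
    ExpressionLhs<L> operator<=(L lhs) const { return {lhs}; }
};

// REQUIRE(x == 1.0) expands to something along the lines of:
//     Decomposer() <= x == 1.0
// Since relational <= binds tighter than ==, this parses as
//     (Decomposer() <= x) == 1.0
// capturing the left operand first and the right one afterwards. All of these
// templates have to be instantiated by the compiler for every assertion.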

To overcome this, I created two new macros (and a few other variations), REQUIRE_EQUALS and REQUIRE_DIRECT, that simply bypass Catch's deduction of the expression:

inline void evaluate_result_direct(Catch::ResultBuilder&& __result, bool value){
    __result.setLhs(value ? "true" : "false");
    __result.setResultType(value);
}

template<typename L, typename R>
void evaluate_result(Catch::ResultBuilder&& __result, L lhs, R rhs){
    __result.setResultType(lhs == rhs);
}

#define REQUIRE_DIRECT(value) \
    evaluate_result_direct(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #value, Catch::ResultDisposition::Normal ), value);

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #lhs " == " #rhs, Catch::ResultDisposition::Normal ), lhs, rhs);

There is really nothing too special about it; I simply followed the macros and functions in Catch's source code until I found out what to bypass.

And now, we use them directly:
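// For instance, mirroring the earlier REQUIRE examples:
REQUIRE_EQUALS(x, 1.0);
REQUIRE_DIRECT(y < 5.5);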


This is a bit less nice and it requires knowing a few more macros, I admit, but it turns out to be much faster (and who really cares about the beauty of test code anyway...). Indeed, the total compilation time of the tests went from 764 seconds to 664 seconds! This is a 13% reduction of the compilation time! I really am impressed by the overhead of this technique; I cannot justify such a slowdown just for slightly nicer test code. Finally, the output in case of error remains exactly the same as before.

This proves that sometimes the bottlenecks are not where we expect them :)

If you are interested, you can find the adapted code on Github.


Simplify Deep Learning Library usage on Linux and Windows!

No, I'm not dead ;) I've been very busy with my Ph.D (and playing Path of Exile, let's be honest...) and haven't had time to write something here in a long time.

Until now, there were two ways to use my Deep Learning Library (DLL) project:

  1. Write a C++ program that uses the library
  2. Install DLL and write a configuration file to define your network and the problem to solve

The first option gives you all the features of the tool and allows you to build exactly what you need. The second option is a bit more limited, but does not require any C++ knowledge. However, it still requires a recent C++ compiler and build system.

Due to the high C++ requirements that are not met by Visual Studio and the fact that I don't work on Windows, this platform is not supported by the tool. Until now!

I've added a third option to use DLL, in the form of a Docker image, to make the second option even easier and allow the use of DLL on Windows. All you need is Docker, which is available on Linux, Mac and Windows. This is still limited to the second option, in that you need to write a configuration file describing the network, but you don't need to build DLL or install all its dependencies.


To install the image, you can simply use docker pull:

docker pull wichtounet/docker-dll

Then, to run it, you have to create a folder containing a dll.conf file and mount it in the container at /dll/data/. There are some examples in the image repository. For instance, on Linux, from the cloned repository:

docker run -v $(pwd)/rbm_mnist/:/dll/data/ wichtounet/docker-dll

or on Windows:

docker run -v /c/Users/Baptiste/rbm_mnist/:/dll/data wichtounet/docker-dll

This will automatically run the actions specified in the configuration file and train your network.


I would really have thought this would be harder, but it turns out that Docker is a very good solution to deploy multi-platform demo tools :)

As of now, the tool in this form only supports the MNIST data format, but I plan to add basic CSV support as well in the near future.

I hope that this simpler usage will help people who want to try the library.


Use templight and Templar to debug C++ templates

C++ has some very good tools to debug, profile and analyze source files and executables. This all works well for standard runtime programs. But when you are using templates, you sometimes want these tools to act at compile time, and at that point the support is much more scarce.

templight and Templar are two tools that try to fix this issue.

From the templight site:

Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantiation process.

and Templar is a visualization tool for the traces generated by templight.


Unfortunately, the templight installation is not user-friendly at all. You need to clone the complete LLVM/Clang tree and add templight inside it before compiling the complete Clang toolchain. But that is the case for all Clang-based tools... You also need to patch Clang, but that may not be necessary in the future. The complete instructions are available here.

The installation of Templar is much more convenient:

git clone
cd Templar
git checkout feature/templight2
qmake .
sudo make install

The feature/templight2 branch has many more features than master and should support both Qt4 and Qt5, but I have only tested it with Qt4.


Let's use the classic Fibonacci function as an example:

#include <cstddef>
#include <iostream>

template <std::size_t N>
struct Fibonacci {
    static constexpr const std::size_t value = Fibonacci<N-1>::value + Fibonacci<N-2>::value;
};

template <>
struct Fibonacci<1> {
    static constexpr const std::size_t value = 1;
};

template <>
struct Fibonacci<0> {
    static constexpr const std::size_t value = 0;
};

int main(){
    std::cout << "Fibonacci<5>:" << Fibonacci<5>::value << std::endl;
}

Nothing fancy here, we're simply printing the fifth Fibonacci number on the console.

You can compile it with templight++:

templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system -std=c++14 main.cpp

All the templight options start with -Xtemplight, and then you can use any clang++ option. This will generate an a.memory.trace.pbf file in the current directory. You can then run Templar and use File > Open Trace to open the trace file. This should open a window of this sort:


The top-left panel contains the source code of the application, automatically refreshed whenever you move in the template tree. In the top right, there is the template instantiation graph. In the bottom left, you'll see a list of files, to be able to filter them, and in the bottom right, the list of template events. You can sort the list of template events by duration, which is really convenient. You can then select Fibonacci<5> by double-clicking it in the list (once sorted, it should be near the top). This should give you a tree looking something like this:


The edgy nodes are template instantiations and the round nodes are template memoizations. We can directly see that each instantiation was only done once. I think this graph view is really helpful if you need to debug computations done at compile time. You can see that not all nodes are displayed; this is because there is a limit on the displayed depth. Simply click on Fibonacci<3> and the remaining nodes will be shown.

I have already used this tool to find the most time-consuming templates in ETL and DLL. This is a great tool to indicate where you should focus your efforts to improve template compile time. I have also been able to find some unnecessary instantiations that could be avoided (either with SFINAE or with refactorings).

templight also contains a fully-fledged debugger for template programs, but I haven't tested it.


In conclusion, I would say that templight and Templar really help with template debugging and profiling. There is a real lack of tools in this domain and I hope to see more tools of this kind in the future. I hope this will help you develop template-heavy programs or template metaprograms.