eddic 1.2.4: New Boost Spirit X3 parser and minor cleanups

After almost 2 years, the new version of eddic (the compiler of the EDDI programming language) is out: eddic 1.2.4!

I haven't worked a lot on this project in the last few years; I have been busy with my Ph.D. related projects (ETL and DLL), my operating system, cpm, ... I've mostly worked on the parser to test the new version of Boost Spirit: X3. This is described in the next section, with the other changes in the section after that.

New Boost Spirit X3

Boost Spirit X3 is a completely revamped version of Boost Spirit Qi. It's aimed at performance, both at compile-time and at runtime, and uses recent features of modern C++. It's not compatible with Boost Spirit Qi, so you'll most likely have to rewrite a lot of stuff, in the parser, in the Abstract Syntax Tree (AST) and in the AST passes as well.

For reference, I'm using the Boost 1.59 version.

Pros

Let's start with the pros.

First, the runtime performance is definitely better. Parsing all my eddi test cases and samples takes 42% less time than with the previous parser. It is important to know that the old parser was heavily optimized, with moves instead of copies and with a static lexer. You can take a look at this post to see what was necessary to optimize the old Qi parser. I think it's a good result since the new grammar does not use a lexer (X3 does not support one) and does not need these optimizations. This improvement really was my objective. I'll try to push it further in the future.

Compile-time performance is also much better. The new parser compiles about three times faster (from 1 minute down to around 20 seconds). Moreover, the new parser now fits in a single file, rather than having to be split all over the place for compile-time performance. Even though it's not really important for me, it's still good to have :)

Thanks mainly to these performance improvements, I've been able to remove some code: the lexer, the generated static lexer and the special pointer optimizations of the AST.

Cons

Unfortunately, there are some disadvantages of using the new Spirit X3.

First, the AST needs to be changed. For good parsing performance, you need to use x3::variant and x3::forward_ast. This is a major pain in the ass since x3::variant is much less practical to use than boost::variant. Almost everything is explicit, which means uglier code than before, in my opinion. Moreover, you need to work around x3::forward_ast for boost::get, whereas boost::recursive_wrapper handled this better. I've had to create my own wrapper around boost::get in order to be able to use the new tree. In my opinion, this is clearly a regression.
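To give an idea, here is a minimal sketch (not the actual eddic code) of the kind of wrapper I mean; it assumes the node is stored as an x3::forward_ast<T> alternative inside an x3::variant:

#include <boost/variant.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>

namespace x3 = boost::spirit::x3;

// Extract a T stored as x3::forward_ast<T> inside an x3::variant.
// boost::get does not see through forward_ast, hence the extra unwrapping.
template <typename T, typename Variant>
T& smart_get(Variant& v){
    return boost::get<x3::forward_ast<T>>(v.get()).get();
}

// Pointer form, mirroring boost::get's pointer overload: returns nullptr
// when the active alternative is not the forwarded T.
template <typename T, typename Variant>
T* smart_get_if(Variant* v){
    auto* forwarded = boost::get<x3::forward_ast<T>>(&v->get());
    return forwarded ? &forwarded->get() : nullptr;
}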

Secondly, although X3 was also meant to remove the need for some hacks in the grammar, I ended up with more hacks than before. For instance, many AST nodes have a fake field just to make X3 happy. I still had to use the horrible eps hack in one place. I had to create a few more rules to fix type deduction, which works differently than before (worse, in my case). And for some reason, I had to replace some expectations in the grammar to make it parse correctly. This is a really important regression in my opinion, since it may make the parsing slower and will make the error messages less nice.
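As an illustration of the fake field workaround (the node below is hypothetical, not taken from eddic), the idea is simply to add a dummy member so that X3's attribute propagation accepts the node:

#include <string>
#include <boost/fusion/include/adapt_struct.hpp>

// Hypothetical single-member AST node that X3 would otherwise choke on:
// the dummy member is never used, it is only there to make X3 happy.
struct import_statement {
    bool fake = false;      // fake field, unused
    std::string file;       // the real content of the node
};

BOOST_FUSION_ADAPT_STRUCT(import_statement, fake, file)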

The previous error handling system allowed me to track the file from which an AST node was parsed. Although the new error handler is a lot nicer than the old system, it does not have this feature, so I had to work around it by using new annotation nodes and a new global handler. Overall, it's probably a bit worse than before, but it makes for lighter AST nodes.
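The workaround roughly follows this sketch (hypothetical types, not the eddic code): annotated nodes carry a small index and a global handler maps that index back to the source file being parsed:

#include <cstddef>
#include <string>
#include <vector>

// Annotation data stored inside AST nodes: just an index into a global table.
struct source_annotation {
    std::size_t file_index = 0;
};

// Global handler keeping track of the parsed files.
struct global_source_handler {
    std::vector<std::string> files;

    std::size_t register_file(const std::string& path){
        files.push_back(path);
        return files.size() - 1;
    }

    const std::string& file_of(const source_annotation& node) const {
        return files[node.file_index];
    }
};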

Finally, for some reason, I haven't been able to use the debug option of the library (it produces lots of compile-time errors). That made debugging the parser a bit more complicated.

Spirit X3 or Spirit Qi ?

Overall, I have to say I'm a bit disappointed by Spirit X3. Even though it's faster at runtime and faster to compile, I was really expecting fewer issues with it. What I really did not like was all the changes I had to make because of x3::variant and x3::forward_ast. In the end, I really don't think it was worth the trouble of porting my parser to Spirit X3.

If you have a new project, I would still consider using Boost Spirit X3.

If you have an existing parser, I would probably not advise porting it to X3. Unless you really have issues with parsing performance (and especially if you have not already optimized your Qi parser), it's probably not worth the trouble and all the time necessary for the changes.

Other changes

The other changes are much more minor. First of all, I've gotten rid of CMake. This project has really made me hate CMake; I have actually gotten rid of it in all my projects. I'm now using plain Makefiles and having a much better time with them. I've also replaced Boost Program Options with cxxopts. It's a much more modern approach to program options parsing. Moreover, it's much more lightweight and it's header-only. Only advantages. There have also been lots of changes to the code (still not very good quality, though).
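For illustration, here is a minimal sketch of what option parsing with cxxopts looks like (based on the current cxxopts API; the options shown are invented, not eddic's actual flags):

#include <iostream>
#include <string>
#include <cxxopts.hpp>

int main(int argc, char* argv[]){
    cxxopts::Options options("eddic", "Compiler for the EDDI language");

    options.add_options()
        ("h,help", "Print help")
        ("o,output", "Output file", cxxopts::value<std::string>()->default_value("a.out"));

    auto result = options.parse(argc, argv);

    if(result.count("help")){
        std::cout << options.help() << std::endl;
        return 0;
    }

    std::cout << "output file: " << result["output"].as<std::string>() << std::endl;
    return 0;
}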

Future

eddic was my first real project in C++ and this shows in the code and its organization. The quality of the code is really bad now that I read it again. Some things are actually terrible :P It's probably normal since I was a beginner in C++ at the time.

For the future versions of the compiler, I want to clean up the code a lot more and focus on the EDDI language by adding new features. Moreover, I'll also get rid of the Boost Test Framework in favour of Catch (or doctest if it is ready).

As for now, I'm not sure which project I'm going to focus on. Either I'll continue working on the compiler or I'll start working again on my operating system (thor-os), in which I was working on process concurrency (without too much success :P). I'll probably post updates on this blog in the coming months.

Download

You can find the EDDI compiler sources in the Github repository.

The new version is available in the v1.2.4 tag in the GitHub repository, in the releases page, or directly in the master branch.

Reduce Catch tests compilation time by another 16%

No, it's not the same post as two days ago! I've been able to reduce the compilation time of my test cases by another 16%!

Two days ago, I posted an article about how I reduced the compilation time of my tests by 13%, by bypassing the expression deduction from Catch. I came up with the macro REQUIRE_EQUALS:

template<typename L, typename R>
void evaluate_result(Catch::ResultBuilder&& __result, L lhs, R rhs){
    __result.setResultType(lhs == rhs);
    __result.setLhs(Catch::toString(lhs));
    __result.setRhs(Catch::toString(rhs));
    __result.setOp("==");
    __result.endExpression();
    __result.react();
}

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #lhs " == " #rhs, Catch::ResultDisposition::Normal ), lhs, rhs);

This has the advantage that the left and right hand sides are set directly, not deduced with templates and operator overloading. This still has exactly the same features as the original macro, but it is a bit less nice in the test code. I was quite happy with that optimization, but it turned out I was not aggressive enough in my optimizations.

Even though it seems simple, the macro is still bloated. There are two constructor calls: ResultBuilder and SourceLineInfo (hidden behind CATCH_INTERNAL_LINEINFO). That means that if your test case has 100 assertions, 200 constructor calls will need to be processed by the compiler. Considering that I have some test files with around 400 assertions, this is a lot of overhead for nothing. Moreover, two of the parameters always have the same value, so there is no need to repeat them every time.

Simplifying the macro to the minimum led me to this:

template<typename L, typename R>
void evaluate_result(const char* file, std::size_t line, const char* exp, L lhs, R rhs){
    Catch::ResultBuilder result("REQUIRE", {file, line}, exp, Catch::ResultDisposition::Flags::Normal);
    result.setResultType(lhs == rhs);
    result.setLhs(Catch::toString(lhs));
    result.setRhs(Catch::toString(rhs));
    result.setOp("==");
    result.endExpression();
    result.react();
}

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(__FILE__, __LINE__, #lhs " == " #rhs, lhs, rhs);

The macro is now a simple function call. Even though the function is a template function, it will only be instantiated for a few types (double and float in my case), whereas the body of the old macro was compiled again for each invocation.

With this new macro and function, the compilation time went down from 664 seconds to 554 seconds! This is a more than 16% reduction in compilation time. When comparing against the original compilation time (without both optimizations) of 764 seconds, this is a 27% reduction! And there is absolutely no difference in features.

This is a really great result, in my opinion. I don't think this can be cut down much more. However, there is still some room for optimization regarding the includes that Catch needs. Indeed, they are very bloated as well. A new test framework, doctest, follows the same philosophy but has a much smaller include overhead. Once all the necessary features are in doctest, I may consider adapting my macros for it and using it in place of Catch if there is some substantial reduction in compilation time.

If you want to take a look at the code, you can find the adapted code on Github.

Speed up compilation by 13% by simplifying Catch unit tests

Over the previous two days, I've been working on improving the compilation time of my project Expression Templates Library (ETL). I have been able to reduce the compilation time of the complete test suite from 794 seconds to 764 seconds (using only one thread). Trying to get further, I started checking what was taking the most time in a test case and saw that the REQUIRE calls of the test library were taking a large portion of the compilation time!

I have been using Catch as my test framework for more than two years and it's really been great overall. It is a great tool: header-only, fully-featured, with XML reporting for Sonar, ... It really has everything I need from a test framework.

Contrary to some popular test frameworks that provide ASSERT_EQUALS, ASSERT_GREATER and all manner of assert macros, Catch only provides one version: REQUIRE. For instance:

REQUIRE(x == 1.0);
REQUIRE(y < 5.5);
REQUIRE((z + x) != 22.01f);

The left and right parts are detected with some smart template and operator overloading techniques, and this makes for very nice test output in case of errors, for instance:

test/src/dyn_matrix.cpp:16: FAILED:
  REQUIRE( test_matrix.rows() == 2UL )
with expansion:
  3 == 2

I think this is pretty nice and the tests are really clear. However, it comes with a cost, and I underestimated it at first.

To overcome this, I created two new macros (and a few other variations), REQUIRE_EQUALS and REQUIRE_DIRECT, that simply bypass Catch's deduction of the expression:

inline void evaluate_result_direct(Catch::ResultBuilder&& __result, bool value){
    __result.setResultType(value);
    __result.setLhs(value ? "true" : "false");
    __result.setOp("");
    __result.endExpression();
}

template<typename L, typename R>
void evaluate_result(Catch::ResultBuilder&& __result, L lhs, R rhs){
    __result.setResultType(lhs == rhs);
    __result.setLhs(Catch::toString(lhs));
    __result.setRhs(Catch::toString(rhs));
    __result.setOp("==");
    __result.endExpression();
}

#define REQUIRE_DIRECT(value) \
    evaluate_result_direct(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #value, Catch::ResultDisposition::Normal ), value);

#define REQUIRE_EQUALS(lhs, rhs) \
    evaluate_result(Catch::ResultBuilder( "REQUIRE", CATCH_INTERNAL_LINEINFO, #lhs " == " #rhs, Catch::ResultDisposition::Normal ), lhs, rhs);

There is really nothing too special about it: I simply followed the macros and functions in the Catch source code until I found out what to bypass.

And now, we use them directly:

REQUIRE_DIRECT(am_i_true());
REQUIRE_EQUALS(x, 1.0);

This is a bit less nice and it requires knowing a few more macros, I admit, but it turns out to be much faster (and who really cares about the beauty of test code anyway...). Indeed, the total compilation time of the tests went from 764 seconds to 664 seconds! This is a 13% reduction of the compilation time! I really am impressed by the overhead of this technique. I cannot justify this slowdown just for slightly nicer test code. Finally, the output in case of error remains exactly the same as before.

This proves that sometimes the bottlenecks are not where we expect them :)

If you are interested, you can find the adapted code on Github.

Simplify Deep Learning Library usage on Linux and Windows!

No, I'm not dead ;) I've been very busy with my Ph.D. (and playing Path of Exile, let's be honest...) and haven't had time to write something here in a long time.

Until now, there were two ways to use my Deep Learning Library (DLL) project:

  1. Write a C++ program that uses the library

  2. Install DLL and write a configuration file to define your network and the problem to solve

The first version gives you all the features of the tool and allows you to build exactly what you need. The second version is a bit more limited, but does not require any C++ knowledge. However, it still requires a recent C++ compiler and build system.

Due to the high C++ requirements that are not met by Visual Studio and the fact that I don't work on Windows, this platform is not supported by the tool. Until now!

I've added a third option to use DLL, in the form of a Docker image, to make the second option even easier and allow the use of DLL on Windows. All you need is Docker, which is available on Linux, Mac and Windows. This is still limited to the second option in that you need to write a configuration file describing the network, but you don't need to build DLL and don't need to install all its dependencies.

Usage

To install the image, you can simply use docker pull:

docker pull wichtounet/docker-dll

Then, to run it, you have to create a folder containing a dll.conf file and mount it in the container at /dll/data/. There are some examples in the image repository. For instance, on Linux, from the cloned repository:

docker run -v $(pwd)/rbm_mnist/:/dll/data/ wichtounet/docker-dll

or on Windows:

docker run -v /c/Users/Baptiste/rbm_mnist/:/dll/data wichtounet/docker-dll

This will automatically run the actions specified in the configuration file and train your network.

Conclusion

I would really have thought this would be harder, but it turned out that Docker is a very good solution to deploy multiplatform demo tools :)

As of now, the tool in this form only supports the MNIST data format, but I plan to add basic CSV support as well in the near future.

I hope that this will help people willing to try the library with a simpler usage.

Use templight and Templar to debug C++ templates

C++ has some very good tools to debug, profile and analyze source files and executables. This all works well for standard runtime programs. But when you are using templates, you sometimes want these tools to act at compile time, and at this point the support is much more scarce.

templight and Templar are two tools that are trying to fix this issue.

From the templight site:

Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantiation process.

and Templar is a visualization tool for the traces generated by templight.

Installation

Unfortunately, the templight installation is not user-friendly at all. You need to clone the complete LLVM/Clang tree and add templight inside it before compiling the complete clang toolchain. But that is the case for all clang-based tools... You also need to patch clang but that may not be necessary in the future. The complete instructions are available here.

The installation of Templar is much more convenient:

git clone https://github.com/schulmar/Templar.git
cd Templar
git checkout feature/templight2
qmake .
make
sudo make install

The feature/templight2 branch has many more features than master and should support both Qt4 and Qt5, but I have only tested it on Qt4.

Example

Let's use the classic Fibonacci function as an example:

#include <iostream>

template <std::size_t N>
struct Fibonacci {
    static constexpr const std::size_t value = Fibonacci<N-1>::value + Fibonacci<N-2>::value;
};

template <>
struct Fibonacci<1> {
    static constexpr const std::size_t value = 1;
};

template <>
struct Fibonacci<0> {
    static constexpr const std::size_t value = 0;
};

int main(){
    std::cout << "Fibonacci<5>:" << Fibonacci<5>::value << std::endl;
}

Nothing fancy here, we're simply printing the fifth Fibonacci number on the console.

You can compile it with templight++:

templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system -std=c++14 main.cpp

All the templight options start with -Xtemplight and then you can use any clang++ option. This will generate an a.memory.trace.pbf file in the current directory. You can then run Templar and use File > Open Trace to open the trace file. This should open a window of this sort:

/images/templar.png

The top-left panel contains the source code of the application, automatically refreshed whenever you move in the template tree. In the top right, there is the template instantiation graph. In the bottom left, you'll see a list of files so that you can filter them, and in the bottom right, you'll see the list of template events. You can sort the list of template events by duration, which is really convenient. You can then select Fibonacci<5> by double-clicking it in the list (once sorted, it should be near the top). This should give you a tree looking something like this:

/images/templar_tree.png

The rectangular nodes are template instantiations and the round nodes are template memoizations. We can directly see that each instantiation was only done once. I think this graph view is really helpful if you need to debug computations done at compile time. You can see that not all nodes are displayed; this is because there is a limit on the displayed depth. Simply click on Fibonacci<3> and the remaining nodes will be shown.

I have already used this tool to find the most time-consuming templates in ETL and DLL. This is a great tool to indicate where you should focus your efforts to improve template compile time. I have also been able to find some unnecessary instantiations that could be avoided (either with SFINAE or with refactorings).

templight also contains a fully-fledged debugger for template programs, but I haven't tested it.

Conclusion

In conclusion, I would say that templight and Templar are really helping with template debugging and profiling. There is a real lack of tools in this domain and I hope to see more tools of this kind in the future. I hope this will help you develop template-heavy programs or template metaprograms.

Improve DLL and ETL Compile Time further

For a while, the compilation time of my matrix/vector computation library (ETL), based on Expression Templates, has become more and more problematic. I've already worked on this problem here and there, using some general techniques (pragmas, precompiled headers, header removal and so on). In this post, I'll talk about two major improvements I have been able to make directly in the code.

Use of static_if

Remember static_if? I was able to use it to really reduce the compile time of DLL.

I wrote a script to time each test case of the DLL project to find the ones that took the longest to compile. Once I found the best candidate, I isolated the functions that took the longest to compile. It was quite tedious and I did it by hand, mostly by commenting out parts of the code and going deeper and deeper into the code. I was quite surprised to find that a single function call (a template function, of course ;) ) was responsible for 60% of the compilation time of my candidate test case. The function was instantiating a whole bunch of expression templates (to compute the free energy of several models). The function itself was not really optimizable, but what was really interesting is that it was only used in some very rare cases and that these cases were known at compile time :) This was a perfect case for static_if. And once the call was inside the static_if, the test case indeed compiled about 60% faster. This reduced the overall compilation time of DLL by about 30%!

This could of course also have been achieved by using two functions, one with the call and one empty, selected by SFINAE (Substitution Failure Is Not An Error). I prefer the static_if version since it really shows the intent and hides SFINAE behind a nicer syntax.
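For comparison, the SFINAE alternative would look something like this sketch (names invented, not the actual DLL code):

#include <type_traits>

// Version selected when the model needs the expensive free-energy computation.
template <typename Model, std::enable_if_t<Model::needs_free_energy, int> = 42>
void update_free_energy(Model& model){
    model.compute_free_energy();
}

// Empty version selected when the feature is disabled: nothing is instantiated.
template <typename Model, std::enable_if_t<!Model::needs_free_energy, int> = 42>
void update_free_energy(Model&){
    // nothing to do
}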

I was also able to use static_if at other places in the DLL code to avoid instantiating some templates, but the improvements were much less dramatic (about 1% of the total compilation time). I was very lucky to find a single function that accounted for so much compile time. After some more tests, I concluded that much of the compilation time of DLL was spent compiling the Expression Templates from my ETL library, so I decided to delve into the ETL code directly.

Removal of std::async

The second improvement was very surprising. I was working on improving the compilation of ETL and found out that the sum and average reductions of matrices were dramatically slow to compile, about an order of magnitude slower than standard operations on matrices. In parallel (and the two facts are linked), I also noticed another weird fact when splitting a file into 10 parts (the file was comprised of 10 test cases): compiling the 10 parts separately (and sequentially, not with multiple threads) was about 40% faster than compiling the complete file. There was no swapping, so it was not a memory issue. This is not expected: generally, it is faster to compile one big file than to compile its parts separately. The advantage of smaller files is that you can compile them in parallel and that incremental builds are faster (you only recompile a small part).

By elimination, I found out that most of the time was spent inside the function that dispatches the work of accumulating the sum of a matrix in parallel. Here is the function:

template <typename T, typename Functor, typename AccFunctor>
inline void dispatch_1d_acc(bool p, Functor&& functor, AccFunctor&& acc_functor, std::size_t first, std::size_t last){
    if(p){
        std::vector<std::future<T>> futures(threads - 1);

        auto n = last - first;
        auto batch = n / threads;

        for(std::size_t t = 0; t < threads - 1; ++t){
            futures[t] = std::async(std::launch::async, functor, first + t * batch, first + (t+1) * batch);
        }

        acc_functor(functor(first + (threads - 1) * batch, last));

        for(auto& fut : futures){
            acc_functor(fut.get());
        }
    } else {
        acc_functor(functor(first, last));
    }
}

There isn't anything really fancy about this function. It takes one functor that will be executed in parallel and one functor for accumulation. It dispatches all the work in batches and then accumulates the results. I tried several things to optimize the compilation time of this function, but nothing worked. The line that was consuming all the time was the std::async line. This function was using std::async because the thread pool that I'm generally using does not support returning values from parallel functors. I decided to use a workaround with my thread pool and came up with this version:

template <typename T, typename Functor, typename AccFunctor>
inline void dispatch_1d_acc(bool p, Functor&& functor, AccFunctor&& acc_functor, std::size_t first, std::size_t last){
    if(p){
        std::vector<T> futures(threads - 1);
        cpp::default_thread_pool<> pool(threads - 1);

        auto n = last - first;
        auto batch = n / threads;

        auto sub_functor = [&futures, &functor](std::size_t t, std::size_t first, std::size_t last){
            futures[t] = functor(first, last);
        };

        for(std::size_t t = 0; t < threads - 1; ++t){
            pool.do_task(sub_functor, t, first + t * batch, first + (t+1) * batch);
        }

        acc_functor(functor(first + (threads - 1) * batch, last));

        pool.wait();

        for(auto fut : futures){
            acc_functor(fut);
        }
    } else {
        acc_functor(functor(first, last));
    }
}

I simply preallocate space for all the results and create a new functor that calls the input functor and saves its result inside the vector. It is less nice, but it works well. And it compiles MUCH faster. This reduced the compilation time of my biggest test case by a factor of 8 (from 344 seconds to 44 seconds). This is really crazy. It also fixed the problem where splitting the test case was faster than compiling the big file (it is now twice as fast to compile the big file as to compile all the small files separately). Overall, this sped up the total compilation of DLL by a factor of about 4.

As of now, I still have no idea why this makes such a big difference. I have looked at the std::async code, but I haven't found a valid reason for this slowdown. If someone has any idea, I'd be very glad to discuss it in the comments below.

Improving the template instantiation tree

I recently discovered templight, a profiler for templates (pretty cool). After some time, I was able to build it and use it on ETL. For now, I haven't been able to reduce compile time a lot, but I have been able to shrink the template instantiation tree a lot: I saw that some instantiations were completely useless and optimized the code to remove them.

I won't go into much detail here because I plan to write a post on this subject in the coming days.

Conclusion

In conclusion, I would say that it is pretty hard to improve the compile time of complex C++ programs once you have gone through all the standard methods. However, I was very happy to find that two optimizations in the source code sped up the overall compilation of DLL by a factor of almost 5. I will continue working on this, but for now, the compilation time is much more reasonable.

I hope the two main facts in this article were interesting. If you have similar experience, comments or ideas for further improvements, I'd be glad to discuss them with you in the comments :)

Detect overflows and more in Java with COJAC

Back at school, I worked on the COJAC project to automatically detect numeric overflows in Java programs. Since then, this project has evolved a lot and now has more features:

  • Detect integer overflows

  • Detect smearing and cancellation with float and double types

  • Detect NaN and Infinite results from computations

  • Detect offending type casts

Moreover, all these features are available without any recompilation of your program. You simply add an argument to the invocation of the Java virtual machine and all these errors will be detected for you automatically!

Frédéric Bapst, the person in charge of the project, has recently published two videos about it; don't hesitate to check them out:

The first video presents the automatic analysis features of the tool:

And the second presents the numeric wrapper features of the tool for even more features:

If you have any question related to the project, you can add a comment to this page or contact me directly by email.

If you want more information on the project you can also check out its repository on Github: https://github.com/Cojac/Cojac

Aggregator Plugin: Display global metrics in SonarQube

Recently, I wanted to know how many lines of code I had on my Sonar server across all my C++ projects. SonarSource proposes a commercial plugin (Views) that allows you to do that (and much more...), but I didn't want to pay thousands of dollars simply to get the total of my lines of code, so I wrote a very simple Sonar plugin to compute some global metrics.

This plugin is very simple: it only provides a global widget that aggregates some stats over all your projects. For instance, here are the results on my Sonar server:

/images/aggregator_widget.png

The plugin is freely available on Github: https://github.com/wichtounet/aggregator-plugin . However, it has only been tested on my Sonar server (4.5.2) and it is my first Sonar plugin, so it may not work everywhere. If you experience issues, don't hesitate to open an issue on Github or to propose a Pull Request.

You can install the plugin by putting the .jar file (from the Github Releases page) into your sonar/extensions/plugins directory and restarting Sonar. You should then have access to a new global widget that you can add to a dashboard.

I hope this plugin helps some of you.

Upgrade to Nikola 7

I've finally taken the time to upgrade the website to Nikola 7 (it is about time, I know...).

The migration went flawlessly: I simply had to update the configuration to migrate deprecated and renamed tags, and it worked really well. I also had to add a comma to the COMPILERS list because of the use of Python 3.3 now.

As you may have seen, I haven't posted in a while. I had quite some work for my thesis as well as for the courses I give at my school, and I started playing Path of Exile, which took quite a bit of my free time :) I'll try to give some updates on the projects I'm working on to make this blog live again.

Simulate static_if with C++11/C++14

If you are doing a lot of template metaprogramming and other template magic, you are likely to miss a static_if in the language. Unfortunately, it didn't make the cut for C++11 and it seems unlikely that it will make it into C++17.

static_if

As its name indicates, static_if is an if statement that is resolved at compile time. At first, it could seem that the main point is performance, but that is not the case. With recent compilers, if you have an if statement with a compile-time constant condition, it will never be evaluated at runtime and only the correct branch will be included in the final executable code. However, even if the compiler knows that a branch will never be executed, it still has to ensure that this branch compiles. This is not the case with static_if: only the valid branch is compiled, the other can contain invalid code. The most common reason to use a static_if is inside a template, where you perform a test on a template argument and execute code based on this test. static_if has another advantage over a standard if: since only one branch is instantiated, it may save quite a lot of compile time.

Let's say we have to write a template function that, if the template argument is a string, removes the last character of the string argument, and otherwise decrements the argument (I know, stupid example, but a simple one). With static_if, you could write it like this:

template<typename T>
void decrement_kindof(T& value){
    static_if(std::is_same<std::string, T>::value){
        value.pop_back();
    } else {
        --value;
    }
}

I think it is quite elegant.

The problem

Some may think that we could do the same with a standard C++ if statement:

template<typename T>
void decrement_kindof(T& value){
    if(std::is_same<std::string, T>::value){
        value.pop_back();
    } else {
        --value;
    }
}

However, this won't work. This template cannot be instantiated for std::string since it doesn't have an operator --, and it cannot be instantiated for int since int doesn't have a pop_back() function.

There are two solutions in plain C++: specialization and SFINAE. Let's start with specialization:

template<typename T>
void decrement_kindof(T& value){
    --value;
}

template<>
void decrement_kindof(std::string& value){
    value.pop_back();
}

We write a specialization for the std::string case, so that the general case uses -- and the std::string case uses pop_back(). And here is the SFINAE version:

template<typename T, std::enable_if_t<!std::is_same<std::string, T>::value, int> = 42>
void decrement_kindof(T& value){
    --value;
}

template<typename T, std::enable_if_t<std::is_same<std::string, T>::value, int> = 42>
void decrement_kindof(T& value){
    value.pop_back();
}

The first function is enabled when the type is not a std::string and the second function is enabled when the type is a std::string.

Both solutions need two functions to make this work. In this particular case, specialization is easier since the condition involves exactly one type. If the condition were more complex, for instance testing that a constant inside the type is equal to some value, we could only do it with SFINAE.
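For instance, a condition on a constant inside the type might look like this (the types and the trait here are invented for illustration):

#include <cstddef>
#include <type_traits>

struct small_buffer { static constexpr std::size_t size = 8; };
struct big_buffer   { static constexpr std::size_t size = 4096; };

// Selected when the constant inside the type is small enough.
template<typename T, std::enable_if_t<(T::size <= 64), int> = 42>
const char* storage_kind(const T&){ return "stack"; }

// Selected otherwise.
template<typename T, std::enable_if_t<(T::size > 64), int> = 42>
const char* storage_kind(const T&){ return "heap"; }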

Even if both solutions work, they are more complicated than the static_if version and they create more functions than should be necessary.

One solution

There is one way to emulate a kind of static_if with C++14 generic lambdas. It essentially uses anonymous template functions to emulate what we did with the previous solutions, but does it behind the scenes. Here is the code I'm using for this emulation:

namespace static_if_detail {

struct identity {
    template<typename T>
    T operator()(T&& x) const {
        return std::forward<T>(x);
    }
};

template<bool Cond>
struct statement {
    template<typename F>
    void then(const F& f){
        f(identity());
    }

    template<typename F>
    void else_(const F&){}
};

template<>
struct statement<false> {
    template<typename F>
    void then(const F&){}

    template<typename F>
    void else_(const F& f){
        f(identity());
    }
};

} //end of namespace static_if_detail

template<bool Cond, typename F>
static_if_detail::statement<Cond> static_if(F const& f){
    static_if_detail::statement<Cond> if_;
    if_.then(f);
    return if_;
}

Note: I got the idea (and most of the code) from the Boost Mailing List.

The condition is passed as a non-type template parameter and the code for the branch is passed as a generic lambda. The static_if function returns a statement structure. We could avoid returning a struct and directly execute, or not, the functor based on the condition, but using a structure allows for the else_ part, which may be practical. The statement structure is specialized on the condition. If the condition is true, the then part will execute the functor while the else_ part will not execute anything. The specialization for a false condition does the contrary. A special point here is the use of the identity function, which is passed to the lambda. The user can then use this function to make non-dependent types dependent. This is necessary if we want to call functions on non-dependent types when these functions may not exist. For instance, you may want to call a function on this, which is not a dependent type.

Here is how the code will look using this solution:

template<typename T>
void decrement_kindof(T& value){
    static_if<std::is_same<std::string, T>::value>([&](auto f){
        f(value).pop_back();
    }).else_([&](auto f){
        --f(value);
    });
}

It is not as elegant as the "real" static_if version, but it is closer than the other solutions.

If you don't use the lazy identity function (f), it still works on g++, but not on clang, for some reason.

Conclusion

We saw that there are some ways to emulate static_if in C++ that you may use to make the code easier to read. I'm personally using this trick for branches with few lines of code and when I don't have to use the identity function too much; otherwise it is cleaner to use standard SFINAE functions to do the job. When you only have an if and no else, this trick is even better, because that is where it saves the most code.
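As a small illustration of the if-only case (assuming the static_if helper defined above is in scope; the function itself is invented for the example):

#include <iostream>
#include <string>
#include <type_traits>

template<typename T>
void print_if_string(T& value){
    static_if<std::is_same<std::string, T>::value>([&](auto f){
        // only instantiated when T is std::string; for any other T this branch
        // is never checked, so calling .size() here is safe
        std::cout << "string of length " << f(value).size() << std::endl;
    });
}

Calling print_if_string on an int compiles fine because the branch is simply never instantiated, and no empty else function has to be written.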

I hope this can be useful to some of you ;)

You can find my implementation on Github.