PVS-Studio on C++ Library Review

Baptiste Wicht

2016-12-20 09:40

PVS-Studio is a commercial static analyzer for C, C++ and C#. It works in both Windows and Linux.

It has been a long time since I wanted to test it on my projects. I contacted The PVS-Studio team and they gave me a temporary license so that I can test the tool and make a review.

I tried the static analyzer on my Expression Templates Library (ETL) project. This is a heavily-templated C++ library. I tried it on Linux of course.

Usage

The installation is very simple, simply untar an archive and put the executables in your PATH (or use absolute paths). There are also some deb and rpm packages for some distributions. You need strace to make the analyzer work, it should be available on any Linux platform.

The usage of PVS-Studio on Linux should be straightforward. First, you can use the analyzer directly with make and it will detect the invocations of the compiler. For instance, here is the command I used for ETL:

pvs-studio-analyzer trace -- make -j5 debug/bin/etl_test

Note that you can use several threads without issues, which is really great. There does not seem to be any slowdown at this stage, probably only collecting compiler arguments.

This first step creates a strace_out file that will be used by the next stage.

Once, the compilation has been analyzed, you can generate the results with the analyze function, for which you'll need a license. Here is what I did:

pvs-studio-analyzer analyze -l ~/pvs_studio/PVS-Studio.lic -j5

Unfortunately, this didn't work for me:

No compilation units found
Analysis finished in 0:00:00.00

Apparently, it's not able to use the strace_out it generated itself...

Another possibility is to use the compilation database from clang to use PVS-Studio. So I generated my compile_commands.json file again (it was not up to date...) with Bear. And then, you only need to run the analyze step:

pvs-studio-analyzer analyze -l ~/pvs_studio/PVS-Studio.lic -j5

Make sure you have the same compiler configured than the one used to generate the compilation database to avoid errors with compiler arguments.

Unfortunately, this just printed a load of crap on my console:

(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;x[G^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->j}c@uvu2%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;JVJ^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->j}c@uvu2%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;*[G^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->j}c@uvu2%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;b[b^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->j}c@uvu2%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;[[x^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->j}c@uvu2%cJ

Pretty nice, isn't it ?

Let's try again in a file:

pvs-studio-analyzer analyze -o results.log -l ~/pvs_studio/PVS-Studio.lic -j5

The time is quite reasonable for the analysis, it took much less time than the compilation time. In total, it took 88 seconds to analyze all the files. It's much faster than the clang static analyzer.

This time it worked, but the log file is not readable, you need to convert it again:

plog-converter -t errorfile -o errors results.log

And finally, you can read the results of the analysis in the errors file.

Results

Overall, PVS-Studio found 236 messages in the ETL library, I was expecting more. I also wish there was an HTML report that include the source code as well as the error message. I had to lookup at the code for each message (you could integrate it in vim and then use the quickfix window to do that). There are some visualization but in things like QtCreator or LibreOffice which I don't have nor want on my computer.

Let's look at the results. For each message, I'll include the message from PVS-Studio and the code if it's relevant.

The first is about using the comma:

include/etl/traits.hpp:674:1: error: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct.
include/etl/traits.hpp:674:1: error: V685 Consider inspecting the return statement. The expression contains a comma.

template <typename E>
constexpr std::size_t dimensions(const E& expr) noexcept {
    return (void)expr, etl_traits<E>::dimensions();
}

Here I'm simply using the comma operand to ignore expr to avoid a warning. To make this compile in C++11, you need to do it in one line otherwise it's not a constexpr function. It's probably not perfect to use this construct, but there is no problem here.

There is a bunch of these, let's filter them, it remains 207 warnings. Let's jump to the next one:

include/etl/impl/blas/fft.hpp:29:1: error: V501 There are identical sub-expressions to the left and to the right of the '==' operator: (DFTI_SINGLE) == DFTI_SINGLE

inline void fft_kernel(const std::complex<float>* in, std::size_t s, std::complex<float>* out) {
    DFTI_DESCRIPTOR_HANDLE descriptor;

    void* in_ptr = const_cast<void*>(static_cast<const void*>(in));

    DftiCreateDescriptor(&descriptor, DFTI_SINGLE, DFTI_COMPLEX, 1, s); //Specify size and precision
    DftiSetValue(descriptor, DFTI_PLACEMENT, DFTI_NOT_INPLACE);         //Out of place FFT
    DftiCommitDescriptor(descriptor);                                   //Finalize the descriptor
    DftiComputeForward(descriptor, in_ptr, out);                        //Compute the Forward FFT
    DftiFreeDescriptor(&descriptor);                                    //Free the descriptor
}

Unfortunately, the error is inside the MKL library. Here, I really don't think it's an issue. There is pack of them. I forgot to exclude non-ETL code from the results. Once filter from all dependencies, 137 messages remain.

include/etl/eval_functors.hpp:157:1: warning: V560 A part of conditional expression is always false: !padding.

This is true, but not an issue since padding is a configuration constant that enables the use of padding in vector and matrices. There was 27 of these at different locations and with different configuration variables.

include/etl/op/sub_view.hpp:161:1: note: V688 The 'i' function argument possesses the same name as one of the class members, which can result in a confusion.

This is again true, but not a bug in this particular case. It is still helpful and I ended up changing these to avoid confusion. Again, there was a few of these.

etl/test/src/conv_multi_multi.cpp:23:1: error: V573 Uninitialized variable 'k' was used. The variable was used to initialize itself.

This one is in the test code:

for (size_t k = 0; k < etl::dim<0>(K); ++k) {
    for (size_t i = 0; i < etl::dim<0>(I); ++i) {
        C_ref(k)(i) = conv_2d_valid(I(i), K(k)); // HERE
    }
}

I don't see any error, k is initialized correctly to zero in the first loop. This is a false positive for me. There were several of these in different places. It seems to that the use of the operator() is confusing for PVS-Studio.

include/etl/traits.hpp:703:1: note: V659 Declarations of functions with 'rows' name differ in the 'const' keyword only, but the bodies of these functions have different composition. This is suspicious and can possibly be an error. Check lines: 693, 703.

template <typename E, cpp_disable_if(decay_traits<E>::is_fast)>
std::size_t rows(const E& expr) { //693
    return etl_traits<E>::dim(expr, 0);
}

template <typename E, cpp_enable_if(decay_traits<E>::is_fast)>
constexpr std::size_t rows(const E& expr) noexcept { //703
    return (void)expr, etl_traits<E>::template dim<0>();
}

Unfortunately, this is again a false positive because PVS-Studio failed to recognized SFINAE and therefore the warning is wrong.

include/etl/builder/expression_builder.hpp:345:1: note: V524 It is odd that the body of '>>=' function is fully equivalent to the body of '*=' function.

This one is interesting indeed. It is true that they are exactly because in ETL >> is used for scalar element-wise multiplication. This is quite interesting that PVS-Studio points that out. There was a few of these oddities but all were normal in the library.

etl/test/src/compare.cpp:23:1: error: V501 There are identical sub-expressions to the left and to the right of the '!=' operator: a != a

Again, it is nice that PVS-Studio finds that, but this is done on purpose on the tests to compare an object to itself. If I remove all the oddities in the test cases, there are only 17 left in the headers. None of the warnings on the test case was serious, but there was no more false positives either, so that's great.

include/etl/impl/vec/sum.hpp:92:1: error: V591 Non-void function should return a value.

template <typename L, cpp_disable_if((vec_enabled && all_vectorizable<vector_mode, L>::value))>
value_t<L> sum(const L& lhs, size_t first, size_t last) {
    cpp_unused(lhs);
    cpp_unused(first);
    cpp_unused(last);
    cpp_unreachable("vec::sum called with invalid parameters");
}

This one is interesting. It's not a false positive since indeed the function does not return a value, but there is a __builtin_unreachable() inside the function and it cannot be called. In my opinion, the static analyzer should be able to handle that, but this is really a corner case.

include/etl/sparse.hpp:148:1: note: V550 An odd precise comparison: a == 0.0. It's probably better to use a comparison with defined precision: fabs(A - B) < Epsilon.

inline bool is_zero(double a) {
    return a == 0.0;
}

This is not false, but again this is intended because of the comparison to zero for a sparse matrix. There were 10 of these in the same class.

include/etl/impl/blas/fft.hpp:562:1: note: V656 Variables 'a_padded', 'b_padded' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'etl::size(c)' expression. Check lines: 561, 562.

dyn_vector<etl::complex<type>> a_padded(etl::size(c));
dyn_vector<etl::complex<type>> b_padded(etl::size(c));

It's indeed constructed with the same size, but for me I don't think it's an odd pattern. I would not consider that as a warning, especially since it's a constructor and not a assignment.

include/etl/dyn_base.hpp:312:1: warning: V690 The 'dense_dyn_base' class implements a copy constructor, but lacks the '=' operator. It is dangerous to use such a class.

This is again a kind of corner case in the library because it's a base class and the assignment is different between the sub classes and not a real assignment in the C++ sense.

include/etl/impl/reduc/conv_multi.hpp:657:1: warning: V711 It is dangerous to create a local variable within a loop with a same name as a variable controlling this loop.

for (std::size_t c = 0; c < C; ++c) {
    for (std::size_t k = 0; k < K; ++k) {
        conv(k)(c) = conv_temp(c)(k);
    }
}

This is again a false positive... It really seems that PVS-Studio is not able to handle the operator().

include/etl/impl/pooling.hpp:396:1: error: V501 There are identical sub-expressions to the left and to the right of the '||' operator: P1 || P2 || P1

template <size_t C1, size_t C2, size_t C3,size_t S1, size_t S2, size_t S3, size_t P1, size_t P2, size_t P3, typename A, typename M>
static void apply(const A& sub, M&& m) {
    const size_t o1 = (etl::dim<0>(sub) - C1 + 2 * P1) / S1 + 1;
    const size_t o2 = (etl::dim<1>(sub) - C2 + 2 * P2) / S2 + 1;
    const size_t o3 = (etl::dim<2>(sub) - C3 + 2 * P3) / S3 + 1;

    if(P1 || P2 || P1){

Last but not least, this time, it's entirely true and it's in fact a bug in my code! The condition should be written like this:

if(P1 || P2 || P3){

This is now fixed in the master of ETL.

Conclusion

The installation was pretty easy, but the usage was not as easy as it could because the first method by analyzing the build system did not work. Fortunately, the system supports using the Clang compilation database directly and therefore it was possible to use.

Overall, it found 236 warnings on my code base (heavily templated library). Around 50 of them were in some of the extend libraries, but I forgot to filter them out. The quality of the results is pretty good in my opinion. It was able to find a bug in my implementation of pooling with padding. Unfortunately, there was quite a few false positives, due to SFINAE, bad handling of the operator() and no handling of __builtin_unreachable. The remaining were all correct, but were not bug considering their usages.

To conclude, I think it's a great static analyzer that is really fast compared to other one in the market. There are a few false positives, but it's really not bad compared to other tools and some of the messages are really great. An HTML report including the source code would be great as well.

If you want more information, you can consult the official site. There is even a way to use it on open-source code for free, but you have to add comments on top of each of your files.

I hope it was helpful ;)

C++ Compiler benchmark on Expression Templates Library (ETL)

Baptiste Wicht

2016-12-11 14:17

Comments

In my Expression Templates Library (ETL) project, I have a lot of template heavy code that needs to run as fast as possible and that is quite intensive to compile. In this post, I'm going to compare the performance of a few of the kernels produced by different compilers. I've got GCC 5.4, GCC 6.20 and clang 3.9. I also included zapcc which is based on clang 4.0.

These tests have been run on an Haswell processor. The automatic parallelization of ETL has been turned off for these tests.

Keep in mind that some of the diagrams are presented in logarithmic form.

Vector multiplication

The first kernel is a very simple one, simple element-wise multiplication of two vectors. Nothing fancy here.

For small vectors, clang is significantly slower than gcc-5.4 and gcc6.2. On vectors from 100'000 elements, the speed is comparable for each compiler, depending on the memory bandwidth. Overall, gcc-6.2 produces the fastest code here. clang-4.0 is slightly slower than clang-3.9, but nothing dramatic.

Vector exponentiation

The second kernel is computing the exponentials of each elements of a vector and storing them in another vector.

Interestingly, this time, clang versions are significantly faster for medium to large vectors, from 1000 elements and higher, by about 5%. There is no significant differences between the different versions of each compiler.

Matrix-Matrix Multiplication

The next kernel I did benchmark with the matrix-matrix multiplication operation. In that case, the kernel is hand-unrolled and vectorized.

There are few differences between the compilers. The first thing is that for some sizes such as 80x80 and 100x100, clang is significantly faster than GCC, by more than 10%. The other interesting fact is that for large matrices zapcc-clang-4.0 is always slower than clang-3.9 which is itself on par with the two GCC versions. In my opinion, it comes from a regression in clang trunk but it could also come from zapcc itself.

The results are much more interesting here! First, there is a huge regression in clang-4.0 (or in zapcc for that matter). Indeed, it is up to 6 times slower than clang-3.9. Moreover, the clang-3.9 is always significantly faster than gcc-6.2. Finally, there is a small improvement in gcc-6.2 compared to gcc 5.4.

Fast-Fourrier Transform

The following kernel is the performance of a hand-crafted Fast-Fourrier transform implementation.

On this benchmark, gcc-6.2 is the clear winner. It is significantly faster than clang-3.9 and clang-4.0. Moreover, gcc-6.2 is also faster than gcc-5.4. On the contrary, clang-4.0 is significantly slower than clang-3.9 except on one configuration (10000 elements).

1D Convolution

This kernel is about computing the 1D valid convolution of two vectors.

While clang-4.0 is faster than clang-3.9, it is still slightly slower than both gcc versions. On the GCC side, there is not a lot of difference except on the 1000x500 on which gcc-6.2 is 25% faster.

And here are the results with the naive implementation:

Again, on the naive version, clang is much faster than GCC on the naive, by about 65%. This is a really large speedup.

2D Convolution

This next kernel is computing the 2D valid convolution of two matrices

There is no clear difference between the compilers in this code. Every compiler here has up and down.

Let's look at the naive implementation of the 2D convolution (units are milliseconds here not microseconds):

This time the difference is very large! Indeed, clang versions are about 60% faster than the GCC versions! This is really impressive. Even though this does not comes close to the optimized. It seems the vectorizer of clang is much more efficient than the one from GCC.

4D Convolution

The final kernel that I'm testing is the batched 4D convolutions that is used a lot in Deep Learning. This is not really a 4D convolution, but a large number of 2D convolutions applied on 4D tensors.

Again, there are very small differences between each version. The best versions are the most recent versions of the compiler gcc-6.2 and clang-4.0 on a tie.

Conclusion

Overall, we can see two trends in these results. First, when working with highly-optimized code, the choice of compiler will not make a huge difference. On these kind of kernels, gcc-6.2 tend to perform faster than the other compilers, but only by a very slight margin, except in some cases. On the other hand, when working with naive implementations, clang versions really did perform much better than GCC. The clang compiled versions of the 1D and 2D convolutions are more than 60% faster than their GCC counter parts. This is really impressive. Overall, clang-4.0 seems to have several performance regressions, but since it's not still a work in progress, I would not be suprised if these regressions are not present in the final version. Since the clang-4.0 version is in fact the clang version used by zapcc, it's also possible that zapcc is introducing new performance regressions.

Overall, my advice would be to use GCC-6.2 (or 5.4) on hand-optimized kernels and clang when you have mostly naive implementations. However, keep in mind that at least for the example shown here, the naive version optimized by the compiler never comes close to the highly-optimized version.

As ever, takes this with a grain of salt, it's only been tested on one project and one machine, you may obtain very different results on other projects and on other processors.

zapcc C++ compilation speed against gcc 5.4 and clang 3.9

Baptiste Wicht

2016-12-05 18:46

Comments

A week ago, I compared the compilation time performance of zapcc against gcc-4.9.3 and clang-3.7. On debug builds, zapcc was about 2 times faster than gcc and 3 times faster than clang. In this post, I'm going to try some more recent compilers, namely gcc 5.4 and clang 3.9 on the same project. If you want more information on zapcc, read the previous posts, this post will concentrate on results.

Again, I use my Expression Template Library (ETL). This is a purely header-only library with lots of templates. I'm going to compile the full test cases.

The results of the two articles are not directly comparable, since they were obtained on two different computers. The one on which the present results are done has a less powerful and only 16Go of RAM compared to the 32Go of RAM of my build machine. Also take into account that that the present results were obtained on a Desktop machine, there can be some perturbations from background tasks.

Just like on the previous results, it does not help using more threads than physical cores, therefore, the results were only computed on up to 4 cores on this machine.

The link time is not taken into account on the results.

Debug build

Let's start with the result of the debug build.

Compiler	-j1	-j2	-j4
g++-5.4.0	469s	230s	130s
clang++-3.9	710s	371s	218s
zapcc++	214s	112s	66s
Speedup VS Clang	3.31	3.31	3.3
Speedup VS GCC	2.19	2.05	1.96

The results are almost the same as the previous test. zapcc is 3.3 times faster to compile than Clang and around 2 times faster than GCC. It seems that GCC 5.4 is a bit faster than GCC 4.9.3 while clang 3.9 is a bit slower than clang 3.7, but nothing terribly significant.

Overall, for debug builds, zapcc can bring a very significant improvement to your compile times.

Release build

Let's see what is the status of Release builds. Since the results are comparable between the numbers of threads, the results here are just for one thread.

This is more time consuming since a lot of optimizations are enabled and more features from ETL are enabled as well.

Compiler	-j1
g++-5.4.0	782s
clang++-3.9	960s
zapcc++	640s
Speedup VS Clang	1.5
Speedup VS GCC	1.22

On a release build, the speedups are much less interesting. Nevertheless, they are still significant. zapcc is still 1.2 times faster than gcc and 1.5 times faster than clang. Then speedup against clang 3.9 is significantly higher than it was on my experiment with clang 3.7, it's possible that clang 3.9 is slower or simply has new optimization passes.

Conclusion

The previous conclusion still holds with modern version of compilers: zapcc is much faster than other compilers on Debug builds of template heavy code. More than 3 times faster than clang-3.9 and about 2 times faster than gcc-5.4. Since it's based on clang, there should not be any issue compiling projects that already compile with a recent clang. Even though the speedups are less interesting on a release build, it is still significantly, especially compared against clang.

I'm really interested in finding out what will be the pricing for zapcc once out of the beta or if they will be able to get even faster!

For the comparison with gcc 4.9.3 and clang 3.7, you can have a look at this article.

If you want more information about zapcc, you can go to the official website of zapcc

New design: Faster and mobile compatible

Baptiste Wicht

2016-11-28 07:55

Comments

I've finally taken the time to improve the design of the website!

The site was becoming slower and slower, the design was not responsive at all and was an horror on mobile.

I've changed the design to focus more on content and removed superfluous things such as the Google profile or slow things such as the 3D tag cloud. Moreover, the design is now responsive again. It was a matter of removing a lot of bad things I did in the CSS. Instead of having a vertical and an horizontal bars, I now have only one vertical bar with both the navigation and a bit more information. With these changes, the design is now also working on mobile phone! It's about time.

Moreover, I've also spent quite some time working on the speed of the website. For this, I've bundled most of the JS and CSS files together and reduced them. Moreover, the static files are now hosted and cached by CloudFlare. I've also removed the 3D tag cloud which was quite slow. The Google API usage for the Google profile badge were also quite slow. Overall, the index page is now really fast. The article pages are also much faster but it's not perfect, especially because of Disqus that does tons of requests and redirects everywhere. I've also got rid of the Disqus ads which were really insignificant in the end. It may take a while for the ads to disappear according to Disqus.

I know that it's still not perfect, but I hope that user experience on the blog is now improved for all readers and now article can be read on mobile normally. I'll try to continue monitoring the speed and usability of the website to see if I can improve it further in the coming days.

If you have any issue on the updated website, don't hesitate to let me know either by commenting on this post or sending me an email (check the Contact page).

zapcc - a faster C++ compiler

Baptiste Wicht

2016-11-26 13:17

Comments

Update: For a comparison against more modern compiler versions, you can read: zapcc C++ compilation speed against gcc 5.4 and clang 3.9

I just joined the private beta program of zapcc. Zapcc is a c++ compiler, based on Clang which aims at being much faster than other C++ compilers. How they are doing this is using a caching server that saves some of the compiler structures, which should speed up compilation a lot. The private beta is free, but once the compiler is ready, it will be a commercial compiler.

Every C++ developer knows that compilation time can quickly be an issue when programs are getting very big and especially when working with template-heavy code.

To benchmark this new compiler, I use my Expression Template Library (ETL). This is a purely header-only library with lots of templates. There are lots of test cases which is what I'm going to compile. I'm going to compare against Clang-3.7 and gcc-4.9.3.

I have configured zapcc to let is use 2Go RAM per caching server, which is the maximum allowed. Moreover, I killed the servers before each tests.

Debug build

Let's start with a debug build. In that configuration, there is no optimization going on and several of the features of the library (GPU, BLAS, ...) are disabled. This is the fastest way to compile ETL. I gathered this result on a 4 core, 8 threads, Intel processor, with an SSD.

The following table presents the results with different number of threads and the difference of zapcc compared to the other compilers:

Compiler	-j1	-j2	-j4	-j6	-j8
g++-4.9.3	350s	185s	104s	94s	91s
clang++-3.7	513s	271s	153s	145s	138s
zapcc++	158s	87s	47s	44s	42s
Speedup VS Clang	3.24	3.103	3.25	3.29	3.28
Speedup VS GCC	2.21	2.12	2.21	2.13	2.16

The result is pretty clear! zapcc is around three times faster than Clang and around two times faster than GCC. This is pretty impressive!

For those that think than Clang is always faster than GCC, keep in mind that this is not the case for template-heavy code such as this library. In all my tests, Clang has always been slower and much memory hungrier than GCC on template-heavy C++ code. And sometimes the difference is very significant.

Interestingly, we can also see that going past the physical cores is not really interesting on this computer. On some computer, the speedups are interesting, but not on this one. Always benchmark!

Release build

We have seen the results on a debug build, let's now compare on something a bit more timely, a release build with all options of ETL enabled (GPU, BLAS, ...), which should make it significantly longer to compile.

Again, the table:

Compiler	-j1	-j2	-j4	-j6	-j8
g++-4.9.3	628s	336s	197s	189s	184s
clang++-3.7	663s	388s	215s	212s	205s
zapcc++	515s	281s	173s	168s	158s
Speedup VS Clang	1.28	1.38	1.24	1.26	1.29
Speedup VS GCC	1.21	1.30	1.13	1.12	1.16

This time, we can see that the difference is much lower. Zapcc is between 1.2 and 1.4 times faster than Clang and between 1.1 and 1.3 times faster than GCC. This shows that most of the speedups from zapcc are in the front end of the compiler. This is not a lot but still significant over long builds, especially if you have few threads where the absolute difference would be higher.

We can also observe that Clang is now almost on par with GCC which shows that optimization is faster in Clang while front and backend is faster in gcc.

You also have to keep in mind that zapcc memory usage is higher than Clang because of all the caching. Moreover, the server are still up in between compilations, so this memory usage stays between builds, which may not be what you want.

As for runtime, I have not seen any significant difference in performance between the clang version and the zapcc. According to the official benchmarks and documentation, there should not be any difference in that between zapcc and the version of clang on which zapcc is based.

Incremental build

Normally, zapcc should shine at incremental building, but I was unable to show any speedup when changing a single without killing the zapcc servers. Maybe I did something wrong in my usage of zapcc.

Conclusion

In conclusion, we can see that zapcc is always faster than both GCC and Clang, on my template-heavy library. Moreover, on debug builds, it is much faster than any of the two compilers, being more than 2 times faster than GCC and more than 3 times faster than clang. This is really great. Moreover, I have not seen any issue with the tool so far, it can seamlessly replace Clang without problem.

It's a bit weird that you cannot allocate more than 2Go to the zapcc servers.

For a program, that's really impressive. I hope that they are continuing the good work and especially that this motivates other compilers to improve the speed of compilation (especially of templates).

If you want more information, you can go to the official website of zapcc

Blazing fast unit test compilation with doctest 1.1

Baptiste Wicht

2016-09-21 21:45

Comments

You may remember my quest for faster compilation times. I had made several changes to the Catch test framework macros in order to save some compilation at the expense of my test code looking a bit less nice:

REQUIRE(a == 9); //Before
REQUIRE_EQUALS(a, 9); //After

The first line is a little bit better, but using several optimizations, I was able to dramatically change the compilation time of the test cases of ETL. In the end, I don't think that the difference between the two lines justifies the high overhead in compilation times.

doctest

doctest is a framework quite similar to Catch but that claims to be much lighter. I tested doctest 1.0 early on, but at this point it was actually slower than Catch and especially slower than my versions of the macro.

Today, doctest 1.1 was released with promises of being even lighter than before and providing several new ways of speeding up compilation. If you want the results directly, you can take a look at the next section.

First of all, this new version improved the basic macros to make expression decomposition faster. When you use the standard REQUIRE macro, the expression is composed by using several template techniques and operator overloading. This is really slow to compile. By removing the need for this decomposition, the fast Catch macros are much faster to compile.

Moreover, doctest 1.1 also introduces CHECK_EQ that does not any expression decomposition. This is close to what I did in my macros expect that it is directly integrated into the framework and preserves all its features. It is also possible to bypass the expression checking code by using FAST_CHECK_EQ macro. In that case, the exceptions are not captured. Finally, a new configuration option is introduced (DOCTEST_CONFIG_SUPER_FAST_ASSERTS) that removes some features related to automatic debugger breaks. Since I don't use the debugger features and I don't need to capture exception everywhere (it's sufficient for me that the test fails completely if an exception is thrown), I'm more than eager to use these new features.

Results

For evaluation, I have compiled the complete test suite of ETL, with 1 thread, using gcc 4.9.3 with various different options, starting from Catch to doctest 1.1 with all compilation time features. Here are the results, in seconds:

Version	Time	VS Catch	VS Fast Catch	VS doctest 1.0
Catch	724.22
Fast Catch	464.52	-36%
doctest 1.0	871.54	+20%	+87%
doctest 1.1	614.67	-16%	+32%	-30%
REQUIRE_EQ	493.97	-32%	+6%	-43%
FAST_REQUIRE_EQ	439.09	-39%	-6%	-50%
SUPER_FAST_ASSERTS	411.11	-43%	-12%	-53%

As you can see, doctest 1.1 is much faster to compile than doctest 1.0! This is really great news. Moreover, it is already 16% faster than Catch. When all the features are used, doctest is 12% faster than my stripped down versions of Catch macros (and 43% faster than Catch standard macros). This is really cool! It means that I don't have to do any change in the code (no need to strip macros myself) and I can gain a lot of compilation time compared to the bare Catch framework.

I really think the author of doctest did a great job with the new version. Although this was not of as much interest for me, there are also a lot of other changes in the new version. You can consult the changelog if you want more information.

Conclusion

Overall, doctest 1.1 is much faster to compile than doctest 1.0. Moreover, it offers very fast macros for test assertions that are much faster to compile than Catch versions and even faster than the versions I created myself to reduce compilation time. I really thing this is a great advance for doctest. When compiling with all the optimizations, doctest 1.1 saves me 50 seconds in compilation time compared to the fast version of Catch macro and more than 5 minutes compared to the standard version of Catch macros.

I'll probably start using doctest on my development machine. For now, I'll keep Catch as well since I need it to generate the unit test reports in XML format for Sonarqube. Once this feature appears in doctest, I'll probably drop Catch from ETL and DLL

If you need blazing fast compilation times for your unit tests, doctest 1.1 is probably the way to go.

Short review of Bullseye Coverage

Baptiste Wicht

2016-09-16 13:25

Comments

Bullseye is a commercial Code Coverage analyzer. It is fully-featured with an export to HTML, to XML and even a specific GUI to see the application.It costs about 800$, with a renewal fee of about 200$ per year.

I'm currently using gcov and passing the results to Sonar. This works well, but there are several problems. First, I need to use gcovr to generate the XML file, that means two tools. Then, gcov has no way to merge coverage reports. In my tests of ETL, I have seven different profiles being tested and I need the overall coverage report. lcov has a merge feature but it is slow as hell (it takes longer to merge the coverage files than to compile and run the complete test suite seven times...). For now, I'm using a C++ program that I wrote to combine the XML files or a Python script that does that, but neither are perfect and it needs maintenance. Finally, it's impossible to exclude some code from the coverage report (there is code that isn't meant to be executed (exceptional code)). For now, I'm using yet another C++ program that I wrote to do this from comments in code.

Bullseye does have all these feature, so I got an evaluation license online and tried this tool and wrote a short review of it.

Usage

The usage is pretty simple. You put the coverage executables in your PATH variable and activate coverage globally. Then, we you compile, the compiler calls will be intercepted and a coverage file will be generated. When the compilation is done, run the program and the coverage measurements will be filled.

The coverage results can then be exported to HTML (or XML) or visualized using the CoverageBrowser tool:

Screenshot of Bullseye main coverage view — The main view of the Bullseye tool code coverage results

It's a pretty good view of the coverage result. You have a breakdown by folders, by file, by function and finally by condition. You can view directly the source code:

Screenshot of Bullseye source code coverage view — The source view of the Bullseye tool code coverage results

If you want to exclude some code from your coverage reports, you can use a pragma:

switch (n) {
    case 1: one++; break;
    case 2: two++; break;
    case 3: three++; break;
    #pragma BullseyeCoverage off
    default: abort();
    #pragma BullseyeCoverage on
}

So that the condition won't be set as uncovered.

As for the coverage, it's pretty straightforward. For example:

covmerge -c -ffinal.cov sse.cov avx.cov

and it's really fast. Unfortunately, the merging is only done at the function level, not at the statement or at the condition level. This is a bit disappointing, especially from a commercial tool. Nevertheless, it works well.

Conclusion

To conclude, Bullseye seems to be a pretty good tool. It has more features than standard gcov coverage and all features are well integrated together. I have only covered the features I was interested in, there are plenty of other things you can look at on the official website.

However, if you don't need the extra features such as the visualizer (or use something like Sonar for this), or the merge or code excluding, it's probably not worth paying the price for it. In my case, since the merge is not better than my C++ tool (both do almost the same and my tool does some basic line coverage merging as well) and I don't need the visualizer, I won't pay the price for it. Moreover, they don't have student or open source licensing, therefore, I'll continue with my complicated toolchain :)

Expression Templates Library (ETL) 1.0

Baptiste Wicht

2016-09-02 16:12

Comments

I've just released the first official version of my Expression Templates Library (ETL for short): The version 1.0.

Until now, I was using a simple rolling release model, but I think it's now time to switch to some basic versioning. The project is now at a stable state.

ETL 1.0 has the following main features:

Smart Expression Templates
Matrix and vector (runtime-sized and compile-time-sized)
Simple element-wise operations
Reductions (sum, mean, max, ...)
Unary operations (sigmoid, log, exp, abs, ...)
Matrix multiplication
Convolution (1D and 2D and higher variations)
Max Pooling
Fast Fourrier Transform
Use of SSE/AVX to speed up operations
Use of BLAS/MKL/CUBLAS/CUFFT/CUDNN libraries to speed up operations
Symmetric matrix adapter (experimental)
Sparse matrix (experimental)

Examples

Here is an example of expressions in ETL:

etl::fast_matrix<float, 2, 2, 2> a = {1.1, 2.0, 5.0, 1.0, 1.1, 2.0, 5.0, 1.0};
etl::fast_matrix<float, 2, 2, 2> b = {2.5, -3.0, 4.0, 1.0, 2.5, -3.0, 4.0, 1.0};
etl::fast_matrix<float, 2, 2, 2> c = {2.2, 3.0, 3.5, 1.0, 2.2, 3.0, 3.5, 1.0};

etl::fast_matrix<float, 2, 2, 2> d(2.5 * ((a >> b) / (log(a) >> abs(c))) / (1.5 * scale(a, sign(b)) / c) + 2.111 / log(c));

Or another I'm using in my neural networks library:

h = etl::sigmoid(b + v * w)

In that case, the vector-matrix multiplication will be executed using a BLAS kernel (if ETL is configured correclty) and the assignment, the sigmoid and the addition will be automatically vectorized to use either AVX or SSE depending on the machine.

Or with a convolutional layer and a ReLU activation function:

etl::reshape<1, K, NH1, NH2>(h_a) = etl::conv_4d_valid_flipped(etl::reshape<1, NC, NV1, NV2>(v_a), w);
h = max(b_rep + h_a, 0.0);

This will automatically be computed either with NVIDIA CUDNN (if available) or with optimized SSE/AVX kernels.

For more information, you can take a look at the Reference on the wiki.

Next version

For the next version, I'll focus on several things:

Improve matrix-matrix multiplication kernels when BLAS is not available. There is a lot of room for improvement here
Complete support for symmetric matrices (currently experimental)
Maybe some new adapters such as Hermitian matrices
GPU improvements for some operations that can be done entirely on GPU
New convolution performanceimprovements
Perhaps more complete parallel support for some implementations
Drop some compiler support to use full C++14 support

Download ETL

You can download ETL on Github. If you only interested in the 1.0 version, you can look at the Releases pages or clone the tag 1.0. There are several branches:

master Is the eternal development branch, may not always be stable
stable Is a branch always pointing to the last tag, no development here

For the future release, there always will tags pointing to the corresponding commits. I'm not following the git flow way, I'd rather try to have a more linear history with one eternal development branch, rather than an useless develop branch or a load of other branches for releases.

Don't hesitate to comment this post if you have any comment on this library or any question. You can also open an Issue on Github if you have a problem using this library or propose a Pull Request if you have any contribution you'd like to make to the library.

Hope this may be useful to some of you :)

Asgard: Home Automation project

Baptiste Wicht

2016-08-27 22:28

Comments

I have updated my asgard project to make it finally useful for me, so I figured I'd present the project now.

Asgard is my project of home automation based on a Raspberry Pi. I started this project after Ninja Blocks kickstarter company went down and I was left with useless sensors. So I figured why not have fun creating my own :P I know there are some other projects out there that are pretty good, but I wanted to do some more low level stuff for once, so what the hell.

Of course, everything is written in C++, no surprise here. The project is built upon a server / drivers architecture. The drivers and the server are talking via network sockets, so they can be on different machines. The server is displaying the data it got on a web interface and also provide a way to trigger actions of drivers either from the web interface or through the integrated rules engine. The data are stored in a database, accessed with CPPSqlite3 (probably going to be replaced by sqlcpp11) and the web server is handled with mongoose (with a c++ interface).

I must mention that most of the web part of the project was made by a student of mine, Stéphane Ly, who work on it as part of his study.

Here is a picture of the Raspberry Pi system (not very pretty ;) ):

I plan to try to fit at least some of it on a nicer box with nicer cables and such. Moreover, I also plan to add real antennas to the RF transmitter and receiver, but I haven't received them so far.

Sensors

asgard support several sensors:

DHT11 Temperature/Humdity Sensor
WT450 Temperature/Humdity Sensor
RF Button
IR Remote
CPU Temperature Sensor

You can see the sensors data displayed on the web interface:

Actions

There are currently a few actions provided by the drivers:

Wake-On-Lan a computer by its MAC Address
ITT-1500 smart plugs ON and OFF
Kodi actions: Pause / Play / Next / Previous on Kodi

Here are the rules engine:

My home automation

I'm currently using this system to monitor the temperature in my appartment. Nothing great so far because I don't have enough sensors yet. And now, I'm also using a wireless button to turn on my power socket, wait 2 seconds and then power on my Kodi Home Theater with wake on lan.

It's nothing fancy so far, but it's already better than what I had with Ninja Blocks, except for the ugly hardware ;).

Future

There are still tons of work on the project and on integration in my home.

I'm really dissatisfied with the WT450 sensor, I've ordered new Oregon sensors to try to do better.
I've ordered a few new sensors: Door intrusion detector and motion detector
The rules system needs to be improve to support multiple conditions
I plan to add a simple state system to the asgard server
There are a lot of refactorings necessary in the code and

However, I don't know when I'll work on this again, my work on this project is pretty episodic to say the least.

Code

The code is, as always, available on Github. There are multiple repositories: all asgard repositories. It's not that much code for now, about 2000 lines of code, but some of it may be useful. If you plan to use the system, keep in mind that it was never tested out of my environment and that there is no documentation so far, but don't hesitate to open Issues on Github if you have questions or post a comment here.

Update: Thor, Thesis and Publications

Baptiste Wicht

2016-08-23 07:40

Comments

Since it's been a real while since the last post I've written here, I wanted to write a short status update.

I had to serve one month in the army, which does not help at all for productivity :P Since the update to Boost Spirit X3, I haven't worked on my eddic compiler again, but I've switched back to my operating system project: thor. I'm having a lot of fun with it again and it's in much better state than before.

We also have been very productive on the publication side, with four new publications this year in various conferences. I'll update the blog when the proceedings are published. I'll be going to ICANN 2016 and ANNPR 2016 next week and probably to ICFHR in October. And of course, I'll go back to Meeting C++ in November :) As for my thesis, it's finally going great, I've started writing regularly and it's taking form!

Thor

My project Thor Operating System now has much more features than before:

64bit operating system
Preemptive Multiprocessing
Keyboard / Mouse driver
Full ACPI support with ACPICA
Read/Write ATA driver
FAT32 file system support
HPET/RTC/PIT drivers
Basic PCI support
Multi stage booting with FAT32

Since last time, I've fixed tons of bug in the system. Although there are still some culprit, it's much more stable than before. They were a lot of bugs in the scheduler with loads of race conditions. I hope I've working through most of them now.

I'm currently working on the network stack. I'm able to receive and send packets using the Realtek 8139 card. I have working support for Ethernet, IP and ARP. I'm currently working on adding ICMP support. I've come to realize that the hardest part is not to develop the code here but to find a way to test it. Network in Qemu is a huge pain in the ass to configure. And then, you need tools to generate some packets or at least answer to packets send by the virtual machine, and it's really bad... Nevertheless, it's pretty fun overall :)

Aside from this, I'm also working on a window manager. I'll try to post an update on this.

You can take a look at the thor sources if you're interested.

Future

For the time being, I'll focus my effort on the thor project. I also have some development to do on my home automation system: asgard-server that I plan to finalize and deploy in a useful way this weekend in my apartment. You can also expect some updates on my deep learning library where I've started work to make it more user-friendly (kind of). I'm also still waiting on the first stable version of doctest for a new comparison with Catch.

I really want to try to publish again some more posts on the blog. I'll especially try to publish some more updates about Thor.