
Publications - Sudoku Recognition with Deep Belief Network
   Posted:


I recently realized that I never talked about my publications on this website... I thought it was time to start. I'll begin with a few posts about my earlier publications and then try to cover the new ones without too much delay.

For the story, I'm currently a PhD student at the University of Fribourg, in Switzerland. My PhD is about the use of Deep Learning technologies to automatically extract features from images. I have developed my Deep Learning Library (DLL) project for this thesis. We have published a few articles on the various projects that we tackled during the thesis. I'll try to go through them in order.

At the beginning of the thesis, I used Restricted Boltzmann Machines and Deep Belief Networks to perform digit recognition on images of Sudoku puzzles taken with a phone camera. We published two papers on this subject.

The Sudoku grid and digits are detected using standard image processing techniques (a rough code sketch follows the list):

  1. The image is first converted to grayscale, then a median blur is applied to remove noise and the image is binarized using Adaptive Thresholding.
  2. The edges are detected using the Canny algorithm. From these, the lines are detected using a Progressive Probabilistic Hough Transform.
  3. Using a connected component analysis, the segments of lines are clustered together to detect the Sudoku grid.
  4. The cells are then detected inside the grid using the inner lines, and contour detection is used to isolate the digits.
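
As a rough illustration of the first two steps, here is a minimal sketch using the OpenCV C++ API; the file name and all the threshold parameters are made up, they are not the values used in the paper:

#include <opencv2/opencv.hpp>
#include <vector>

// Minimal sketch of the preprocessing and line detection steps
// (parameter values are illustrative only).
int main() {
    cv::Mat color = cv::imread("sudoku.jpg");

    // 1. Grayscale, median blur and adaptive thresholding
    cv::Mat gray, binary;
    cv::cvtColor(color, gray, cv::COLOR_BGR2GRAY);
    cv::medianBlur(gray, gray, 5);
    cv::adaptiveThreshold(gray, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY_INV, 11, 2);

    // 2. Canny edges, then Progressive Probabilistic Hough Transform
    cv::Mat edges;
    cv::Canny(binary, edges, 50, 150);
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 80, 30, 10);

    // Steps 3 and 4 (clustering the segments into a grid and isolating
    // the digits with contour detection) are omitted from this sketch.
    return 0;
}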

Here is one of the original images from our dataset:

Original image from our dataset

Here are the detected characters from the previous image:

Detected digits from our application

Once all the digits have been found, they are passed to a Deep Belief Network for recognition. A Deep Belief Network is composed of several stacked Restricted Boltzmann Machines (RBMs). The network is pretrained by training each RBM in turn with Contrastive Divergence. This algorithm basically trains each RBM as an auto-encoder and learns a good feature representation of the inputs. Once all the layers have been pretrained, the network can be trained as a regular neural network with Stochastic Gradient Descent.
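
To make the pretraining more concrete, here is a minimal sketch of one Contrastive Divergence (CD-1) update for a binary RBM. This is only an illustration of the algorithm, not the DLL implementation; the dimensions and learning rate are arbitrary:

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Minimal binary RBM with a single CD-1 update step.
struct rbm {
    std::size_t nv, nh;
    std::vector<double> W;       // nv x nh weights
    std::vector<double> bv, bh;  // visible and hidden biases
    std::mt19937 gen{42};

    rbm(std::size_t v, std::size_t h)
        : nv(v), nh(h), W(v * h, 0.0), bv(v, 0.0), bh(h, 0.0) {}

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // p(h_j = 1 | v)
    std::vector<double> hidden_probs(const std::vector<double>& v) const {
        std::vector<double> p(nh);
        for (std::size_t j = 0; j < nh; ++j) {
            double a = bh[j];
            for (std::size_t i = 0; i < nv; ++i) a += v[i] * W[i * nh + j];
            p[j] = sigmoid(a);
        }
        return p;
    }

    // p(v_i = 1 | h)
    std::vector<double> visible_probs(const std::vector<double>& h) const {
        std::vector<double> p(nv);
        for (std::size_t i = 0; i < nv; ++i) {
            double a = bv[i];
            for (std::size_t j = 0; j < nh; ++j) a += h[j] * W[i * nh + j];
            p[i] = sigmoid(a);
        }
        return p;
    }

    // Bernoulli sampling of a vector of probabilities
    std::vector<double> sample(const std::vector<double>& p) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        std::vector<double> s(p.size());
        for (std::size_t i = 0; i < p.size(); ++i) s[i] = u(gen) < p[i] ? 1.0 : 0.0;
        return s;
    }

    // One CD-1 step on a single training vector
    void cd1(const std::vector<double>& v0, double lr = 0.1) {
        auto h0 = hidden_probs(v0);
        auto v1 = visible_probs(sample(h0)); // reconstruction
        auto h1 = hidden_probs(v1);

        // W += lr * (v0 h0^T - v1 h1^T), and the same idea for the biases
        for (std::size_t i = 0; i < nv; ++i)
            for (std::size_t j = 0; j < nh; ++j)
                W[i * nh + j] += lr * (v0[i] * h0[j] - v1[i] * h1[j]);
        for (std::size_t i = 0; i < nv; ++i) bv[i] += lr * (v0[i] - v1[i]);
        for (std::size_t j = 0; j < nh; ++j) bh[j] += lr * (h0[j] - h1[j]);
    }
};

int main() {
    rbm machine(6, 3);
    std::vector<double> input{1, 0, 1, 1, 0, 1}; // toy binary input
    for (int epoch = 0; epoch < 100; ++epoch) {
        machine.cd1(input);
    }
    return 0;
}

In a DBN, this update is simply applied to each RBM in turn, using the hidden activations of the previous layer as the input of the next one.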

In the second paper, the Sudoku images contain both computer-printed and handwritten digits (the grid is already filled). The other difference is that the second system uses a Convolutional DBN instead of a DBN, where each layer is a Convolutional RBM. Such a model learns a set of small filters that are applied at each position of the image.

On the second version of the dataset, we were able to achieve 99.14% digit recognition accuracy, or 92.5% of fully-recognized grids, with the convolutional network.

You can find the C++ implementation on Github.

If you want to have a look, I've updated the list of my publications on this website.

If you want more details on this project, don't hesitate to ask here or on Github, or read the paper :) The next post about my publications will probably be about CPU performance!


Add Milight smart lights to my Domoticz home automation system
   Posted:


Recently, I switched from my hand-crafted home automation system to Domoticz. This allows me to easily integrate new smart devices and remotely controllable peripherals without much effort. I plan to relate my efforts in having fun controlling my home :)

I'm now able to control the lights in two rooms with Domoticz. The most well-known smart bulbs are the Philips Hue, but they are stupidly expensive and there are a lot of alternatives. I've ordered some Milight light bulbs and a controller to test with Domoticz. I didn't order a lot of them because I first wanted to make sure they would work with my system. The Milight system works over Wifi. There are several components to a Milight system:

  • The LED light bulbs with Red/Green/Blue/White channels
  • The Wifi controller, which is able to control 4 zones
  • An RGBW controller for LED strips

The first two are necessary for any installation; the third is to control an RGBW LED strip. This list is not exhaustive, it's only the components that I have used. It is important to note that a single Wifi controller can only control 4 zones. There are also remotes, but I have not bought one since I plan to control the lights only from Domoticz and maybe my smartphone.

The installation of the controller is relatively easy. You need to download the Milight 2.0 application from the Android Play Store (or the equivalent for iOS). Then, you can power on the Wifi controller. It'll create a new Wifi network to which you can connect with your phone. You can then use the application to configure the device and make it connect to your home wireless network. Once this is done, you can connect your phone back to your home network and use the Milight application to configure your device. I highly recommend setting a static IP address for the controller. The way I did it is simply to set a fixed IP address on my DHCP server based on the MAC address of the controller, but you can also do the configuration in the application or in the web interface of the controller (the user and password combination is admin:admin).

Here is the look of the controller (next to my RFLink):

Milight wifi controller

(My phone is starting to die, hence the very bad quality of images)

You can then install your LED light bulbs. First, open the remote in your Android Milight application. Install the light bulb, but leave it powered off. Then power on the light and press once on the I button of one of the zones on the remote. This will link the LED to the selected zone on the controller. You can then control the light from your phone. Remember, only four zones and therefore four lights per controller.

The installation of a LED strip is not much more complicated. You need to plug the 4 wires (or 5 wires if you have an actual RGBW LED strip) into the corresponding inputs on the controller. Then, you power it on and you can link it to a zone like a normal light bulb!

LEDS in my Living Room controlled by Milight

It works really well, directly and without problems.

The last step is of course to configure your controller in Domoticz. It is really easy to do. You need to add a new hardware entry for each Milight controller. It is listed under the name "Limitless/AppLamp/Mi Light with LAN/WiFi interface". You can then set the IP address and the port (8899 by default). Once you have configured the hardware, you'll see new devices appear in the Devices list. There will be one device for each zone and one device to control all four zones at once. You can add the devices you have configured as switches. From the Switches interface, you can turn the lamps on and off and you can also change their color and brightness:

Domoticz Light Bulbs control

You can then put them on your floor plan or control them from your rules.

So far, I'm pretty satisfied with this Milight system. The Android application is of poor quality, but aside from this the system is pretty good and the price is very fair. I'm also really satisfied with the Domoticz support. The only sad thing is that the Domoticz Android application does not support RGBW control of the lamps, only on and off, but that is already cool.

Now that all of this is working well, I've ordered a few more light bulbs to cover all my rooms and a few more LED controllers to control (and install first) the other LEDs that I have in mind.

On another note, I've also added a new temperature sensor outside my apartment. It is a very cheap Chinese sensor, bought on eBay, based on the XT200 system, and it works very well with RFLink.

The next step in my system is probably to integrate Voice Control, but I don't know exactly which way I'll go. I ordered a simple microphone that I intend to plug into my spare Raspberry Pi, but I don't know if the range will be enough to cover a room. Ideally, I would like to use an Amazon Dot, but they are not available in Switzerland. I'll probably write more on the subject once I've found an adequate solution. Another idea I have is to integrate support for Z-Wave via OpenZWave and then add a few more cool sensors that I haven't found on a cheaper system.

I hope this is interesting and don't hesitate to ask if you have any questions about my home automation project. You can expect a few more posts about this as soon as I improve it :)


Vivaldi + Vimium = Finally no more Firefox!
   Posted:


I've been using the Pentadactyl Firefox extension for a long time. This extension "vimifies" Firefox and it does a very good job of it. This is probably the best extension I have ever seen on any browser. This post is really not against Pentadactyl: it is a great addon and it still works great.

However, I have been more and more dissatisfied with Mozilla Firefox over the years. Indeed, the browser is becoming slower and slower all the time and I'm experiencing more and more issues with it on Gentoo. But the biggest problem I have with Firefox right now is the philosophy of the developers, which is really crappy. Currently, there is only one thing that is good in Firefox compared to the other browsers: its extensions. Basically, an extension in Firefox can do almost anything. Pentadactyl is able to transform most of the interface and get rid of all of its useless parts. It is currently impossible to do so in other browsers. These powerful addons use the XUL/XPCOM programming interface to do so. Pentadactyl is the only reason I've stuck with Firefox so long. But Mozilla announced, already more than a year ago, that it will deprecate the XUL/XPCOM interface in favour of WebExtensions. This means that a lot of very good addons will not be able to work anymore once the deprecation is completed. Several writers of popular Firefox addons have announced that they will not even try to port their addons, and some addons will simply not be possible anymore. This is the case for Pentadactyl, which is on the line for when the deprecation occurs. The date for the deprecation has already been delayed, but it is likely to come anyway.

For several months, I've been looking at possible replacements for my current Pentadactyl browser. I've tried qutebrowser, but it is really too limited in terms of features so far. I've also tried Chromium again, which is a great browser, but unfortunately there are very few possibilities for addons to modify the interface. Vimium is a great addon for Chromium which is basically a very much more lightweight alternative to Pentadactyl. It has far fewer features, but most of the missing features are simply things that cannot be done in Chromium.

Only recently did I test Vivaldi. Vivaldi is a free multi-platform browser, based on Chromium and supporting Chromium extensions. The major difference with Chrome is how customizable the UI is, thanks to a dynamic UI stylable with CSS. With the customizability of Vivaldi plus the great shortcuts and vim-like behaviour of Vimium, I really feel like I have found a new Pentadactyl, with the advantage of not having to bear Firefox!

Here is how it looks with the follow-URLs feature of Vimium:

View of my Vivaldi browser

Note: The gray bar on the left is my console and the bar at the top is awesome wm; they are not part of the browser.

I'm using the dark theme with native windows. I've disabled the address bar, moved the tab bar to the bottom and completely hidden the side panel. All that remained was the title bar and the scroll bar.

To get rid of the title bar, you can use CSS. First, you have to only display the Vivaldi button in the settings page. Then, you can use this custom CSS:

button.vivaldi {
    display: none !important;
}

#header {
    min-height: 0 !important;
    z-index: auto !important;
}

.button-toolbar.home { display: none }

to hide the title bar completely! To get rid of the scroll bar, you need to use the Stylish extension with this custom CSS:

::-webkit-scrollbar{display:none !important; width:0px;}
::-webkit-scrollbar-button,::-webkit-scrollbar-track{display:none !important;}
::-webkit-scrollbar-thumb{display: none !important;}
::-webkit-scrollbar-track{display: none !important;}

And then, no more scroll bar :)

If you want full HTML5 video support, you need to install extra codecs. On Gentoo, I've uploaded an ebuild to my overlay (wichtounet on layman) with the name vivaldi-ffmpeg-codecs and everything should work fine :)

Vimium is clearly inferior to Pentadactyl: for instance, it only works in web pages, not in system pages, and you still have to use the browser itself for a few things, but it does not seem too bad so far. Moreover, I wasn't using all the features of Pentadactyl. I haven't been using this browser for long, so maybe there are things that I will miss from Pentadactyl, but I certainly won't miss Firefox!


New home automation system with Domoticz
   Posted:


As you may remember, I started going into Home Automation last year with a C++ project, Asgard.

Unfortunately, I haven't had the time to improve this project further and make it really useful, and I have so many C++ projects already, so I decided to switch to using an existing project.

Choice

I needed a system that was at least able to handle everything my existing system does:

  1. Control my RF-433 Temperature Sensors (Oregon Temperature Sensors)
  2. Control my RF-433 Remote Button
  3. Control my smart power outlets
  4. Ping and wake-on-lan my different computers
  5. Be installable on a Raspberry Pi
  6. Control Skadi/Kodi Home Cinema

Of course, it also had to be open source and in a language that I can write without agonizing pain (no Ruby, no JavaScript, ideally no Python (I hate a lot of languages...)).

The number of home automation projects that exist today is impressive. It's really a hot topic, more than I thought. After a few hours of research, I finally settled on Domoticz. It seemed that it could handle all the home automation devices that I already had. Domoticz is an open source home automation system. It's written mainly in C++, which is an advantage for me since I'll be able to work inside it if necessary. It has a quite modern HTML5 interface, support for a lot of hardware and a lot of bridges to other controllers as well. One of the reasons I chose it is also the floor plan feature, which looks really cool.

Installation

First of all, I cleaned up the hardware of my Raspberry Pi: I got rid of my breadboard and the components on it. Then, I put the Raspberry Pi in a nicer box that I had lying around. One problem with Domoticz is that it does not support RF-433 devices directly but needs a controller in the middle. At first, I believed I could simply connect them through GPIO and let Domoticz do the rest, but it's not possible. So I ordered the different pieces for an RFLink controller at nodo-shops. It runs on an Arduino Mega and is connected to the Pi. The very big advantage is that RFLink handles a lot of protocols directly, so you only have to read the serial port to get information on RF-433 events.
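
As an illustration, here is a minimal sketch that reads RFLink events from the serial port on Linux. The device path is an assumption (it depends on how the Arduino enumerates) and the sample message only illustrates the general line format:

#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

#include <iostream>
#include <string>

// Read RF-433 events from the RFLink serial port, line by line.
// RFLink communicates at 57600 bauds; the device path may differ.
int main() {
    int fd = open("/dev/ttyACM0", O_RDONLY | O_NOCTTY);
    if (fd < 0) {
        std::cerr << "Unable to open the serial port" << std::endl;
        return 1;
    }

    termios tty{};
    tcgetattr(fd, &tty);
    cfmakeraw(&tty);
    cfsetispeed(&tty, B57600);
    tcsetattr(fd, TCSANOW, &tty);

    // Events are ';'-separated lines, for instance something like:
    // 20;2D;Oregon TempHygro;ID=1A2D;TEMP=00be;HUM=40;
    std::string line;
    char c;
    while (read(fd, &c, 1) == 1) {
        if (c == '\n') {
            std::cout << line << std::endl;
            line.clear();
        } else if (c != '\r') {
            line += c;
        }
    }

    close(fd);
    return 0;
}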

Here is how the current system looks:

Asgard automation system hardware

The black box is the Arduino Mega with the RFLink software, while the transparent one is simply the Raspberry Pi on which Domoticz is installed.

As for the installation of Domoticz itself, it's really simple. There are even ready-made images for the Raspberry Pi, but since my Raspberry Pi was already installed with Raspbian, I installed Domoticz on it directly. A simple matter of curl and bash:

sudo curl -L install.domoticz.com | sudo bash

After a few questions, your installation should be complete. You can browse your Domoticz installation at the IP of your Pi and at the given port.

Home Automation with Domoticz

There are several concepts in Domoticz that are important. At first, it seemed a bit weird to me, but in the end it works really well and you get used to it pretty fast.

The first concept is the hardware. It is not necessarily real hardware; in my opinion, it is rather a driver for some piece of equipment or set of switches. For instance, the RFLink Gateway is considered hardware, as is the Forecast weather API. Here is all the hardware I've configured:

Domoticz Hardware Configuration

I've got some sensors from the motherboard, the Kodi interaction, the driver to ping the different computers, the Forecast weather with Dark Sky and Weather Underground, the RFLink communication, wake-on-lan and some dummy virtual switches.

On its own, a hardware entry is useless, but it can create devices that will be used for home automation. A device is a sub-device of a particular piece of hardware. In some cases, the devices are created automatically. For instance, RFLink can create devices as soon as they are detected. For wake-on-lan, you have to configure MAC addresses and they will be added as devices. And so on...

There are a lot of types of devices, also called switches. Some devices can perform actions, for instance the wake-on-lan devices, while others are information-based, such as the ping devices or the temperature sensors detected by the RFLink. For instance, here are my temperature sensors:

Domoticz Temperature Sensors

As for the hardware I already had, I haven't had too many issues. The Oregon temperature sensors worked out of the box without any problem, as did my old RF-433 Ninja Block temperature sensor. The smart power outlets have been pretty easy to use as well. The most problems I had were with the remote buttons. It may seem that they should be the easiest, but RFLink has a lot of problems with them: they are detected several times and are highly unreliable. Interestingly, you can also change the icon of most devices, choosing from a large variety of existing icons, so that they look more realistic.

Another feature I find pretty cool is the floor plan. You can create plans for each floor of your house, then create rooms inside, and finally attach devices to each room and place them on the plan. Unfortunately, to draw the plan itself, you'll have to use an external tool. I've used floorplanner for this, but you may use any plan you have as long as you have an image.

Here is my floor plan with the devices (the scale is terribly off :P) :

Domoticz Floor Plan

With that you can directly see the status of each of your sensors in each room. Moreover, you can also activate some devices directly from the floor plan as well :)

Normally, my Somfy blinds are supposed to work with the RFLink controller. I have managed to control my blinds, but only when the RFLink was within 50 centimeters of the blinds, which is clearly not acceptable. I don't know where the problem comes from. It may come from my blinds controller inside the wall, from a bad antenna on the RFLink or from a bug in RFLink; so for now I cannot control my blinds.

I have several D-Link cameras at home. You can also add cameras to Domoticz, but don't expect a lot of support on this side; it's pretty poor indeed. The only useful thing you can do in Domoticz with cameras is to take a screenshot from them. You cannot control the camera or even get a decent video stream, and you cannot do motion detection directly. I hope that the camera support will improve in the future. One thing that would be good is to add a view of the cameras to the floor plan, but it is not possible currently.

What I did is to use Zoneminder to manage the cameras and send alerts to Domoticz through its REST API when motion is detected. Unfortunately, that means two systems to manage the cameras. Zoneminder is a very good piece of software, but one of the ugliest I've ever seen (even I can do a better web interface...) and quite obscure to configure. But the number of cameras it can handle and its camera control capabilities are very powerful.
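
As an illustration, here is a minimal sketch of such an alert using libcurl and the Domoticz JSON API; the host, port and device idx are placeholders for your own installation:

#include <curl/curl.h>

// Turn a Domoticz switch on when Zoneminder detects motion.
// The URL follows the Domoticz json.htm command format; adapt
// the host, port and idx to your own devices.
int main() {
    const char* url = "http://192.168.1.10:8080/json.htm"
                      "?type=command&param=switchlight&idx=42&switchcmd=On";

    curl_global_init(CURL_GLOBAL_DEFAULT);

    CURL* curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, url);
        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            // Handle the error (Domoticz unreachable, etc.)
        }
        curl_easy_cleanup(curl);
    }

    curl_global_cleanup();
    return 0;
}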

Events and actions

No home automation system would be complete without an events and rules system. In Domoticz, you have several ways to create rules.

The first solution is to create a group. A group is simply a set of switches that are all activated together. This is pretty limited, but can be useful. The second solution is a scene. A scene is activated by a device and can set several switches to on or off after the activation. The problem with this is that you have to use a dedicated action as the activation; you cannot use an existing device or a rule on a value. For me, these two possibilities are way too limited and I don't really see a reason to use them.

The best solution is to script a handler. For that, there are two ways: either you define your script visually with Blockly, for instance:

Domoticz Blockly Rule

This seemed so cool that I wanted to do all my rules with it. Unfortunately, it is really limited. The main limitation is that you cannot nest if blocks. In fact, the interface lets you do it, but it doesn't work afterwards. The other limitation is that it has no notion of time. For instance, it's very hard to create an action if a computer has been shut down for more than 60 seconds. You can do it, but you end up creating virtual devices which are turned off after some delay, and it gets really complicated really quickly...

In the end, I used Lua scripts directly. These scripts have access to the last value of each device as well as its last update time. It's much easier with that to create powerful rules, especially when you want a notion of time. There are plenty of examples on the Domoticz forum of very powerful rules that you can create. Unfortunately, there are some limitations. For instance, you cannot send several actions to the same device in the same script. Moreover, you also have to decode the values of the sensors yourself. For instance, for a temperature and humidity sensor, you'll have to parse the value string and extract the part you need by hand.

So far, I haven't created many rules, only four. The first rule is that with the push of one remote button I can power on my smart socket and turn on one of the computers in my office. Then, if both computers on this smart socket have been off for more than 60 seconds, the power is cut on the smart socket. And I have done almost the same for my media center and TV in the living room.

In the future, I'll probably only use the Lua scripting capability. In my opinion, it's better to have all the scripts in the same language for maintainability reasons. Once my future devices arrive, I'll have a few more rules to code.

Conclusion

To wrap up, I'd say that I'm pretty satisfied with my new home automation system based on Domoticz. It's not a perfect tool, but it has a lot of features and works quite well in general. I recommend you try it if you want a good home automation system, fully featured and relatively easy to use.

I have several more things planned for this system. I plan to test a real motion sensor rather than relying only on the cameras, which are focused on the doors and windows. I also plan to add an outdoor temperature sensor. And more interestingly, I'm going to try to integrate Milight smart bulbs to be able to control my lights from the system. These are the short-term projects that I want to do, but I have many more ideas!

I'm also sure that there are some features from Domoticz that I have overlooked or not discovered.

I'll probably write more posts on home automation in the coming months.

For more information on Domoticz, you can consult the official Domoticz website.


PVS-Studio on C++ Library Review
   Posted:


PVS-Studio is a commercial static analyzer for C, C++ and C#. It works on both Windows and Linux.

It has been a long time since I first wanted to test it on my projects. I contacted the PVS-Studio team and they gave me a temporary license so that I could test the tool and write a review.

I tried the static analyzer on my Expression Templates Library (ETL) project. This is a heavily-templated C++ library. I tried it on Linux, of course.

Usage

The installation is very simple: untar an archive and put the executables in your PATH (or use absolute paths). There are also deb and rpm packages for some distributions. You need strace to make the analyzer work; it should be available on any Linux platform.

The usage of PVS-Studio on Linux should be straightforward. First, you can use the analyzer directly with make and it will detect the invocations of the compiler. For instance, here is the command I used for ETL:

pvs-studio-analyzer trace -- make -j5 debug/bin/etl_test

Note that you can use several threads without issues, which is really great. There does not seem to be any slowdown at this stage, which probably only collects compiler arguments.

This first step creates a strace_out file that will be used by the next stage.

Once the compilation has been analyzed, you can generate the results with the analyze step, for which you'll need a license. Here is what I did:

pvs-studio-analyzer analyze -l ~/pvs_studio/PVS-Studio.lic -j5

Unfortunately, this didn't work for me:

No compilation units found
Analysis finished in 0:00:00.00

Apparently, it's not able to use the strace_out it generated itself...

Another possibility is to use the clang compilation database with PVS-Studio. So I generated my compile_commands.json file again with Bear (it was not up to date...). Then, you only need to run the analyze step:

pvs-studio-analyzer analyze -l ~/pvs_studio/PVS-Studio.lic -j5

Make sure you have the same compiler configured as the one used to generate the compilation database, to avoid errors with compiler arguments.

Unfortunately, this just printed a load of crap on my console:

(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;x[G^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->[email protected]%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;JVJ^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->[email protected]%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;*[G^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->[email protected]%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;b[b^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->[email protected]%cJ
(L8Pu(]-'Lo8h>uo(_uv(uo2(->'2h_u(uo2(uvU2K h>'o8a=}Lkk;[[x^%cuaa8acr[VS%
$ckUaoc8 c'8>_-o-8>U2cu/kau==-8>c-=cU2]Uf=c U2=u%c&kU__->[email protected]%cJ

Pretty nice, isn't it?

Let's try again in a file:

pvs-studio-analyzer analyze -o results.log -l ~/pvs_studio/PVS-Studio.lic -j5

The analysis time is quite reasonable: it took much less time than the compilation itself. In total, it took 88 seconds to analyze all the files. That is much faster than the clang static analyzer.

This time it worked, but the log file is not readable; you need to convert it:

plog-converter -t errorfile -o errors results.log

And finally, you can read the results of the analysis in the errors file.

Results

Overall, PVS-Studio found 236 messages in the ETL library; I was expecting more. I also wish there was an HTML report that included the source code along with the error messages. I had to look at the code for each message (you could integrate it in vim and use the quickfix window for that). There are some visualizations, but in tools like QtCreator or LibreOffice, which I don't have nor want on my computer.

Let's look at the results. For each message, I'll include the message from PVS-Studio and the code if it's relevant.

The first one is about the use of the comma operator:

include/etl/traits.hpp:674:1: error: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct.
include/etl/traits.hpp:674:1: error: V685 Consider inspecting the return statement. The expression contains a comma.
template <typename E>
constexpr std::size_t dimensions(const E& expr) noexcept {
    return (void)expr, etl_traits<E>::dimensions();
}

Here I'm simply using the comma operator to ignore expr and avoid a warning. To make this compile in C++11, it needs to be done in a single statement, otherwise the function cannot be constexpr. It's probably not the nicest construct, but there is no problem here.

There is a bunch of these; once filtered, 207 warnings remain. Let's jump to the next one:

include/etl/impl/blas/fft.hpp:29:1: error: V501 There are identical sub-expressions to the left and to the right of the '==' operator: (DFTI_SINGLE) == DFTI_SINGLE
inline void fft_kernel(const std::complex<float>* in, std::size_t s, std::complex<float>* out) {
    DFTI_DESCRIPTOR_HANDLE descriptor;

    void* in_ptr = const_cast<void*>(static_cast<const void*>(in));

    DftiCreateDescriptor(&descriptor, DFTI_SINGLE, DFTI_COMPLEX, 1, s); //Specify size and precision
    DftiSetValue(descriptor, DFTI_PLACEMENT, DFTI_NOT_INPLACE);         //Out of place FFT
    DftiCommitDescriptor(descriptor);                                   //Finalize the descriptor
    DftiComputeForward(descriptor, in_ptr, out);                        //Compute the Forward FFT
    DftiFreeDescriptor(&descriptor);                                    //Free the descriptor
}

Unfortunately, the error is inside the MKL library, so I really don't think it's an issue. There is a pack of them; I forgot to exclude non-ETL code from the results. Once all dependencies are filtered out, 137 messages remain.

include/etl/eval_functors.hpp:157:1: warning: V560 A part of conditional expression is always false: !padding.

This is true, but not an issue, since padding is a configuration constant that enables the use of padding in vectors and matrices. There were 27 of these at different locations and with different configuration variables.

include/etl/op/sub_view.hpp:161:1: note: V688 The 'i' function argument possesses the same name as one of the class members, which can result in a confusion.

This is again true, but not a bug in this particular case. It is still helpful and I ended up changing these names to avoid confusion. Again, there were a few of these.

etl/test/src/conv_multi_multi.cpp:23:1: error: V573 Uninitialized variable 'k' was used. The variable was used to initialize itself.

This one is in the test code:

for (size_t k = 0; k < etl::dim<0>(K); ++k) {
    for (size_t i = 0; i < etl::dim<0>(I); ++i) {
        C_ref(k)(i) = conv_2d_valid(I(i), K(k)); // HERE
    }
}

I don't see any error: k is correctly initialized to zero in the first loop. This is a false positive for me. There were several of these in different places. It seems that the use of operator() confuses PVS-Studio.

include/etl/traits.hpp:703:1: note: V659 Declarations of functions with 'rows' name differ in the 'const' keyword only, but the bodies of these functions have different composition. This is suspicious and can possibly be an error. Check lines: 693, 703.
template <typename E, cpp_disable_if(decay_traits<E>::is_fast)>
std::size_t rows(const E& expr) { //693
    return etl_traits<E>::dim(expr, 0);
}

template <typename E, cpp_enable_if(decay_traits<E>::is_fast)>
constexpr std::size_t rows(const E& expr) noexcept { //703
    return (void)expr, etl_traits<E>::template dim<0>();
}

Unfortunately, this is again a false positive: PVS-Studio failed to recognize the SFINAE, therefore the warning is wrong.

include/etl/builder/expression_builder.hpp:345:1: note: V524 It is odd that the body of '>>=' function is fully equivalent to the body of '*=' function.

This one is interesting indeed. It is true that they are exactly equivalent, because in ETL >> is used for scalar element-wise multiplication. It's quite interesting that PVS-Studio points that out. There were a few of these oddities, but all of them were normal in the library.

etl/test/src/compare.cpp:23:1: error: V501 There are identical sub-expressions to the left and to the right of the '!=' operator: a != a

Again, it is nice that PVS-Studio finds this, but it is done on purpose in the tests to compare an object to itself. If I remove all the oddities in the test cases, only 17 messages are left in the headers. None of the warnings in the test cases was serious, but there were no false positives there either, so that's great.

include/etl/impl/vec/sum.hpp:92:1: error: V591 Non-void function should return a value.
template <typename L, cpp_disable_if((vec_enabled && all_vectorizable<vector_mode, L>::value))>
value_t<L> sum(const L& lhs, size_t first, size_t last) {
    cpp_unused(lhs);
    cpp_unused(first);
    cpp_unused(last);
    cpp_unreachable("vec::sum called with invalid parameters");
}

This one is interesting. It's not a false positive, since the function indeed does not return a value, but there is a __builtin_unreachable() inside the function (behind cpp_unreachable), so the end of the function can never be reached. In my opinion, the static analyzer should be able to handle that, but this is really a corner case.

include/etl/sparse.hpp:148:1: note: V550 An odd precise comparison: a == 0.0. It's probably better to use a comparison with defined precision: fabs(A - B) < Epsilon.
inline bool is_zero(double a) {
    return a == 0.0;
}

This is not false, but again it is intended, because of the exact comparison to zero used for a sparse matrix. There were 10 of these in the same class.

include/etl/impl/blas/fft.hpp:562:1: note: V656 Variables 'a_padded', 'b_padded' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'etl::size(c)' expression. Check lines: 561, 562.
dyn_vector<etl::complex<type>> a_padded(etl::size(c));
dyn_vector<etl::complex<type>> b_padded(etl::size(c));

It's indeed constructed with the same size, but I don't think it's an odd pattern. I would not consider this a warning, especially since it's a constructor and not an assignment.

include/etl/dyn_base.hpp:312:1: warning: V690 The 'dense_dyn_base' class implements a copy constructor, but lacks the '=' operator. It is dangerous to use such a class.

This is again a kind of corner case in the library: it's a base class, and the assignment is different between the subclasses; it is not a real assignment in the C++ sense.

include/etl/impl/reduc/conv_multi.hpp:657:1: warning: V711 It is dangerous to create a local variable within a loop with a same name as a variable controlling this loop.
for (std::size_t c = 0; c < C; ++c) {
    for (std::size_t k = 0; k < K; ++k) {
        conv(k)(c) = conv_temp(c)(k);
    }
}

This is again a false positive... It really seems that PVS-Studio is not able to handle operator().

include/etl/impl/pooling.hpp:396:1: error: V501 There are identical sub-expressions to the left and to the right of the '||' operator: P1 || P2 || P1
template <size_t C1, size_t C2, size_t C3, size_t S1, size_t S2, size_t S3, size_t P1, size_t P2, size_t P3, typename A, typename M>
static void apply(const A& sub, M&& m) {
    const size_t o1 = (etl::dim<0>(sub) - C1 + 2 * P1) / S1 + 1;
    const size_t o2 = (etl::dim<1>(sub) - C2 + 2 * P2) / S2 + 1;
    const size_t o3 = (etl::dim<2>(sub) - C3 + 2 * P3) / S3 + 1;

    if(P1 || P2 || P1){

Last but not least, this one is entirely true and is in fact a bug in my code! The condition should be written like this:

if(P1 || P2 || P3){

This is now fixed in the master of ETL.

Conclusion

The installation was pretty easy, but the usage was not as easy as it could have been, because the first method, analyzing the build system, did not work. Fortunately, the system supports using the clang compilation database directly, so it was still possible to use it.

Overall, it found 236 warnings on my code base (a heavily-templated library). Around 50 of them were in external libraries, but I forgot to filter them out. The quality of the results is pretty good in my opinion. It was able to find a bug in my implementation of pooling with padding. Unfortunately, there were quite a few false positives, due to SFINAE, bad handling of operator() and no handling of __builtin_unreachable. The remaining messages were all correct, but were not bugs considering their usage.

To conclude, I think it's a great static analyzer that is really fast compared to the other ones on the market. There are a few false positives, but it's really not bad compared to other tools, and some of the messages are really great. An HTML report including the source code would be a welcome addition.

If you want more information, you can consult the official site. There is even a way to use it on open-source code for free, but you have to add comments at the top of each of your files.

I hope it was helpful ;)


C++ Compiler benchmark on Expression Templates Library (ETL)
   Posted:


In my Expression Templates Library (ETL) project, I have a lot of template-heavy code that needs to run as fast as possible and that is quite intensive to compile. In this post, I'm going to compare the performance of a few of the kernels produced by different compilers. I've got GCC 5.4, GCC 6.2 and clang 3.9. I also included zapcc, which is based on clang 4.0.

These tests have been run on a Haswell processor. The automatic parallelization of ETL has been turned off for these tests.

Keep in mind that some of the diagrams are presented in logarithmic form.

Vector multiplication

The first kernel is a very simple one, simple element-wise multiplication of two vectors. Nothing fancy here.

For small vectors, clang is significantly slower than gcc-5.4 and gcc-6.2. From 100'000 elements on, the speed is comparable for each compiler, as it depends on the memory bandwidth. Overall, gcc-6.2 produces the fastest code here. clang-4.0 is slightly slower than clang-3.9, but nothing dramatic.

Vector exponentiation

The second kernel computes the exponential of each element of a vector and stores it in another vector.

Interestingly, this time the clang versions are significantly faster for medium to large vectors, from 1000 elements and up, by about 5%. There are no significant differences between the two versions of each compiler.

Matrix-Matrix Multiplication

The next kernel I benchmarked is the matrix-matrix multiplication operation. In that case, the kernel is hand-unrolled and vectorized.

There are few differences between the compilers. The first thing is that for some sizes, such as 80x80 and 100x100, clang is significantly faster than GCC, by more than 10%. The other interesting fact is that for large matrices, zapcc-clang-4.0 is always slower than clang-3.9, which is itself on par with the two GCC versions. In my opinion, this comes from a regression in clang trunk, but it could also come from zapcc itself.

The results are much more interesting here! First, there is a huge regression in clang-4.0 (or in zapcc, for that matter): it is up to 6 times slower than clang-3.9. Moreover, clang-3.9 is always significantly faster than gcc-6.2. Finally, there is a small improvement in gcc-6.2 compared to gcc-5.4.

Fast-Fourier Transform

The following kernel measures the performance of a hand-crafted Fast-Fourier Transform implementation.

On this benchmark, gcc-6.2 is the clear winner. It is significantly faster than clang-3.9 and clang-4.0. Moreover, gcc-6.2 is also faster than gcc-5.4. On the contrary, clang-4.0 is significantly slower than clang-3.9 except on one configuration (10000 elements).

1D Convolution

This kernel is about computing the 1D valid convolution of two vectors.

While clang-4.0 is faster than clang-3.9, it is still slightly slower than both gcc versions. On the GCC side, there is not a lot of difference, except on the 1000x500 configuration, on which gcc-6.2 is 25% faster.

And here are the results with the naive implementation:

On the naive version, clang is again much faster than GCC, by about 65%. This is a really large speedup.
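
To give an idea of what the compilers are working with, a naive 1D valid convolution is essentially the following loop nest (a sketch, not the exact ETL kernel), which the compiler has to vectorize on its own:

#include <cstddef>
#include <vector>

// Naive 1D valid convolution: no unrolling, no explicit vectorization.
std::vector<double> conv_1d_valid(const std::vector<double>& input,
                                  const std::vector<double>& kernel) {
    const std::size_t n = input.size();
    const std::size_t k = kernel.size();
    std::vector<double> out(n - k + 1, 0.0);

    for (std::size_t i = 0; i < out.size(); ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            // The kernel is reversed, as in a true convolution
            out[i] += input[i + j] * kernel[k - 1 - j];
        }
    }
    return out;
}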

2D Convolution

This next kernel computes the 2D valid convolution of two matrices.

There is no clear difference between the compilers on this code. Every compiler here has its ups and downs.

Let's look at the naive implementation of the 2D convolution (units are milliseconds here, not microseconds):

This time the difference is very large! Indeed, the clang versions are about 60% faster than the GCC versions! This is really impressive, even though it does not come close to the optimized version. It seems the clang vectorizer is much more efficient than the GCC one on this kind of code.

4D Convolution

The final kernel that I'm testing is the batched 4D convolution that is used a lot in Deep Learning. This is not really a 4D convolution, but a large number of 2D convolutions applied to 4D tensors.

Again, there are very small differences between the compilers. The best versions are the most recent versions, gcc-6.2 and clang-4.0, in a tie.

Conclusion

Overall, we can see two trends in these results. First, when working with highly-optimized code, the choice of compiler does not make a huge difference. On these kinds of kernels, gcc-6.2 tends to be faster than the other compilers, but only by a slight margin, except in some cases. On the other hand, when working with naive implementations, the clang versions really perform much better than GCC: the clang-compiled versions of the naive 1D and 2D convolutions are more than 60% faster than their GCC counterparts. This is really impressive. Overall, clang-4.0 seems to have several performance regressions, but since it's still a work in progress, I would not be surprised if these regressions were gone in the final version. Since the clang-4.0 version is in fact the clang version used by zapcc, it's also possible that zapcc itself introduces new performance regressions.

Overall, my advice would be to use gcc-6.2 (or 5.4) on hand-optimized kernels and clang when you have mostly naive implementations. However, keep in mind that, at least for the examples shown here, the naive version optimized by the compiler never comes close to the highly-optimized version.

As ever, take this with a grain of salt: it's only been tested on one project and one machine; you may obtain very different results on other projects and other processors.


zapcc C++ compilation speed against gcc 5.4 and clang 3.9
   Posted:


A week ago, I compared the compilation time performance of zapcc against gcc-4.9.3 and clang-3.7. On debug builds, zapcc was about 2 times faster than gcc and 3 times faster than clang. In this post, I'm going to try some more recent compilers, namely gcc 5.4 and clang 3.9, on the same project. If you want more information on zapcc, read the previous posts; this one will concentrate on the results.

Again, I use my Expression Templates Library (ETL). This is a purely header-only library with lots of templates. I'm going to compile the full test suite.

The results of the two articles are not directly comparable, since they were obtained on two different computers. The one on which the present results were obtained has a less powerful processor and only 16GB of RAM, compared to the 32GB of RAM of my build machine. Also take into account that the present results were obtained on a desktop machine; there can be some perturbation from background tasks.

Just like in the previous results, it does not help to use more threads than physical cores; therefore, the results were only computed for up to 4 cores on this machine.

The link time is not taken into account in the results.

Debug build

Let's start with the result of the debug build.

Compiler           -j1    -j2    -j4
g++-5.4.0          469s   230s   130s
clang++-3.9        710s   371s   218s
zapcc++            214s   112s   66s
Speedup VS Clang   3.31   3.31   3.3
Speedup VS GCC     2.19   2.05   1.96

The results are almost the same as in the previous test: zapcc is 3.3 times faster to compile than clang and around 2 times faster than GCC. It seems that GCC 5.4 is a bit faster than GCC 4.9.3, while clang 3.9 is a bit slower than clang 3.7, but nothing terribly significant.

Overall, for debug builds, zapcc can bring a very significant improvement to your compile times.

Release build

Let's see what the status of release builds is. Since the results are comparable for the different numbers of threads, the results here are given for one thread only.

This is more time-consuming, since a lot of optimizations are performed and more features of ETL are enabled as well.

Compiler           -j1
g++-5.4.0          782s
clang++-3.9        960s
zapcc++            640s
Speedup VS Clang   1.5
Speedup VS GCC     1.22

On a release build, the speedups are much less impressive. Nevertheless, they are still significant: zapcc is still 1.2 times faster than gcc and 1.5 times faster than clang. The speedup against clang 3.9 is significantly higher than it was in my experiment with clang 3.7; it's possible that clang 3.9 is slower or simply has new optimization passes.

Conclusion

The previous conclusion still holds with modern versions of the compilers: zapcc is much faster than the other compilers on debug builds of template-heavy code, more than 3 times faster than clang-3.9 and about 2 times faster than gcc-5.4. Since it's based on clang, there should not be any issue compiling projects that already compile with a recent clang. Even though the speedups are less interesting on a release build, they are still significant, especially compared against clang.

I'm really interested in finding out what the pricing for zapcc will be once it's out of beta, and whether they will be able to get even faster!

For the comparison with gcc 4.9.3 and clang 3.7, you can have a look at this article.

If you want more information about zapcc, you can go to the official website of zapcc.


New design: Faster and mobile compatible
   Posted:


I've finally taken the time to improve the design of the website!

The site was becoming slower and slower, the design was not responsive at all and it was a horror on mobile.

I've changed the design to focus more on content, removing superfluous things such as the Google profile and slow things such as the 3D tag cloud. Moreover, the design is now responsive again; it was a matter of removing a lot of bad things I had done in the CSS. Instead of having both a vertical and a horizontal bar, I now have only one vertical bar with the navigation and a bit more information. With these changes, the design is now also working on mobile phones! It's about time.

Moreover, I've also spent quite some time working on the speed of the website. For this, I've bundled most of the JS and CSS files together and minified them. The static files are now hosted and cached by CloudFlare. I've also removed the 3D tag cloud, which was quite slow, and the Google profile badge, whose API calls were also quite slow. Overall, the index page is now really fast. The article pages are also much faster, but not perfect, especially because of Disqus, which does tons of requests and redirects everywhere. I've also got rid of the Disqus ads, which were really insignificant in the end. It may take a while for the ads to disappear, according to Disqus.

I know that it's still not perfect, but I hope that the user experience on the blog is now improved for all readers and that articles can now be read normally on mobile. I'll continue monitoring the speed and usability of the website to see if I can improve it further in the coming days.

If you have any issues with the updated website, don't hesitate to let me know, either by commenting on this post or by sending me an email (check the Contact page).


zapcc - a faster C++ compiler
   Posted:


Update: For a comparison against more modern compiler versions, you can read: zapcc C++ compilation speed against gcc 5.4 and clang 3.9

I just joined the private beta program of zapcc. zapcc is a C++ compiler, based on Clang, which aims at being much faster than other C++ compilers. It achieves this by using a caching server that saves some of the compiler data structures, which should speed up compilation a lot. The private beta is free, but once the compiler is ready, it will be a commercial compiler.

Every C++ developer knows that compilation time can quickly become an issue when programs get very big, especially when working with template-heavy code.

To benchmark this new compiler, I use my Expression Templates Library (ETL). This is a purely header-only library with lots of templates. There are lots of test cases, which is what I'm going to compile. I'm going to compare against Clang 3.7 and gcc 4.9.3.

I have configured zapcc to let it use 2GB of RAM per caching server, which is the maximum allowed. Moreover, I killed the servers before each test.

Debug build

Let's start with a debug build. In that configuration, there is no optimization going on and several features of the library (GPU, BLAS, ...) are disabled. This is the fastest way to compile ETL. I gathered these results on a 4-core, 8-thread Intel processor, with an SSD.

The following table presents the results with different numbers of threads, as well as the speedup of zapcc compared to the other compilers:

Compiler           -j1    -j2    -j4    -j6    -j8
g++-4.9.3          350s   185s   104s   94s    91s
clang++-3.7        513s   271s   153s   145s   138s
zapcc++            158s   87s    47s    44s    42s
Speedup VS Clang   3.24   3.10   3.25   3.29   3.28
Speedup VS GCC     2.21   2.12   2.21   2.13   2.16

The result is pretty clear: zapcc is around three times faster than Clang and around two times faster than GCC. This is pretty impressive!

For those who think that Clang is always faster than GCC, keep in mind that this is not the case for template-heavy code such as this library. In all my tests, Clang has always been slower and much more memory-hungry than GCC on template-heavy C++ code. And sometimes the difference is very significant.

Interestingly, we can also see that going past the number of physical cores is not really interesting on this computer. On some computers, it brings worthwhile speedups, but not on this one. Always benchmark!

Release build

We have seen the results of a debug build; let's now compare on something a bit more time-consuming, a release build with all options of ETL enabled (GPU, BLAS, ...), which should make it significantly longer to compile.

Again, the table:

Compiler           -j1    -j2    -j4    -j6    -j8
g++-4.9.3          628s   336s   197s   189s   184s
clang++-3.7        663s   388s   215s   212s   205s
zapcc++            515s   281s   173s   168s   158s
Speedup VS Clang   1.28   1.38   1.24   1.26   1.29
Speedup VS GCC     1.21   1.30   1.13   1.12   1.16

This time, we can see that the difference is much lower: zapcc is between 1.2 and 1.4 times faster than Clang and between 1.1 and 1.3 times faster than GCC. This shows that most of the speedups from zapcc come from the compiler front end. It's not a lot, but still significant over long builds, especially if you have few threads, where the absolute difference is higher.

We can also observe that Clang is now almost on par with GCC, which suggests that optimization is faster in Clang, while the front end and back end are faster in GCC.

You also have to keep in mind that zapcc memory usage is higher than Clang's because of all the caching. Moreover, the servers stay up between compilations, so this memory usage persists between builds, which may not be what you want.

As for runtime, I have not seen any significant difference in performance between the clang-compiled version and the zapcc-compiled version. According to the official benchmarks and documentation, there should not be any difference between zapcc and the version of clang on which it is based.

Incremental build

Normally, zapcc should shine at incremental building, but I was unable to show any speedup when changing a single file without killing the zapcc servers. Maybe I did something wrong in my usage of zapcc.

Conclusion

In conclusion, we can see that zapcc is always faster than both GCC and Clang on my template-heavy library. Moreover, on debug builds, it is much faster than either of the two compilers, being more than 2 times faster than GCC and more than 3 times faster than clang. This is really great. Moreover, I have not seen any issue with the tool so far; it can seamlessly replace Clang without problems.

It's a bit weird that you cannot allocate more than 2GB to the zapcc servers.

For a beta program, that's really impressive. I hope that they continue the good work and especially that this motivates other compilers to improve the speed of compilation (especially of templates).

If you want more information, you can go to the official website of zapcc.


Blazing fast unit test compilation with doctest 1.1
   Posted:


You may remember my quest for faster compilation times. I had made several changes to the Catch test framework macros in order to save some compilation time, at the expense of my test code looking a bit less nice:

REQUIRE(a == 9); //Before
REQUIRE_EQUALS(a, 9); //After

The first line is a little bit nicer, but using several such optimizations, I was able to dramatically reduce the compilation time of the test cases of ETL. In the end, I don't think that the difference between the two lines justifies the high overhead in compilation time.

doctest

doctest is a framework quite similar to Catch, but it claims to be much lighter. I tested doctest 1.0 early on, but at that point it was actually slower than Catch and especially slower than my version of the macros.

Today, doctest 1.1 was released with the promise of being even lighter than before and providing several new ways of speeding up compilation. If you want the results directly, you can take a look at the next section.

First of all, this new version improves the basic macros to make expression decomposition faster. When you use the standard REQUIRE macro, the expression is decomposed using several template techniques and operator overloading, which is really slow to compile. Removing the need for this decomposition is what made my fast Catch macros much faster to compile.

Moreover, doctest 1.1 also introduces CHECK_EQ, which does not use any expression decomposition. This is close to what I did in my macros, except that it is directly integrated into the framework and preserves all its features. It is also possible to bypass the exception handling code by using the FAST_CHECK_EQ macro; in that case, exceptions are not captured. Finally, a new configuration option, DOCTEST_CONFIG_SUPER_FAST_ASSERTS, removes some features related to automatic debugger breaks. Since I don't use the debugger features and I don't need to capture exceptions everywhere (it's sufficient for me that the test fails completely if an exception is thrown), I'm more than eager to use these new features.
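
To make the different flavours concrete, here is a small example of what a test case could look like with these options (the test name and values are made up, not from ETL):

#define DOCTEST_CONFIG_SUPER_FAST_ASSERTS
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include "doctest.h"

TEST_CASE("example") {
    int a = 9;

    CHECK(a == 9);       // full expression decomposition: slowest to compile
    CHECK_EQ(a, 9);      // no decomposition, keeps all framework features
    FAST_CHECK_EQ(a, 9); // no decomposition and no exception capture
}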

Results

For the evaluation, I compiled the complete test suite of ETL, with one thread, using gcc 4.9.3, in several different configurations, going from Catch to doctest 1.1 with all its compilation-time features. Here are the results, in seconds:

Version              Time     VS Catch   VS Fast Catch   VS doctest 1.0
Catch                724.22
Fast Catch           464.52   -36%
doctest 1.0          871.54   +20%       +87%
doctest 1.1          614.67   -16%       +32%            -30%
REQUIRE_EQ           493.97   -32%       +6%             -43%
FAST_REQUIRE_EQ      439.09   -39%       -6%             -50%
SUPER_FAST_ASSERTS   411.11   -43%       -12%            -53%

As you can see, doctest 1.1 is much faster to compile than doctest 1.0! This is really great news. Moreover, it is already 16% faster than Catch. When all the features are used, doctest is 12% faster than my stripped-down version of the Catch macros (and 43% faster than the standard Catch macros). This is really cool! It means that I don't have to change the code (no need to strip the macros myself) and I can still gain a lot of compilation time compared to the bare Catch framework.

I really think the author of doctest did a great job with the new version. Although they were of less interest to me, there are also a lot of other changes in the new version. You can consult the changelog if you want more information.

Conclusion

Overall, doctest 1.1 is much faster to compile than doctest 1.0. Moreover, it offers very fast macros for test assertions that are much faster to compile than the Catch versions and even faster than the versions I created myself to reduce compilation time. I really think this is a great advance for doctest. When compiling with all the optimizations, doctest 1.1 saves me 50 seconds of compilation time compared to my fast version of the Catch macros and more than 5 minutes compared to the standard version of the Catch macros.

I'll probably start using doctest on my development machine. For now, I'll keep Catch as well, since I need it to generate the unit test reports in XML format for Sonarqube. Once this feature appears in doctest, I'll probably drop Catch from ETL and DLL.

If you need blazing fast compilation times for your unit tests, doctest 1.1 is probably the way to go.
