EDDIC 0.9.1 - Enhanced floating point support

I just released the version 0.9.1 of the EDDI Compiler (eddic).

This release is a minor one, there are no huge changes to the language nor in the compiler itself. But that version was necessary before the 1.0 version.

The floating point support of the language have been enhanced with casts. You can now cast float values to int and vice-versa. The syntax is the same as in C:

void main(){
   float a = 1.5;
   float b = a + (float) 100;
   println((int)b);
}

Another improvement is the support for integer suffixes for float:

void main(){
   float a = 100f;
}

Finally, the optimizer has been adapted to support float as well. The optimization techniques are the same as the one for integers.

Last but not least, the compiler can now pass some parameters in registers. In 32 bits platform, the first integer parameter is passed in a register and on 64 bits platform, the first two parameters are passed in registers. In both architectures, the first float parameter is passed in a SSE register.

Future work

The next version will be the 1.0 version. There will be several major changes with this new version.

First, the optimization engine will be almost entirely rewritten. Global optimization will be added to the engine.

There will also be some improvements in the intermediate representation. I will probably a second level of intermediate representation: a low-level Three Address Code representation. This new intermediate representation will be generated by an IntelCompiler to handle stuff common to both 32 and 64bits code generator. This will also includes a pass for global register allocation.

As these changes will not be simple to implement, this version can takes some time before being released.

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v0.9.1 available in the GitHub tags or directly as the release branch.

C++11 Concurrency - Part 1 : Start Threads

C++11 introduced a new thread library. This library includes utilities for starting and managing threads. It also contains utilities for synchronization like mutexes and other locks, atomic variables and other utilities. In this serie of posts, I will try to explain most of the features provided by this new library.

To compile the samples of this article, you will need a compiler with C++11 support. In my case, I used GCC 4.6.1 (you need to pass the "-std=c++0x" or "-std=c++11" option to get the C++11 support activated).

Starting threads

Starting a new thread is very easy. When you create an instance of a std::thread, it will automatically be started. When you create a thread you have to give it the code it will execute. The first choice for that, is to pass it a function pointer. Let's start with the very common Hello World:

#include <thread>
#include <iostream>

void hello(){
    std::cout << "Hello from thread " << std::endl;
}

int main(){
    std::thread t1(hello);
    t1.join();

    return 0;
}

All the threads utilities are located in the thread header. An interesting thing in this first example is the call to the join() function. Calling this function forces the current thread to wait for the other one (in this case, the main thread has to wait for the thread t1 to finish). If you omit this call, the result is undefined. The program can print Hello from thread and a new line, can print just Hello from thread without new line or can print nothing. That's because the main thread can return from the main function before the t1 thread finishes its execution.

Distinguishing threads

Each thread has a single id allowing us to distinguish each of them. The std::thread class has a get_id() function returning an unique id for this thread. You can manipulate the current thread with the functions from the std::this_thread namespace. The next example starts with threads and each of them prints its id:

#include <thread>
#include <iostream>
#include <vector>

void hello(){
    std::cout << "Hello from thread " << std::this_thread::get_id() << std::endl;
}

int main(){
    std::vector<std::thread> threads;

    for(int i = 0; i < 5; ++i){
        threads.push_back(std::thread(hello));
    }

    for(auto& thread : threads){
        thread.join();
    }

    return 0;
}

Starting each thread one after one and then storing them into a vector is a common way to handle several threads. With that, you can easily change the number of threads. Even with a very little sample like this one, the results is not predictable. The theoretic case:

Hello from thread 140276650997504
Hello from thread 140276667782912
Hello from thread 140276659390208
Hello from thread 140276642604800
Hello from thread 140276676175616

Is, in my case, also the least common. You can also get results like this one:

Hello from thread Hello from thread Hello from thread 139810974787328Hello from thread 139810983180032Hello from thread
139810966394624
139810991572736
139810958001920

Or a lot of another results. This is because of interleaving. You have no way to control the order of execution of threads. A thread can be preempted at any moment and the appends to the out stream are made one after one (first the append of the string, then append the id and finally the new line), so a thread can print its first part and then be interrupted to print its second part after all the others threads.

Start a thread with a lambda

When the code that has to be executed by each thread is very small, you don't necessary want to create a function for that. In that case, you can use a lambda to define the executed by a thread. We can rewrite the code of the last sample using lambda easily:

#include <thread>
#include <iostream>
#include <vector>

int main(){
    std::vector<std::thread> threads;

    for(int i = 0; i < 5; ++i){
        threads.push_back(std::thread([](){
            std::cout << "Hello from thread " << std::this_thread::get_id() << std::endl;
        }));
    }

    for(auto& thread : threads){
        thread.join();
    }

    return 0;
}

Here we just used a lambda expression instead of the function pointer. Of course, this produces the exact same result as the previous sample.

Next

In the next post of this series, we will see how to protect code using locks.

The source code for each sample is available on Github.

Introduction to 64 Bit Intel Assembly Language Programming for Linux - Book Review

The first book I read about Intel Assembly was lacking information about 64 bits programming. So I ordered and read Introduction to 64 Bit Intel Assembly Language Programming for Linux, by Ray Seyfarth.

This book covers a lot of subjects in assembly. It is adapted to people starting assembly, but it also contains advanced assembly programming techniques. I think that this book is adapted to a lot of people wanting to improve their skills in Intel Assembly. This book covers only 64 bit Intel Assembly in details. It does not cover old memory models, only the memory mode used now.

This book uses yasm to assemble the programs. It uses gdb to debug the assembly programs.

The first chapters are very general. They are covering numbers (octal, decimal and hexadecimal notions), computer memory and memory mapping mode.

The first technical chapter covers Registers in details. It defines all the registers available in Intel Assembly. You will see how to move constants to registers. You will also learn how to move values between memory and registers. Then, the next chapter covers all the mathematical operations (negate, addition, subtraction, division and multiplication). It also covers the use of conditional move instructions. The next one is about bit manipulations (not, and, or and shift). It also covers bit testing and filling.

After that, the chapter eight covers a very important subject: branching and looping. All the jumps are covered in details. You will see how to convert each control structure (if, for, while, do-while) of programming language to assembly. After that, the string instructions are also explained. Once you know how to create control structures, it's time to create your own functions. In that chapter, you will learn the stack and the function call conventions. The stack frames and the recursion are also covered.

The arrays are covered in the next chapter. You will see how to allocate arrays on the stack or on the heap using malloc. The command line parameters are also covered (that was a very interesting part).

Then, floating point math is covered. For that, the Streaming SIMD Extensions (SSE) are used. All the math operations are covered. As is the way to transfer data between XMM registers and memory. The conversion and comparison instructions are also explained here. Some complete samples like dot product of 3D vectors help us understand the SSE instructions.

The system calls are covered in details in chapter twelve. The C system library wrapper functions for system calls are also covered. After that, a whole chapter addresses structures. The allocation of structs is also addressed. Then, the way to use I/O streams from assembly is taught.

A whole chapter is devoted to the implementation of data structures in Intel Assembly. The covered data structures are the linked lists, doubly linked lists, the hash tables and the binary trees. Each common operation on these data structures is implemented.

After that, the last chapters are about optimization and performances. The chapter 16 covers High Performance Assembly Programming in details. In that chapter, you will learn a set of optimization that can be applied to improve the performances of a given code. For example, you will see how to make efficient use of cache or how to make better performing loops. These optimization can also be applied to other programming languages. The following chapters are all covering a single problem and a way to optimize it the most using Intel Assembly. For each of these problems, the C version is compared to the assembly version. Three problems are presented: counting bits in an array of integers, the Sobel filter and computing the correlation of two variables given some sample values.

To conclude, I found this book very book. It covers a lot of subjects in a very good manner. I liked a lot the performance techniques covered in the book. The deep coverage of SSE instructions was also very interesting.

A new theme for the site

I changed the theme on the site. I used the Sleek theme from Studiopress.

The main reason to change to this theme was that I replaced some of my addons. For example, the breadcrumbs are already included in the theme. There is also an included widgets for Twitter.

Another reason is that I find this theme more good looking than the hold one. And finally, this new theme is more up-to-date.

I hope that you will find this theme adapted.

I've still some work to do adapting this theme for the site. If you encounter any kind of problems with the new design, don't hesitate to let a comment or to contact me.

EDDIC 0.9 - Floating point support

I just finished working on the 0.9 version of the EDDIC Compiler.

The language does now support floating point variables. Here is an example of what can be done in EDDI now:

void main(){
   float a = 1.5;
   float b = 3.0;
   float c = a + b;
   println(c);
   c = b + 2.75;
   println(c);

   println(test(2.0888, 1.00222));

   float array[7];
   array[0] = 21.999;
}

For now, there is no interoperability between integers and floating, so you can't add an integer to a floating point or cast a floating point to an integer. Those features will be added in the 0.9.1 version. The floating point support has been implemented using the Streaming SIMD Extension (SSE) of Intel processors. This won't work on processor that doesn't include support for SSE.

Another big improvement is that the position of the tokens in the source file are now collected through the parser. When an error or a warning arises during the compilation, the precise position of the error or the warning is printed to the console.

New options are available for eddic:

  • --ast : Print the Abstract Syntax Tree representation of the source
  • --tac : Print the Three Address Code representation of the source
  • --ast-only : Only print the Abstract Syntax Tree representation of the source (do not continue compilation after printing)
  • --tac-only : Only print the Three Address Code representation of the source (do not continue compilation after printing)

And, finally, some improvements have been made to the sources of the project.

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v0.9 available in the github tags or directly as the release branch.

COJAC, A Numerical Problem Sniffer

During my bachelor at the HES-SO University of applied sciences in Fribourg, I worked on a Java project, COJAC.

COJAC is a tool performing On-the-fly code instrumentation to help uncover numerical problems with integer (overflows) and floating point (smearing, cancellation, infinity, NaN) arithmetic.

Yesterday, Dr Dobbs published an article by one of my professor Frédéric Bapst and myself.

This article discusses the question of numerical problems in programming, and focuses on the approach of using on-the-fly code instrumentation to uncover them at runtime. Two realizations are presented: a complete and stable solution for Java applications, and a proof-of-concept Valgrind add-on for Linux executables. Both tools require no intervention on the source code and no recompilation, and should be helpful as a diagnostic tool for developers, as well as for education purposes for undergraduate programmers.

If you are interested by this project, I invite you to read the article on Dr Dobbs : Project of the Month: Cojac, A Numerical Problem Sniffer

You can also test the tool or browse the source code by downloading it on the COJAC website.

EDDIC 0.8.1 : do while loop and better optimization

Only three days after the 0.8 version, I finished the 0.8.1 version.

It's a minor version, so there is no big changes to the language. However, I added support for the do while loop in the source code.

Another change is that assignment is now returning a value. That allows you to make some code like this one:

int a = b = 5;
b = (a = 4) + 1;

This new version includes also some new changes for the optimization engine. I implemented constant propagation and copy propagation for offset assignment. For example:

a(8) = 4;
b = a(8);

becomes:

a(8) = 4;
b = 4;

And the last change is that the concatenations that are detected to be constant after some optimization are made at compile-time by the optimization engine. This simplify a lot the generated code for source file with a lot of concatenations.

The next version (the 0.9) will introduce floating point operations and parameter passing with registers (probably only in 64 bit). It's also possible that I will try to implement a first version of global optimization.

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v0.8.1 available in the github tags or directly as the release branch.

Assembly Language Step By Step, Programming with Linux - Book Review

To improve my skills in Intel Assembly, I ordered and read Assembly Language Step by Step, Programming with Linux, by Jeff Duntemann. Just for the record, I read it on my Amazon Kindle.

This book is really made for very beginners. The author uses a lot of metaphor to explain some concepts, comparing assembly to a game he explains in several pages... I didn't liked the writing style of this book. In my opinion, the author uses way too much metaphor and some things takes too many pages to be explained. Another problem of this book is the examples, there are covering tens of pages each. It is a good thing to have complete examples in a book, but having examples of more than 100 lines of code (not counting the comments) in a book is not really convenient (again, only in my opinion).

Another lack of this book is that it covers only 32 bit programming. For a book written in 2009, it is quite limited. And finally, I found it bad to not cover floating point. I think that this is an important subject.

Even if I'm not a fan of this book, most of the content is still interesting and you can learn the basis of assembler with it if you're patient with the writing style, the metaphors and the long examples.

If you are a real beginner in assembly and in programming in general, this book can still be valuable for you.

The first chapters are covering computer programming, processors, arithmetic in different bases and memory locations. Then, the following chapters are covering the tools (assembler, linker and visual tools for editing and debugging). After that, we are jumping in the heart of the subject by learning arithmetic computations, system calls and stack control. The bits instructions are covered in details in a whole chapter. Then, you will be introduced to the writing of functions and how to use string instructions to simplify your programs. The last (and very big) chapter is about using the functions of the C library to performs work like I/O operations, time calculations, print formatted text and generate random numbers. For this last I would have preferred to learn how to do all that operations using only assembly, but it is important to know how to call C functions.

To conclude, I will advice this book only to people who learn assembly as their first programming language. For the others, there is a high risk of be deceived.

Note that, if you want to follow the examples of this book, you'll certainly need the Insight Debugger. You can install this debugger by following the procedure described here.

EDDIC 0.8 : 64bit generation

I just released the version 0.8.

The main change of this version is the addition of a 64bit generation. If you compiles eddic in 64bit, the default output of the compiler will be 64 bit, otherwise it will be 32 bit. You can also select the output by setting command line switch (-32 and -64).

The biggest change at the side of the language is the support of command line arguments. If the main function is declared as main(string[] args), the args passed from the command line will be accessible.

I've also added two operators to the language : size() and length(). These operators allows the programmer to get the size of an array, respectively the length of a string. If the information is present at compile-time, the operator is replaced by the constant otherwise it is equivalent to a single memory access.

I've also made some improvements to the generated x86 code. For example, I'm using string instructions to simplify the generated code and lea and shl to perform fast multiplications. I also changed the syntax used in the generated assembly replacing AT&T syntax by the Intel syntax.

I've added a new optimization technique, copy propagation. This technique keeps track of the assignment of variables to a variables to simplify the generated three-address-code.

Download

You can find the compiler sources on the Github repository : https://github.com/wichtounet/eddic

The exact version I refer to is the v0.8 available in the github tags or directly as the release branch.

Local optimization of Three-Address-Code

Some compilers are using Three-Address-Code (TAC) as an intermediate representation. This representation is very simple to understand and write. Moreover, it's easy to run some optimization on this representation.

Each TAC statement has this general form : result = operand1 operator operand2

For example, here are some TAC statements:

a = 1
x = a * 3
if x > a goto test
param "dddd"
call print
test:
param "asdf"
call print

In this post, we will see some of the local optimizations that can be applied on TAC. A local optimization is an optimization that is applied locally to a basic block. A basic block is a set of TAC statements that has only one entry point and one exit point. Once the first instruction of the basic block is executed, the rest of the instructions are necessarily executed exactly once. These optimizations are easy to design and implement. If you want to run global optimizations (through all the basic blocks of a function) or even Interprocedural Optimization (IPO), you will need a far more complex framework to run optimizations. I will try to write something on global optimization when I will have implemented some of them in EDDI.

The goal of optimization is of course to replace some statements with more efficient statements.

Read more…