Memory Manager in 64bits Intel Assembly on Linux

For the last version of the EDDI Compiler, it has been necessary to extend the dynamic memory allocator, to support free memory. In this post, we will see how to write a simple Memory Manager in Intel Assembly for Linux.

In the past, we've seen how to write a basic memory allocator, this time, we will write a more complete version.

The implementation is made in 64bits Intel Assembly.

Memory Manager specification

The memory will be allocated by blocks. Each block will contain a header with two information:

  • A boolean flag indicating if the block is free or not
  • The size of the block (including the header)

Each time some memory is asked, the blocks are tested one by one until an available one is found. If no available block is found, a new block is allocated after the last one and this block is returned.

The memory manager consists of three functions:

  • memory_init: Init the memory manager
  • memory_alloc: Allocate the given number of bytes of memory
  • memory_free: Release the given block

The parameter is passed in the r14 register. The return value is returned in the rax register.

Global State

This implementation needs two global variables. One for the start address of memory and the other one for the last:

section .data
mem_last dq 0
mem_start dq 0

Init memory Manager

The init function is very simple to implement:

init:
push rbp
mov rbp, rsp
mov rax, 12
xor rdi, rdi
syscall
mov [mem_start], rax
mov [mem_last], rax
leave
ret

We just have to call sys_brk in order to get the location of program break. Then, the start and the last addresses are the same.

Free memory

The free function is the simplest one:

free:
push rbp
mov rbp, rsp
mov qword [r14 - 16], 1
leave
ret

The address to free is passed in the r14 register. We have to go back 16 bytes (size of the control block) to go to the start of the block. The availability flag is set to 1 (the block is free).

The alloc function

The alloc function is the most complex:

alloc:
push rbp
mov rbp, rsp
push rdi
push r10
push r11
push r12
push r13
push r14
add r14, 16
mov r12, [mem_start]
mov r13, [mem_last]
.start:
cmp r12, r13
je .alloc
mov r10, [r12]
mov r11, [r12 + 8]
cmp r10, 1
jne .move
cmp r11, r14
jl .move
mov qword [r12], 0
lea rax, [r12 + 16]
pop r14
pop r13
pop r12
pop r11
pop r10
pop rdi
leave
ret

.move:
add r12, r11
jmp .start

.alloc:
lea rdi, [r12 + r14]
mov rax, 12
syscall
mov [mem_last], rdi
mov qword [r12], 0
mov qword [r12 + 8], r14
lea rax, [r12 + 16]
pop r14
pop r13
pop r12
pop r11
pop r10
pop rdi
leave
ret

As the function is a bit complex, I will detail it in part:

add r14, 16
mov r12, [mem_start]
mov r13, [mem_last]
.start:
cmp r12, r13
je .alloc
mov r10, [r12]
mov r11, [r12 + 8]
cmp r10, 1
jne .move
cmp r11, r14
jl .move
mov qword [r12], 0
lea rax, [r12 + 16]

The necessary number of bytes is passed in the r14 register. We add 16 bytes (size of the control group) to the size as we also need some place for the header. Then, we load the start and last addresses. If both addresses are equal, we need to allocate more memory (detailed later). Then, we check the size and the availability of the current block. If the size is enough to fit the needs and the block is available, we set it to unavailable. We return the address past the control block (16 bytes).

.move:
add r12, r11
jmp .start

To move to the next block, we just have to add the size of the current block to the current block address.

.alloc:
lea rdi, [r12 + r14]
mov rax, 12
syscall
mov [V_mem_last], rdi
mov qword [r12], 0
mov qword [r12 + 8], r14
lea rax, [r12 + 16]

To allocate memory, we compute the new program break and call sys_brk again to set the new program break. The block is then set to not available and the size is set. We return the address past the control block (16 bytes).

The rest of the program is just here to save and restore the registers and compute the stack frames.

Wrap-Up

In this article, we saw how to implement a very simple memory manager in 64bits Intel Assembly on Linux. This memory manager is very simple, but has several drawbacks:

  • The overhead for small blocks is important. For example, allocating an 8 bytes integer needs a 24 bytes block, thrice the size of the int.
  • In the worst-case scenario, all of the process memory need to be walked across to find a new free block
  • The functions are not thread-safe
  • This algorithm can lead to a lot of memory fragmentation

In the future I will try to make a more powerful version of this memory manager.

Download

All the functions are available online on the Github Repository:

They are also available in 32bits Intel Assembly:

EDDI Compiler 1.1.1 – Dynamic Memory Allocation and Constructors/Destructors

As I'm in holiday, the work is going pretty fast. The version 1.1.1 of the EDDI Compiler (eddic) is available.

This version introduces two major changes. The first is the support of dynamic memory allocation. You can allocate a struct or a standard type in help using the new operator. The memory can be released using the delete operator. Another related improved is the addition of constructors and destructors to the language. The following sample shows what can be done with the new features:

struct A {
    int a;

    this(int a){
        this.a = a;

        print("Constructed");
    }

    ~this(){
        println("Destructed");
    }
}

void main(){
    A* b = new A(55);
    delete b;
}

The constructor is called once the memory is allocated. The delete operator calls the destructor and then free the memory.

When a structure is allocated on the stack, the constructor is called at the declaration point and the destructor is called when the variable gets out of scope.

The memory manager is quite simple for now. Memory is allocated in blocks. Each block has a header indicating the size of the block and its availability. The size of the header is 8 bytes in 32 bits and 16 bytes in 64 bits. The free operation can be done in constant time by just setting the availability flag to false. The disadvantage of this technique is that all the blocks needs to be tested to find a free block. This can be slow in some situations. I will try to make a better version in the future.

For that, the memory model has been improved. All the offsets are now increasing and the stack addresses are set at the end of the block.

Another interesting improvement of the language is the support of switch. For now, only switch on int is supported. Here is an example of a switch in EDDI:

switch(a){
    case 3:
        print("3");
    case 4:
        print("4");
    case 5:
        print("5");
    case 6:
        print("6");
    default:
        print("default");
}

The performances of the optimizer have been improved, by doing live-variable analysis less often. Pointers can now be passed in registers. Some of the variables used as temporary copies are removed

The peephole optimizer has been improved to use conditional move when possible. Moreover, the peephole optimizer is now able to perform some local copy propagation.

Future work

The next version of the EDDI Compiler will be the version 1.1.2. This version will add features to read the command-line. Moreover, it will also add support for char type and string comparisons. With that, I think that the language will start to be usable for toy applications.

There will be some improvements to the code that have been left aside for a too long time.

Download

You can find the EDDI Compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v1.1.1 available in the GitHub tags or directly as the release branch.

C++11 Synchronization Benchmark

In the previous parts of this serie, we saw some C++11 Synchronization techniques: locks, lock guards and atomic references.

In this small post, I will present the results of a little benchmark I did run to compare the different techniques. In this benchmark, the critical section is a single increment to an integer. The critical section is protected using three techniques:

  • A single std::mutex with calls to lock() and unlock()
  • A single std::mutex locked with std::lock_guard
  • An atomic reference on the integer

The tests have been made with 1, 2, 4, 8, 16, 32, 64 and 128 threads. Each test is repeated 5 times.

The results are presented in the following figure:

C++11 Synchronization Benchmark Result

As expected, the mutex versions are much slower than the atomic one. An interesting point is that the the atomic version has not a very good scalability. I would have expected that the impact of adding one thread would not be that high.

I'm also surprised that the lock guard version has a non-negligible overhead when there are few threads.

In conclusion, do not locks when all you need is modifying integral types. For that, std::atomic is much faster. Good Lock-Free algorithms are almost always faster than the algorithms with lock.

The sources of the benchmark are available on Github: https://github.com/wichtounet/articles/tree/master/src/threads/benchmark

EDDI Compiler 1.1.0 - Member functions

The version 1.1.0 of the EDDI Compiler (eddic) is available. It took much less time to implement that version than I thought.

The main change to the language is the support of member functions. Each structure can now declare some functions. Functions can be called in each structure object. Here is an example of what can be done with that feature in EDDI:

struct Counter {
    int value;

    void increment(){
        this.value = this.value + 1;
    }

    void add(int number){
        this.value = this.value + number;
    }

    void add(int n1, int n2){
        this.value = this.value + n1;
        this.value = this.value + n2;
    }
}

void main(){
    Counter counter;
    counter.increment();
    println(counter.value);

    counter.add(99);
    println(counter.value);

    counter.add(11, 69);
    println(counter.value);
}

The this pointer is available in each member function. The pointer is passed on the stack just like any other parameter.

Another improvement is the support of the ternary operator:

void main(){
    int a = 5 > 2 ? 44 : 66;
    println(a);
}

The inliner has been improved to support inlining member functions and functions with pointer parameters. The parameter allocation in register is only done starting at O1.

The peephole optimizer has also been improved. Some stacks operations optimization are performed and some unnecessary copies of parameter register are removed.

Finally, the assembly generation has been improved to not use stack frames starting at O2. When this optimization is enabled, the local variables are addressed using stack pointers instead of the base pointer that is not used anymore. This optimization reduces the overhead of function calls.

Future work

The next version of the EDDI Compiler will be the version 1.1.1.

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v1.1 available in the GitHub tags or directly as the release branch.

Manage command-line options with Boost Program Options

In command-line program, we often faces problems with the management of command-line options. When you have a few options, it's not a problem. But when writing complex programs with tens of options (or hundreds), it starts to be too complicated to manage by hand. That's where Boost Program Options enters the game!

Boost Program Options is one of the Boost C++ Libraries. It is a very powerful library to handle command-line options. You define all the options of the program and then Boost Program Options takes care of all. It parses the command line, handles errors, gets values and even displays help. It is not a perfect library. But it is very complete that will answer most of the common needs.

This library is not header-only. You will need to build the library and link your program with the library.

Getting started

In this article, I will use po as an abbreviation of boost::program_options and all source will contains this namespace alias:

namespace po = boost::program_options;

First of all, it is necessary to create an instance of po::option_description:

po::options_description description("MyTool Usage");

description.add_options()
    ("help", "Display this help message")
    ("version", "Display the version number");

The parameter to the constructor is the title of the options. To add options, you use the add_options() function and append all your options, each one surrounded with parenthesis. Here, we declared two options (help and version). They can be set with --help and --version.

Then, you can parse the command line by declaring a po::variables_map to store the variables and parse the command line into that storage:

po::variables_map vm;
po::store(po::command_line_parser(argc, argv).options(description).run(), vm);
po::notify(vm);

Then, you can easily verify if an option has been set by using the count(const std::string& name) function on the vm object:

if(vm.count("help")){
    std::cout << description;
}

You can use operator on the description to output all the options of program.

Short options

By default, all the options are accessed by using -- in front of them. You can also specify a short version for each option:

description.add_options()
    ("help,h", "Display this help message")
    ("version,v", "Display the version number");

With that, you can display with either --help or -h. Even if you select help with -h, you can still verify if help has been set with count("help").

Option with value

Boost Program Options can also handle option that need a specific value. Lets add a compression option:

description.add_options()
    ("help,h", "Display this help message")
    ("compression,c", po::value<int>(), "Compression level")
    ("version,v", "Display the version number");

You can then get the value of option easily:

if(vm.count("compression")){
    std::cout << "Compression level " << vm["compression"].as<int>() << std::endl;
}

For the story, the option value are stored as boost::any. You can get the value of the option by using operator[] on the po::variables_map. You can get the value type with the as function with the type you need.

On the command line, the value can be set with --compression 10, -c 10 or -c10.

You can also configure a default value for an option:

description.add_options()
    ("help,h", "Display this help message")
    ("compression,c", po::value<int>()->default_value(5), "Compression level")
    ("version,v", "Display the version number");

With that, if the option is not set on the command line, the option has the specified value. With that, the option is always defined.

Finally, you can also set an implicit value. This is the value of the option is the option is set to the command line without a value (--compression or -c):

description.add_options()
    ("help,h", "Display this help message")
    ("compression,c", po::value<int>()->default_value(5)->implicit_value(10), "Compression level")
    ("version,v", "Display the version number");

Positional options

It is often convenient to have a list of files on the command line. These options does not have a name. In Boost Program Options, these options are called positional options. You have to declare them in the describe as any other action. For example, for a list of files:

description.add_options()
    ("help,h", "Display this help message")
    ("compression,c", po::value<int>()->default_value(5)->implicit_value(10),"Compression level")
    ("input-files", po::value<std::vector<std::string>>(), "Input files")
    ("version,v", "Display the version number");

Then, you have to declare it as positional and then finally specify it when parsing the command-line:

po::positional_options_description p;
p.add("input-files", -1);
po::variables_map vm;
po::store(po::command_line_parser(argc, argv).options(description).positional(p).run(), vm);
po::notify(vm);

The positional options values are retrieved the exact same way as other options:

if(vm.count("input-files")){
    std::vector<std::string> files = vm["input-files"].as<std::vector<std::string>>();
    for(std::string file : files){
        std::cout << "Input file " << file << std::endl;
    }
}

Wrap-Up

In this article, we saw the most important aspects of Boost Program Options. With these notions, you can start using this great library. If you need more information about the library, you can read the official documentation that is very well made.

You can download the final sources of this article on Github: v1.cpp

C++11 Concurrency Tutorial - Part 4: Atomic Types

In the previous article, we saw advanced techniques about mutexes. In this post, we will continue to work on mutexes with more advanced techniques. We will also study another concurrency technique of the C++11 Concurrency Library: Atomic Types

Atomic Types

We will take, the example of a Counter:

struct Counter {
    int value;
    
    void increment(){
        ++value;
    }
    
    void decrement(){
        --value;
    }
    
    int get(){
        return value
    }
};

We already saw that this class was not safe at all to use in multithreaded environment. We also saw how to make if safe using mutexes. This time, we will see how to make it safe using atomic types. The main advantage of this technique is its performance. Indeed, in most cases, the std::atomic operations are implemented with lock-free operations that are much faster than locks.

The C++11 Concurrency Library introduces Atomic Types as a template class: std::atomic. You can use any Type you want with that template and the operations on that variable will be atomic and so thread-safe. It has to be taken into account that it is up to the library implementation to choose which syncronization mechanism is used to make the operations on that type atomic. On standard platforms for integral types like int, long, float, ... it will be some lock-free technique. If you want to make a big type (let's saw 2MB storage), you can use std::atomic as well, but mutexes will be used. In this case, there will be no performance advantage.

The main functions that std::atomic provide are the store and load functions that atomically set and get the contents of the std::atomic. Another interesting function is exchange, that sets the atomic to a new value and returns the value held previously. Finally, there are also two functions compare_exchange_weak and compare_exchance_strong that performs atomic exchange only if the value is equal to the provided expected value. These two last functions can be used to implement lock-free algorithms.

std::atomic is specialized for all integral types to provide member functions specific to integral (like operators ++, --, fetch_add, fetch_sub, ...).

It is fairly easy to make the counter safe with std::atomic:

#include <atomic>

struct AtomicCounter {
    std::atomic<int> value;

    void increment(){
        ++value;
    }

    void decrement(){
        --value;
    }

    int get(){
        return value.load();
    }
};

If you test this counter, you will see that the value is always the expected one.

Wrap-Up

In this article we saw a very elegant technique to perform atomic operations on any type. I advice you to use std::atomic any time you need to make atomic operations on a type, especially integral types.

The source code for this article can be found on Github.

Next

In the next post of this series, we will see how to use the Futures facility to perform asynchronous task.

EDDI Compiler 1.0.3 - Inlining and register allocation

The version 1.0.3 of the EDDI Compiler (eddic) is available.

The only improvement to the language is that the size of a global array can now be defined using a constant global variable.

The main improvement of this version is the addition of inlining in the optimization engine. This optimization replace a call to a function by the body of the function. For now, the inlining optimizer is quite basic. For now, it doesn't inline only a specific call site but all the call sites of a given function. Moreover, the heuristics used for inlining are quite simple (only the size of the function is taken into account). Only functions that takes int and float parameters can be inlined. This optimization will be improved in the future.

The second main change is the arrival of a basic register allocation. In each function, one or more variables can be assigned to registers. Only the most used variables are allocated into registers. Another optimization is that variables that are not used after all optimization techniques have been applied are removed from the function storage. The unused functions are also removed from the program after the optimization passes.

Moreover, the performances of optimization engine have been improved by about 20%.

The MTAC representation has been improved. The ARRAY operators have been removed because they can be replaced with the DOT operators. The preamble and prologue generations for LTAC has also been refactored. When it is possible, the stack frames are not generated.

Finally, the configuration of the compiler has been improved with several new optimization option and the options being separated into several option groups.

Future work

The next version of the EDDI Compiler will be the version 1.1.0.

The main change will be member functions inside of structures. For now, there will be no kind of virtual functions and inheritance but that will certainly come in its time.

And as ever, I will be more than pleased to hear any idea you could have about this project :)

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v1.0.3 available in the GitHub tags or directly as the release branch.

taskwarrior-php 0.1 : A simple PHP Frontend for Taskwarrior

I released the version 0.1 of taskwarrior-php.

This project is a simple PHP Frontend for Taskwarrior. For now, the frontend is quite basic. All the tasks are displayed and sorted by projects. The completion of each project is also computed.

taskwarrior-php Screenshot

You can also insert a new task. For now, only the project and the description of the task can be modified.

Download

The application is available on the Git repository : https://github.com/wichtounet/taskwarrior-php

The installation is simple, you just have to put all the files in a folder of a PHP server. Then, you have to edit the config.php to set the location of your Taskwarrior files.

It is necessary that the Taskwarrior files are on the PHP server as well. For that, you can use the FTP pull and push commands of Taskwarrior.

Don't hesitate to contact me if you have some ideas for this project or if you find some bugs.

EDDI Compiler 1.0.2 – Better pointer support and Dead-Code Elimination

The version 1.0.2 of the EDDI Compiler (eddic) is available.

The language itself does not features something new, but the support of pointers has been greatly improved. You can now declare arrays of pointers and return pointers from functions. Structures can hold pointers as well. Moreover, arrays of structures are now supported. These new features have increased the number of operators of the MTAC Level.

The more important part of this new version resides in the Optimization Engine. A new optimization technique has been implemented: Dead-Code Elimination. This technique removes all code that calculates values for variables that are not used anymore after this statement. Another change is that empty functions are removed after optimization (as well as every call to the removed functions). The liveness analyzer has been replaced by a global Live-Variable Analysis routine. This information is used in the optimization engine and in the LTAC Compiler. Finally, the Peephole Optimizer has been improved to support some local optimization techniques like constant propagation and basic dead-code elimination.

The code generators have also been improved by outputting only native functions that are called. It means that if the program does not print a float, the _F5printF function will not be generated. Moreover, the native functions have been moved in external assembly files.

Future work

The next version of the EDDI Compiler will be the version 1.0.3.

This version will see a first basic version of Inlining optimization and certainly register-allocation of the most used variables. There will certainly be no change of the language itself.

A cleanup of the two compilers (MTAC and LTAC) will be performed as well as simplification of the MTAC Language if possible.

The other changes will mainly be minor changes to the compiler.

And as ever, I will be more than pleased to hear any idea you could have about this project :)

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v1.0.2 available in the GitHub tags or directly as the release branch.

EDDI Compiler 1.0.1 - Pointers and better struct support

The version 1.0.1 of the EDDI Compiler (eddic) is now available.

The language itself has been updated to support pointers. For now, this support is quite basic, but it allows to pass any type of the language by pointer to a function. No arithmetic is permitted on pointers, only dereferencing is allowed. The following sample shows how pointers are used in the language:

struct A {
    int a;
    string b;
    float c;
}

void main(){
    A a;

    a.a = 44;
    a.b = "44";
    a.c = 44f;

    test(a); 
    print(a.a);
    print("|");
}

void test(A* a){
    A* b = a;

    a.a = 55;

    print(a.a);
    print("|");

    b.a = 66;
    b.b = "66";
    b.c = 66f;

    print(a.a);
    print("|");
    print(b.a);
    print("|");
}

This sample is not very useful, but it shows the usage of pointers well enough.

Another improvement to the language is that it supports now nested struct. It means that a member of a struct can be a struct itself.

That's it for the language improvements. On the side of the compiler itself, I've improved the error reporting for structs. For example, the compiler display a clear error when a struct is recursively nested. The peephole optimizer has been improved a bit with new optimization, but it still rather simple. The optimization engine is now able to optimize functions in parallel. The improvement is not quite large, but that can be useful if there are a lot of functions. I've also improved a lot the Abstract Syntax Tree representation of assignments to unify variable, array and struct assignments in one Node with the notion of LeftValue.

Finally, the tests have also been improved. New tests have been added and helped me find new bugs. Moreover, the tests are now made on each optimization levels for each test case. There was some issue with smallest optimization level.

Future work

The next version of the EDDI Compiler will be the version 1.0.2.

This version will adds support for returning a pointer from a function. Moreover, it will also adds support for pointers inside of struct. The last change to the language will be that you will be able to declare array of pointers and array of structure.

The peephole optimizer will perform more powerful optimization. At least, I will add an optimization to remove assignments to registers that are not used and use less registers. That will perhaps imply to add support for basic blocks in the LTAC Language.

I will also add a powerful dead-code optimization to the Optimization Engine. This will replace the RemoveAssign and RemoveMultipleAssign pass of the engine, being more powerful.

As ever, I'm open to any idea you could have about this project :)

Download

You can find the compiler sources on the Github repository: https://github.com/wichtounet/eddic

The exact version I refer to is the v1.0.1 available in the GitHub tags or directly as the release branch.