Dynamic memory allocation in Intel Assembly on Linux

For the version 0.6.0 of the EDDI Compiler, I have written a simple dynamic memory allocation function in assembly. I did that to avoid using malloc in my assembly code. As this is not an easy subject, this article will explain the main parts of writing this function.

As the EDDI Compiler creates program for Linux platform, this article will focus on writing a little memory allocator for Linux in Intel Assembly.

In this article I will follow the AT&T notation.

Specifications

The function works like malloc but is simpler. The specifications are the following ones:

  • We call the function with one argument: the dynamic memory size we need
  • The function returns the start address of the allocated memory in the %eax register
  • There is no need to deallocate the allocated memory
  • The size that we ask will generally small and always less than 16384 octets
  • Having some gaps in the memory is not a problem for now

So as you can see there are several limitations to this memory allocator. These limitations are the one I had for EDDI, so I'll follow them in this article.

Dynamic memory allocation

In Linux, there are two ways for performing dynamic memory allocation:

  • brk: Increment the size of the data segment after the end of the program. This memory is directly after the program and is always contiguous. It's the easiest way for allocating memory. This technique is not perfect for large blocks of data.
  • mmap: Creates a new memory mapping in the virtual address space. The kernel gives you memory in virtually every place of the memory.

In our case, as we need only small blocks, we will use brk to dynamically allocate memory.

We can call these procedures using system calls. In assembly, you can use system calls with interruptions (0x80).

Implementation

We need two variables for this function. One to keep track of the remaining size and another one to keep track of the current address of the allocated memory.

.data
.size VIeddi_remaining, 4

VIeddi_remaining:
.long 0
.size VIeddi_current, 4

VIeddi_current:
.long 0

Both variables are initialized to 0.

And here is the function I've developed :

eddi_alloc:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl VIeddi_remaining, %ebx
cmpl %ebx, %ecx
jle alloc_normal
movl $45, %eax
xorl %ebx, %ebx
int  $0x80
movl %eax, %esi
movl %eax, %ebx
addl $16384, %ebx
movl $45, %eax
int  $0x80
movl %esi, %eax
movl $16384, VIeddi_remaining
movl %esi, VIeddi_current

alloc_normal:
movl VIeddi_current, %eax
movl VIeddi_current, %ebx
addl %ecx, %ebx
movl %ebx, VIeddi_current
movl VIeddi_remaining, %ebx
subl %ecx, %ebx
movl %ebx, VIeddi_remaining
leave
ret

I will describe now each part of the alloc function.

movl 8(%ebp), %ecx
movl VIeddi_remaining, %ebx
cmpl %ebx, %ecx
jle alloc_normal

In this part we test if there is enough remaining size for the dynamic memory allocation request. It's equivalent to if(remaining >= size). If there is enough size, we jump to the normal allocation part :

alloc_normal:
movl VIeddi_current, %eax
movl VIeddi_current, %ebx
addl %ecx, %ebx
movl %ebx, VIeddi_current
movl VIeddi_remaining, %ebx
subl %ecx, %ebx
movl %ebx, VIeddi_remaining

First, we move the current address of memory into the %eax register for the return value. Then we add the size of the new allocated block to the current address. Finally we remove the size of the new allocated block from the remaining size. After that, we can leave the function.

The most interesting part is what we do when we have to allocate more memory :

movl $45, %eax
xorl %ebx, %ebx
int  $0x80
movl %eax, %esi
movl %eax, %ebx
addl $16384, %ebx
movl $45, %eax
int  $0x80
movl $16384, VIeddi_remaining
movl %esi, VIeddi_current

We start by doing an interruption to execute a system call. The 45 in the %eax register indicates a sys_brk call. The 0 in the %ebx register, indicates that we want the current position of brk space. We save this current position into the %esi register. Then we add 16384 bits (4K octets) to this address. We call again the sys_brk routine to set the address of the brk space to the calculated address. This is the way to dynamically allocates 4K of memory. Finally, we add 4K to the remaining size in octets and we put the current address (before the add) as the current address.

Possible improvements

We should make some optimization if this function has to be invoked frequently. The first interruption (call to sys_brk) has only to be done once. The very first time we need to get the start address. Then, we can use the current address as the base address when we do the new allocation.

Another improvement is to avoid having gaps between the used blocks. For that, we can avoid setting the current address directly to the newly allocated address but just add 4K to the remaining size. The blocks will overlap 2 allocated blocks.

We could also check that the value returned by the sys_brk is valid. On error, the procedure can return -1.

Conclusion

In this post, we developed a basic dynamic memory allocation function in Intel assembly on the Linux platform. I hope that this information can helps some of you.

Don't hesitate if you have a question or a comment on my implementation.

EDDI Compiler 0.6.1 : Function return types

I just released a new version of the EDDI Compiler : eddic 0.6.1

This is a minor update with only one change on the language: the functions can now return something. The return values are returned in registers (%eax for int and %eax:%ebx for strings). At this time, it is not possible to return an array from a function.

The other changes of this version are on the code base side. All the headers have been cleaned. There are less useless imports and the code is cleaner. Moreover, I also have rewritten the Readme in order to include more useful information in it. The parsing phase should be a little faster now and the assembly has been improved by using the POP instruction. The phases of checking have also been reordered. In the future, it's possible that several phases will be merged together for performance purposes, but for now, it's quite fast on my sample even the 2MB stress file.

You can now generate the Doxygen generation using make doc. The target is generated with CMake. At this time, the documentation is almost empty, but I will work on it on the next releases.

The next version will be a major release. I don't know what will be the changes for the language itself, but the compiler will use a new intermediate representation. It will use Three-Address-Code representation. This representation is simple and can be easily optimized. This will make easier the transformation from the AST to the assembly. At this time, the code of the intermediate compiler is very hard to maintain and contains a lot of logic. The switch to a new, simpler, intermediate representation will simplify the intermediate compiler.

You can find the EDDI compiler sources on the Github repository : https://github.com/wichtounet/eddic.

The exact version I refer to is the v0.6.1 available in the github tags or directly as the release branch.

Linux Mint 12 (Lisa) released!

Linux Mint 12 has just been released!

This new version includes a Gnome 3 using a specific extension MGSE (Mint Gnome Sheel Extensions) and a Gnome 2 UI (MATE). You will find also a lot of other improvements.

From several months, Linux Mint is the most used Linux distribution.

For more informations, you can read the official announcement

How to print strings and integers in Intel Assembly on Linux ?

In this post, we'll learn how to print strings and integers to the console on Linux using Intel Assembly. In this post, I'll use the AT&T notation, because it's the notation used in EDDI.

In EDDI, I have to print strings and numbers to the console, as this is not an easy exercise, I wanted to share my experience here.

On Linux, the only way to print something on the console is to use a system call. For that, we have to use the 0x80 interrupt code.

Declare strings

First, we'll see how to declare strings in an Intel Assembly file. You can use the .string instruction to achieve that :

StringToPrint:
.string "Hello"

Print strings

Then, to print, we will call the sys_write system call :

movl $4, %eax
movl $1, %ebx
movl $StringToPrint, %ecx
movl $5, %edx
int $0x80

The value in %eax (4) indicates the system call we need (sys_write). The 1 in %ebx indicates that we want to write in the console. Finally the two last parameters indicates the string to print and the size of the string. In Intel assembly, the int instruction launch an interrupt and the 0x80 in the interrupt table is set to the system call in the Linux Kernel.

As you can see, this code does use 4 registers and does not save any of them. Ideally, you will save the registers before and restore them. It depends on when you use this routine.

Print integers

Writing an integer is a bit more complicated. If you have the integer in the string, there is no problem, but if you have only a long on your assembly, you'll have to convert the int into a string to print it. We will convert the integer char after char and use the stack as storage for our string. Then every char will be printed to the console using the same system as before.

So let's say we have our number in the %eax register :

movl $9234, %eax

So let's take a look at the code :

xorl %esi, %esi

loop:
movl $0, %edx
movl $10, %ebx
divl %ebx
addl $48, %edx
pushl %edx
incl %esi
cmpl $0, %eax
jz   next
jmp loop

next:
cmpl $0, %esi
jz   exit
decl %esi
movl $4, %eax
movl %esp, %ecx
movl $1, %ebx
movl $1, %edx
int  $0x80
addl $4, %esp
jmp  next

exit:

The first part of the code consists in dividing the value by 10 until we reach zero. The remainder of the division is pushed onto the stack. For example, for our number, after this part, we'll have 4-3-2-9 on the stack. The order is reversed, which is logic because we stack the remainders from the right. During this phase, we count the number of elements using the %esi register.

Once this is done, we print each characters one by one starting with the last that has been pushed. Here we decrement the counter for each char and we use the sys_write call with %esp as the address of the string of one character. After each character, we incremetn the %esp to cancel the push that we used.

We have to do this in two phases in order to get the characters in the good order and not in reverse order.

Handle negative numbers

As you may have noticed, we do not manage negative numbers in our code. They will be printed, but it will be positive number. Indeed, in Intel Assembly (and in processors in general), negative numbers are handled with two's complement. Handling negative numbers in our code is not a big deal. We can add this code at the beginning :

cmpl $0, %eax
jge loop
neg %eax
pushl %eax
; Print "-"
popl %eax

First of all, we check if the number is smaller than 0, if it's not the case, we directly jump to the code we used before. If it's smaller, we negate the number and print a - before printing the real number. We have to save the %eax register before printing the - character because %eax is used for printing.

You'll now have a complete procedure to print an integer on the console in assembly.

I hope that this could be of some help for somebody.

EDDI Compiler 0.6.0 : Arrays

This new version of the EDDI Compiler was faster to finalize than the previous.

The main feature of this release is the use of arrays. You can now use arrays in the EDDI code. You can declare global or local arrays and pass them as parameters to the function. In the assembly, there is certainly still some optimization to perform, but the code works well for now. A new loop is also available the foreach loop for arrays. The loop iterates through each element of the array.

I also performed some new improvements in the compiler speed.

The local and global variables now have default values (0 for int and the empty string for strings). So you can omit the value in the declaration.

I removed the dependency to the C library with replacing malloc with a little memory allocation function I wrote myself. This function is very simple and only use sys_brk to allocate memory directly after the program. You cannot release the memory for now, but that was the same before.

The compiler is now able to warn you if you declare something and you don't use it (parameter, variable, function). Moreover, if you enable the optimization, the functions and variables that are not used will not be compiled.

I made a little fix to the print_integer function to handle negative numbers. Before that, they were printed but in two's complement, now they are printed in the negative form (-344 for example).

I don't know exactly yet what will be included in the next release of the EDDI Compiler. Probably this will be minor release with the inclusion of function return types, but I'm not sure yet. In the future, I also want to add some kind of structure to the program.

You can find the compiler sources on the Github repository : https://github.com/wichtounet/eddic.

The exact version I refer to is the v0.6 available in the github tags or directly as the release branch.

New Wordpress theme for the site

After a long time with the old F2 theme, I decided to choose a more modern design. I chose the WP Plus design which is based on the Google Plus Design.

I think that this design is more practical and cleaner for the users and more comfortable.

Moreover, I also took some time to change some of my addons. I replaced All In One SEO with Wordpress SEO plugin from Joost de Valk. I think that this plugin is better than the first one, but more of that, it also replaced one addon generating the XML sitemap and helped me adding a breadcrumb to the site.

I hope that these changes will improve the user experience on my website.

If you encounter any kind of problems with the new design, don't hesitate to let a comment or to contact me.

Boost intrusive_ptr : faster shared pointer

This post will present the Boost intrusive_ptr and its usage in C++ programming.

Recently, I took some time to optimize the parsing performances of the EDDI Compiler. The parsing phase creates a lot of nodes to fill the Abstract Syntax Tree.

One of the way I found was to replace some shared_ptr by some intrusive_ptr of the Boost library.

It's a faster alternative of shared_ptr. Like its name indicates, it's intrusive. The reference counter is included directely in the managed class, in the contrary of the shared_ptr where the reference counter has to be dynamically allocated to live aside the object. This leads to some performances improvement. Considering memory, the footprint of an intrusive_ptr is the same as the footprint of a raw pointer. This is not the case for the shared_ptr that have a pointer to the object, a pointer to the counter and the counter itself.

For example, if you have a class X:

class X {
    std::string name;
    int age;
};

And you use it in your code using a shared_ptr :

void test(){
    std::shared_ptr<X> x(new X);

    std::cout << x->name << std::endl;
}

and you want to use an intrusive_ptr, you have first to add a reference counter inside the X class :

class X {
    std::string name;
    int age;

    long references;
    X() : references(0) {}
};

And you have to indicate to the intrusive_ptr where the reference counter can be found for this class :

inline void intrusive_ptr_add_ref(X* x){
    ++x->references;
}

inline void intrusive_ptr_release(X* x){
    if(--x->references == 0) 
        delete x;
}

And finally you can use the intrusive_ptr to replace your shared_ptr :

void test(){
    boost::intrusive_ptr<X> x(new X);

    std::cout << x->name << std::endl;
}

The smart pointer itself can be used exactly the same way as a shared_ptr. If you have several classes that are managed using an intrusive_ptr, you can use a function template to tell Boost that all the reference counter are at the same place :

template<typename T>
inline void intrusive_ptr_add_ref(T* expr){
    ++expr->references;
}

template<typename T>
inline void intrusive_ptr_release(T* expr){
    if(--expr->references == 0)
        delete expr;
}

As you can see, the pointer is very intrusive and needs some boilerplate code added to your application, but it can leads to some interesting improvements for classes very often dynamically allocated.

There is another advantage in using intrusive_ptr. As the reference counter is stored into the object itself, you can create several intrusive_ptr to the same object without any problem. This is not the case when you use a shared_ptr. Indeed, if you create two shared_ptr to the same dynamically allocated object, they will both have a different references counter and at the end, you will end up with an object being deleted twice.

Of course, there are not only advantages. First of all, you have to declare a field in every classes that you want to manage using an intrusive_ptr and you have to declare functions to manage the reference. Then there are some disadvantages when using this pointer type compared to a shared_ptr :

  • It's impossible to create a weak_ptr from a intrusive_ptr
  • Code redundancy, you have to copy the reference counter in every class that you want to use an intrusive_ptr with
  • You have to provide a function for every types that has to be used with intrusive_ptr (only two functions if you use the template versions of the two functions)

To conclude, the boost::intrusive_ptr can be a good replacement of std::shared_ptr in a performance critical application, but if you have no performances problem, do not use it because it makes your code less clear. If you are concerned by performances when using std::shared_ptr, consider also using std::make_shared to create your pointers, so that the reference counter and the object itself will be allocated at the same place and at the same time, resulting in better performances. Another case where it's interesting to use an intrusive_ptr is when dealing with libraries using a lot of raw pointers, because you can create several intrusive_ptr to the same raw pointer without any problem.

For more information, you can consult the official documentation.

eddic 0.5.1 : Better assembly generation and faster parsing

I'm very pleased to release the version 0.5.1 of the EDDI Compiler.

It makes now a long time since the last version of eddic, but I started again working frequently on it. This version doesn't add any new feature to the language, but there are a lot of improvements in the compiler itself.

First of all, the generated assembly has been improved a lot. I use now a intermediate representation of assembly and then the compiler is able to make more optimizations in its choice. This optimization is especially visible for integer computations. Before this, all the computations used stack operations and then we use almost only registers when it's possible. It's still not perfect, but it uses way less instructions. Moreover, this can enable me to write a 64 assembly code instead of 32 and provide both versions in the compiler.

Another improvement is the speed of the parser. I now use Boost Spirit to parse the source file and construct an Abstract Syntax Tree. This parsing is very fast now (with some optimizations). Moreover, it will be easier to add new constructs later.

I also improved the general performances at some places. I also use Boost Program Options to parse the command line options.

In the next version (0.6.0), I will introduce arrays of int and strings and the foreach construct for array. I will also remove the dependency to malloc writing a memory allocation manager in assembly. I will also introduce warnings in the compiler.

You can find the compiler sources on the Github repository : https://github.com/wichtounet/eddic.

The compiler now needs Boost 1.47.0 to build.

The exact version I refer to is the v0.5.1 available in the github tags or directly as the release branch.

Effective STL Book Review

I finished reading the last Effective C++ volume : Effective STL, Scott Meyers.

Again, this book follows the same model as the two others, providing several rules aiming at making you a better (more effective) STL programmer. All the items of this book are centered on the usage of the Standard Template Library (STL).

The first chapters (Containers) gives you several advices for using STL Containers. The main point of this chapter is about how to choose carefully a container for a specific use case. It also contains several advices on how to use them well regarding to performances and thread safety.

In the next chapter (vector and string), the author focuses on two specific containers : vector and string. There are some specificities on these two containers that can improve the efficiency of the program.

The third chapter (Associative Containers) is about the associative containers of the STL : set, map, multimap and multiset. It also contains some important informations about which methods to use for each usage in order to preserve the efficiency of each container. It will also explain you the very important difference between equality and equivalence.

The next one (Iterators) explains which iterators to use and how to convert iterators of one type to another.

The chapter five (Algorithms) describes the most important algorithms of the STL. There are tons of algorithms in the STL and using them can save you from writing a lot of boilerplate code. For example, you will learn the different sorting algorithms and for_each. I found that this chapter was especially interesting.

The sixth chapter (Functors) explains you how to create functors and predicates for using with STL classes. It also treats of the special functions to convert functions to functors.

The last chapter (Programming with the STL) gives you advices on how to use the STL to make easier the programming of concrete programs. It will teach you what functions to use, what headers to include and how to decipher STL-related errors.

I found that this book was really good, as good as the first one. The advices are very useful and easy to understand. Using this advices enabled me to improve my code and the efficiency of it. I will recommend this book to every C++ developers using the STL.