Memory Manager in 64bits Intel Assembly on Linux

For the last version of the EDDI Compiler, it has been necessary to extend the dynamic memory allocator, to support free memory. In this post, we will see how to write a simple Memory Manager in Intel Assembly for Linux.

In the past, we've seen how to write a basic memory allocator, this time, we will write a more complete version.

The implementation is made in 64bits Intel Assembly.

Memory Manager specification

The memory will be allocated by blocks. Each block will contain a header with two information:

  • A boolean flag indicating if the block is free or not
  • The size of the block (including the header)

Each time some memory is asked, the blocks are tested one by one until an available one is found. If no available block is found, a new block is allocated after the last one and this block is returned.

The memory manager consists of three functions:

  • memory_init: Init the memory manager
  • memory_alloc: Allocate the given number of bytes of memory
  • memory_free: Release the given block

The parameter is passed in the r14 register. The return value is returned in the rax register.

Global State

This implementation needs two global variables. One for the start address of memory and the other one for the last:

section .data
mem_last dq 0
mem_start dq 0

Init memory Manager

The init function is very simple to implement:

init:
push rbp
mov rbp, rsp
mov rax, 12
xor rdi, rdi
syscall
mov [mem_start], rax
mov [mem_last], rax
leave
ret

We just have to call sys_brk in order to get the location of program break. Then, the start and the last addresses are the same.

Free memory

The free function is the simplest one:

free:
push rbp
mov rbp, rsp
mov qword [r14 - 16], 1
leave
ret

The address to free is passed in the r14 register. We have to go back 16 bytes (size of the control block) to go to the start of the block. The availability flag is set to 1 (the block is free).

The alloc function

The alloc function is the most complex:

alloc:
push rbp
mov rbp, rsp
push rdi
push r10
push r11
push r12
push r13
push r14
add r14, 16
mov r12, [mem_start]
mov r13, [mem_last]
.start:
cmp r12, r13
je .alloc
mov r10, [r12]
mov r11, [r12 + 8]
cmp r10, 1
jne .move
cmp r11, r14
jl .move
mov qword [r12], 0
lea rax, [r12 + 16]
pop r14
pop r13
pop r12
pop r11
pop r10
pop rdi
leave
ret

.move:
add r12, r11
jmp .start

.alloc:
lea rdi, [r12 + r14]
mov rax, 12
syscall
mov [mem_last], rdi
mov qword [r12], 0
mov qword [r12 + 8], r14
lea rax, [r12 + 16]
pop r14
pop r13
pop r12
pop r11
pop r10
pop rdi
leave
ret

As the function is a bit complex, I will detail it in part:

add r14, 16
mov r12, [mem_start]
mov r13, [mem_last]
.start:
cmp r12, r13
je .alloc
mov r10, [r12]
mov r11, [r12 + 8]
cmp r10, 1
jne .move
cmp r11, r14
jl .move
mov qword [r12], 0
lea rax, [r12 + 16]

The necessary number of bytes is passed in the r14 register. We add 16 bytes (size of the control group) to the size as we also need some place for the header. Then, we load the start and last addresses. If both addresses are equal, we need to allocate more memory (detailed later). Then, we check the size and the availability of the current block. If the size is enough to fit the needs and the block is available, we set it to unavailable. We return the address past the control block (16 bytes).

.move:
add r12, r11
jmp .start

To move to the next block, we just have to add the size of the current block to the current block address.

.alloc:
lea rdi, [r12 + r14]
mov rax, 12
syscall
mov [V_mem_last], rdi
mov qword [r12], 0
mov qword [r12 + 8], r14
lea rax, [r12 + 16]

To allocate memory, we compute the new program break and call sys_brk again to set the new program break. The block is then set to not available and the size is set. We return the address past the control block (16 bytes).

The rest of the program is just here to save and restore the registers and compute the stack frames.

Wrap-Up

In this article, we saw how to implement a very simple memory manager in 64bits Intel Assembly on Linux. This memory manager is very simple, but has several drawbacks:

  • The overhead for small blocks is important. For example, allocating an 8 bytes integer needs a 24 bytes block, thrice the size of the int.
  • In the worst-case scenario, all of the process memory need to be walked across to find a new free block
  • The functions are not thread-safe
  • This algorithm can lead to a lot of memory fragmentation

In the future I will try to make a more powerful version of this memory manager.

Download

All the functions are available online on the Github Repository:

They are also available in 32bits Intel Assembly:

Comments

Comments powered by Disqus