For the last version of the EDDI Compiler, it has been necessary to extend the dynamic memory allocator, to support free memory. In this post, we will see how to write a simple Memory Manager in Intel Assembly for Linux.
In the past, we've seen how to write a basic memory allocator, this time, we will write a more complete version.
The implementation is made in 64bits Intel Assembly.
Memory Manager specification
The memory will be allocated by blocks. Each block will contain a header with two information:
- A boolean flag indicating if the block is free or not
- The size of the block (including the header)
Each time some memory is asked, the blocks are tested one by one until an available one is found. If no available block is found, a new block is allocated after the last one and this block is returned.
The memory manager consists of three functions:
- memory_init: Init the memory manager
- memory_alloc: Allocate the given number of bytes of memory
- memory_free: Release the given block
The parameter is passed in the r14 register. The return value is returned in the rax register.
This implementation needs two global variables. One for the start address of memory and the other one for the last:
section .data mem_last dq 0 mem_start dq 0
Init memory Manager
The init function is very simple to implement:
init: push rbp mov rbp, rsp mov rax, 12 xor rdi, rdi syscall mov [mem_start], rax mov [mem_last], rax leave ret
We just have to call sys_brk in order to get the location of program break. Then, the start and the last addresses are the same.
The free function is the simplest one:
free: push rbp mov rbp, rsp mov qword [r14 - 16], 1 leave ret
The address to free is passed in the r14 register. We have to go back 16 bytes (size of the control block) to go to the start of the block. The availability flag is set to 1 (the block is free).
The alloc function
The alloc function is the most complex:
alloc: push rbp mov rbp, rsp push rdi push r10 push r11 push r12 push r13 push r14 add r14, 16 mov r12, [mem_start] mov r13, [mem_last] .start: cmp r12, r13 je .alloc mov r10, [r12] mov r11, [r12 + 8] cmp r10, 1 jne .move cmp r11, r14 jl .move mov qword [r12], 0 lea rax, [r12 + 16] pop r14 pop r13 pop r12 pop r11 pop r10 pop rdi leave ret .move: add r12, r11 jmp .start .alloc: lea rdi, [r12 + r14] mov rax, 12 syscall mov [mem_last], rdi mov qword [r12], 0 mov qword [r12 + 8], r14 lea rax, [r12 + 16] pop r14 pop r13 pop r12 pop r11 pop r10 pop rdi leave ret
As the function is a bit complex, I will detail it in part:
add r14, 16 mov r12, [mem_start] mov r13, [mem_last] .start: cmp r12, r13 je .alloc mov r10, [r12] mov r11, [r12 + 8] cmp r10, 1 jne .move cmp r11, r14 jl .move mov qword [r12], 0 lea rax, [r12 + 16]
The necessary number of bytes is passed in the r14 register. We add 16 bytes (size of the control group) to the size as we also need some place for the header. Then, we load the start and last addresses. If both addresses are equal, we need to allocate more memory (detailed later). Then, we check the size and the availability of the current block. If the size is enough to fit the needs and the block is available, we set it to unavailable. We return the address past the control block (16 bytes).
.move: add r12, r11 jmp .start
To move to the next block, we just have to add the size of the current block to the current block address.
.alloc: lea rdi, [r12 + r14] mov rax, 12 syscall mov [V_mem_last], rdi mov qword [r12], 0 mov qword [r12 + 8], r14 lea rax, [r12 + 16]
To allocate memory, we compute the new program break and call sys_brk again to set the new program break. The block is then set to not available and the size is set. We return the address past the control block (16 bytes).
The rest of the program is just here to save and restore the registers and compute the stack frames.
In this article, we saw how to implement a very simple memory manager in 64bits Intel Assembly on Linux. This memory manager is very simple, but has several drawbacks:
- The overhead for small blocks is important. For example, allocating an 8 bytes integer needs a 24 bytes block, thrice the size of the int.
- In the worst-case scenario, all of the process memory need to be walked across to find a new free block
- The functions are not thread-safe
- This algorithm can lead to a lot of memory fragmentation
In the future I will try to make a more powerful version of this memory manager.
All the functions are available online on the Github Repository:
They are also available in 32bits Intel Assembly: