Compiler Architecture refinements for eddic

Baptiste Wicht

The next version of eddic will see an improved compiler architecture. There are two main changes in this version:

A better separation between the front end and the back end
A new intermediate representation to improve and ease code generation

Front end and Back End

First, the front and back ends have been clearly separated. The general compiler architecture is currently something like this:

The first part didn't change, but the Compiler part was clearly split into front and back ends:

The back end has no information about the source language. It only sees the intermediate representation provided by the front end, called Medium-Level Three Address Code (MTAC).

There are several advantages to this model. The main one is that it is easy to add support for a new programming language to the compiler: only the front end needs to be changed. Likewise, if a new output is necessary, for example ARM assembly instead of Intel assembly, only the back end needs to be changed.
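To make the separation concrete, here is a minimal sketch in Python (eddic itself is not written this way). Every name in it (`Quadruple`, `FrontEnd`, `BackEnd`, `TinyAddFrontEnd`, `TextBackEnd`, `compile_with`) is hypothetical and chosen for illustration only; the real MTAC is richer than a single quadruple type. The point is only that the back end receives intermediate statements and never the source text.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Quadruple:
    """One hypothetical MTAC-like statement: result = arg1 op arg2."""
    result: str
    arg1: str
    op: str
    arg2: str


class FrontEnd(Protocol):
    def compile(self, source: str) -> list[Quadruple]: ...


class BackEnd(Protocol):
    def generate(self, program: list[Quadruple]) -> str: ...


class TinyAddFrontEnd:
    """Toy front end: accepts only lines of the form 'x = a + b'."""

    def compile(self, source: str) -> list[Quadruple]:
        program = []
        for line in source.strip().splitlines():
            result, expr = (part.strip() for part in line.split("="))
            arg1, op, arg2 = expr.split()
            program.append(Quadruple(result, arg1, op, arg2))
        return program


class TextBackEnd:
    """Toy back end: sees only quadruples, never the source text."""

    def __init__(self, target: str) -> None:
        self.target = target

    def generate(self, program: list[Quadruple]) -> str:
        lines = [f"; target: {self.target}"]
        for q in program:
            lines.append(f"{q.result} := {q.arg1} {q.op} {q.arg2}")
        return "\n".join(lines)


def compile_with(front: FrontEnd, back: BackEnd, source: str) -> str:
    # The intermediate representation is the only thing shared by both ends.
    return back.generate(front.compile(source))
```

In this sketch, supporting another source language means writing another class with a `compile` method, and supporting another output means writing another class with a `generate` method; `compile_with` stays the same in both cases.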

New intermediate representation

In the previous versions of the compiler, the code generators were fairly complex. Indeed, they had to transform the MTAC intermediate representation directly into assembly. This process involves several things:

instruction selection
register allocation
low-level optimization (replace a mov rax, 0 with xor rax, rax for example)
basic block management

In this version, I decided to move to a better architecture. This architecture uses a new intermediate representation: Low-Level Three Address Code (LTAC). As its name states, it is a low-level representation, close to assembly. It contains addresses, registers and abstracted instructions. This representation is platform independent (the differences between 32 and 64 bits are handled in the code generators). There are no more basic blocks here, only functions containing statements.
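As a rough illustration of what such a representation could look like, here is a small Python sketch. It is not the actual LTAC definition: the class names (`Register`, `Address`, `Instruction`, `Function`), the opcode strings and the `lower_add` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Register:
    """Symbolic register; a physical one is chosen later by a code generator."""
    index: int


@dataclass(frozen=True)
class Address:
    """Memory operand of the form [base + offset]."""
    base: Register
    offset: int = 0


# An operand is a register, a memory address or an integer constant.
Operand = Union[Register, Address, int]


@dataclass(frozen=True)
class Instruction:
    """Abstract instruction such as MOV or ADD, with no 32/64-bit details."""
    op: str
    dest: Operand
    src: Operand


@dataclass
class Function:
    """No basic blocks at this level: a name and a flat list of statements."""
    name: str
    statements: list[Instruction] = field(default_factory=list)


def lower_add(function: Function, dest: Register, lhs: Operand, rhs: Operand) -> None:
    """Lower a hypothetical MTAC statement 'dest = lhs + rhs' into two statements."""
    function.statements.append(Instruction("MOV", dest, lhs))
    function.statements.append(Instruction("ADD", dest, rhs))
```

Nothing in these types mentions a register size or a concrete register name, which is what keeps the sketch platform independent: those decisions are left to whatever consumes the `Function` objects.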

The next figure presents the structure of the backend:

The compiler is responsible for transforming the MTAC representation into the LTAC representation. It does not do any low-level optimization. The instruction selection is easier as it is platform independent.

The peephole optimizer is responsible for the low-level optimizations. In the 1.0 release, only a few things will be done at this level. In the future, I will try to invest some time in completing it so that it generates better assembly code. These optimizations are far simpler than the ones done in the MTAC optimization engine. Indeed, a peephole optimizer generally works only on a small window of instructions, like three or four instructions at a time.

Finally, the code generators perform the instruction selection and address resolution for the target platform. They also have to translate symbolic registers into physical ones.
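The last two stages can be sketched in a few lines of Python. This is an illustration under loud assumptions, not eddic's implementation: instructions are plain `(opcode, destination, source)` tuples, symbolic registers are strings such as `"r0"`, the three rewrite rules are common textbook peephole rules, and the register mapping is a naive one-to-one table where a real code generator needs proper register allocation.

```python
def is_symbolic(operand):
    """Symbolic registers are spelled 'r0', 'r1', ... in this sketch."""
    return isinstance(operand, str) and operand[:1] == "r" and operand[1:].isdigit()


def peephole(instructions):
    """Apply local rewrites, looking at no more than two instructions at a time."""
    result = []
    for op, dest, src in instructions:
        # Rule 1: moving a register onto itself does nothing.
        if op == "MOV" and dest == src:
            continue
        # Rule 2: zeroing a register is shorter as XOR reg, reg.
        # (This changes the flags, which a real optimizer must account for.)
        if op == "MOV" and src == 0 and is_symbolic(dest):
            op, src = "XOR", dest
        # Rule 3 (two-instruction window): after 'MOV a, b',
        # an immediately following 'MOV b, a' is redundant.
        if result and op == "MOV" and result[-1] == ("MOV", src, dest):
            continue
        result.append((op, dest, src))
    return result


# Naive mapping from symbolic register index to physical register name.
# Only four registers are listed, so larger indexes raise an IndexError.
PHYSICAL_64 = ["rax", "rcx", "rdx", "rsi"]
PHYSICAL_32 = ["eax", "ecx", "edx", "esi"]


def generate(instructions, physical=PHYSICAL_64):
    """Translate symbolic registers into physical ones and emit Intel-style text."""
    def operand_text(operand):
        if is_symbolic(operand):
            return physical[int(operand[1:])]
        return str(operand)

    return [
        f"{op.lower()} {operand_text(dest)}, {operand_text(src)}"
        for op, dest, src in instructions
    ]
```

For example, feeding `[("MOV", "r0", 0), ("MOV", "r1", "r1"), ("MOV", "r1", "r0"), ("MOV", "r0", "r1"), ("ADD", "r1", 4)]` through `peephole` and then `generate` yields `xor rax, rax`, `mov rcx, rax` and `add rcx, 4`. Passing `PHYSICAL_32` instead gives the same three instructions with `eax` and `ecx`, which shows how the 32/64-bit difference can live entirely in the last stage.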

Conclusion

I hope that these refinements in the compiler architecture will allow the compiler to produce better code.

The 1.0 version of the compiler will also include other new features:

Basic support for custom structures
Global optimizations
Some bug fixes found with the new set of unit tests

As always, feel free to comment on the new architecture, the compiler itself, the project or anything else.

