I'm proud to announce a new release of EDDIC, the version 0.7.
Most of the changes are internal to the compiler. I read a new book : Compilers: Principles, Techniques, and Tools and applied some of the advices of the author. The biggest change is the use of a new intermediate representation : Three-Address-Code statements. This representation is easy, all the statements are basically of the form a = b + c with + being any operator of the language. The big advantage of this representation is that we can easily run optimization on it. Another advantage is that this representation is complete enough to represent most of the programming languages, so, we can imagine compiling several different source languages into the TAC language and then compiling them the same way.
Once the Three-Address-Code representation is generated and separated into basic blocks, it is compiled into X86 assembly using a code generator. I've made a lot of improvements on the generated assembly. For example, I'm using several new instructions (neg, inc, dec, xor, ...) to generate more efficient code. Moreover, I'm doing a better use of registers with keeping variables into registers as long as possible.
Better optimization engine
Here is the list of what the optimizer do at the TAC level:
- Simplify arithmetic identities : a = b + 0 => a = b
- Reduce in strength : a = b * 2 => a = b + b
- Constant folding : a = 2 + 2 => a = 4
- Constant propagation : when a constant is assigned to a, reuse the constant as long as no other assignment is made to a
- Remove overridden assign
- Remove dead basic blocks : when a condition is known at compile time we can know the path it will take and remove basic blocks that will never be taken
- Remove needless jumps : After the other optimization have been done it is possible that a goto is directly targeting the next basic block so we can remove it
- Merge basic blocks : After some statements have been replaced or removed it is possible that we can merge some basic blocks together
The optimizer is running the different optimization technique as long as one of them as an effect on the code. At the present time, the optimization techniques are used locally (within a basic block) so the generated assembly is not perfect, but for what I tested so far, it's a good start.
Moreover, the language itself also had some improvements:
- The minus and plus unary operators have been added to the grammar
- The local and global variables can be const
- A source file can now includes another source or a file from the standard library (very little for now, but a little start)
The project itself has also been improved
- Most of the classes and files are now documented using Doxygen
- The unit tests are now testing compilation and execution of some samples, that helped me finding some bugs in the code base and in the new changes
You can find the compiler sources on the Github repository : https://github.com/wichtounet/eddic.
The exact version I refer to is the v0.7 available in the github tags or directly as the release branch.