Some weeks ago, I finished reading Compilers : Principles, Techniques & Tools, by Afred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman. This book is also called the Dragon Book due to the cover.
This book is a reference about compiler construction and design. If you are interested in this subject, this book is for you, it's a must-have. However, I have to warn you that this book is very technical and hard. Honestly, some of the chapters are beyond my comprehension. Before this book, I had no real comprehension of the subject. I will certainly read again some of the chapters when I will have more practice into the subject.
If you want, the book is full of exercises about each subject covered in the book. If you plan to do all the exercises, you'll need a lot of time as there are a lot of them and some of them are quite hard. I've done some of them but only a little part.
The first chapter introduces the construction of compilers. You will see the common structure of compilers, the evolution of programming languages and the science behind building a compiler and its applications. The second chapter is still quite general. It will teach you how to develop a simple syntax-directed translator. This chapter is very important as it will give you the basics for understanding the following chapters. You will learn how to define a grammar, what are the main parsing techniques and what is lexical analysis. It will also covers symbol tables and intermediate language generation.
With the third chapter (Lexical Analysis), we are entering the hearth of the matter. You will learn the vocabulary behind lexical analysis (tokens, lexemes, attributes, ...). Then, after you've learned how to define and recognize tokens, you will see the different techniques to build an efficient lexical analyzer. The first technique that will be covered is the use of a lexer generator (Lex). Then you will see in details how to construct a lexer using regular expressions or finite automata especially Nondeterministic Finite Automata and Deterministic Finite Automata.
The next one (Syntax Analysis) is about parsing. After learning how to define and write a grammar you will see how to parse it. You will see in details the most commons types of parsing (Top-Down, Bottom-Up) and the most common parsers (LL(K) and LR(K) parsers). The construction of these kinds of parsers is covered in details and the way to optimize them is also teached. Finally, you will see how to automatically generate a parser using Lex and Yacc. This chapter is sometimes very hard to understand (in my own opinion) but very interesting especially if you plan to build parser without generating it with some advanced tools (for example Yacc or Boost Spirit for C++).
The fourth chapter (Syntax Directed Translation) explains you how to translate some source code (parse it) into a coherent structure (an abstract tree) using a Syntax Directed Scheme. The translation is made based on a syntax using semantic actions and rules to translate the source into something else. You'll see different ways of doing that translations.
Then, the next one (Intermediate Code Generation) teaches you how to generate Intermediate Code from the source. Two different representations are covered : syntax trees and three-address-code. Another subject covered in this chapter is type checking. You'll see in details how to translate expressions, control flow instructions and switch statements into three-address-code statements.
The seventh chapter (Run-Time Environment) gives a lot of information about the different run-time targets that you can compile for. A lot of subjects are covered here: stack and heap allocation, locality exploitation, garbage collectors... This chapter is in my opinion a very good instruction to computer architecture. You cannot imagine develop a compiler without having a deep understanding of the target machine.
The next chapter (Code Generation) is also a very important one. In this chapter, you will see how to generate assembly code from the three-address-code. You will learn how to select the good instructions. A very important subject covered in this chapter is register allocation. You'll learn how to choose wisely the registers to produce efficient code. The basic blocks are also covered there with flow graphs. More than just generating code from Three-Address-Code statements, you'll also see how to optimize them. Only local (to a basic block) optimization techniques will be covered in this chapter. Several techniques that aims at testing if code is optimal are also taught there.
The global optimizations are covered in the next chapter (Machine-Independent Optimizations). You will discover several optimizations that you can do globally (inside a function but among different basic blocks). A data-flow analysis framework is explained here in details. After that, for each of the optimization, the parameters of the data flow analysis are explained. The optimization of loops is treated too.
The three next chapters (Instruction-Level Parallelism, Optimizing for Parallelism and Locality and Interprocedural Analysis) are the most complex of the book. They are covering in details the optimizations that can be made when a compiler supports instruction-level parallelism (executes several instructions in one clock cycle). It also covers interprocedural analysis of a program to allow even better optimization than global optimization inside a function. Honestly, I didn't understand some of the concepts described here. I will read them again one by one, chapter by chapter and try to implement some of the techniques in EDDI in the future.
To conclude, I will say that Compilers : Principles, Techniques & Tools is a very good book that every compiler designer and developer should read before starting constructing a compiler. Although very technical, it's quite clear and contains a huge amount of information. If you plan to develop a compiler, it is a very good idea to read this book first.
I've implement some of the techniques explained in this book in my own compiler. I implemented most of the local optimizations presented and Intermediate Code generation. You can find some information here.