Principles of Compiler Design, by Alfred Aho and Jeffrey Ullman, is a classic textbook on compilers for computer programming languages. Both of the authors won the 2020 Turing award for their work on compilers.
It is often called the "green dragon book"[1] and its cover depicts a knight and a dragon in battle; the dragon is green, and labeled "Complexity of Compiler Design", while the knight wields a lance and a shield labeled "LALR parser generator" and "Syntax Directed Translation" respectively, and rides a horse labeled "Data Flow Analysis". The book may be called the "green dragon book" to distinguish it from its successor, Aho, Sethi & Ullman's Compilers: Principles, Techniques, and Tools, which is the "red dragon book".[1] The second edition of Compilers: Principles, Techniques, and Tools added a fourth author, Monica S. Lam, and the dragon became purple; hence becoming the "purple dragon book". The book also contains the entire code for making a compiler. The back cover offers the original inspiration of the cover design: The dragon is replaced by windmills, and the knight is Don Quixote.
The book was published by Addison-Wesley, ISBN 0-201-00022-9. The acknowledgments mention that the book was entirely typeset at Bell Labs using troff on the Unix operating system, little of which had, at that time, been seen outside the Laboratories.
But most of the time, I have difficulty understanding the kinds of stuff in the book. Though the content is fine, and I end up asking questions on cs.StackExchange or StackOverflow. Then I thought of reading the reviews of the text and find out whether it is only me who is facing is the problem or there are people like me as well, finding the language hard to understand. Then I came across the review page of this book in good reads and among many reviews, I found one which explains the problem with the text probably. Here it is:
The good thing about this book is it is comprehensive and covers a lot of ground from different vantage points. For example, the parsing chapters also cover the design of lex and yacc apart from basic topics. The type checking and runtime environment chapters had examples from C and ML to cover different scenarios.
Now the bad parts. The book is awfully written and I cannot understand how can a book in its 2nd (or 3rd?) edition continue to be so bad. It was shoddy writing throughout but the prize for the worst writing is these topics:
Another consistent problem is their horrible use of passive voice. This interacts really badly with their discussion of mathematical aspects. While the use of passive voice might be excusable, its shoddy use is not.
Reading this book often felt like deciphering some ancient text. I needed to translate a lot of obtuse writing into something that can be understood in my head. This translation happens for all readings involuntarily but it was alarming I had to consciously do it for this book.
I am not sure why is this book called a "classic" computer science book. It is more like a kind of book that students are forced to study by their professors. This book is impossible to read unless you are following some other source. I followed Prof. Alex Aiken's lectures
This book might be responsible for the unpopularity of the study and research in programming languages. I agree compilers are not easy but this book has taken a difficult topic and made it incomprehensible thereby doing a disservice to this whole field of study.
I am not criticizing the text or authors. The only thing is that probably the text does not suit a self-learner like myself. Could anyone suggest a compiler text to substitute the red dragon book possibly? A book with easy language, such that one easily understand what the authors tried to mean just by reading a sentence once. This would help a self-learner to focus more on the concept rather than getting lost in finding out the meaning behind the lines.
The "problem" with the dragon book is that it is so complete; intentionally so. Over its lifetime there have been tremendous advances in the theory and practice of building compilers. If you want "state of the art", then you want the dragon book, probably a later (harder) edition, even.
But it is a bit much, as you have seen, for a first course, or an overview of compiling. It is often used in graduate courses, for example, and, even then, in a follow-up course to one in Language Principles which discusses much of the underlying theory separately.
While it is hard to find and a bit expensive, one of the best books for an introduction, and separately a bit of real literature in CS, get a copy of Brinch Hansen's On Pascal Compilers. It shows how to build a compiler as well as a machine to execute the code using Recursive Descent Parsing, which is easy to understand and suitable for LL languages (with a few extensions for non-LL features).
Most commercial compilers use LR compiling techniques even if the language is LL, simply for efficiency reasons, but the theory is much harder as are the compilers. For state-of-the-art understanding, you need to delve into LR, but for a first introduction, LL is enough and easy to understand.
LL compiling is also called Predictive Parsing, since it is easy to look at a language statement and choose a method for translating it, usually using keywords. The suitable languages for LL parsing are also easy for a human to read and understand. If a language (feature) absolutely demands LR parsing, then it will be difficult for a person to read.
But I did had the same complains from the students, then one of them (Hussain Hassan Mehanna, he's probably a prof in UK now) told me that his friend in AUC uses a book called "Compiler Construction, Principles & Practice" by Kenneth C. LoudenI brought it, the faculty agreed, the students liked it; ..... it was fine
I would like to know which one would be more suitable for a novice student having knowledge of only automata theory. Which book has a more lucid language? Which one is more preferable for a student performing self-study. I wanted to ask this question because in this answer it is said that older editions of few texts are sometimes better. So it is always better to have a good guidance before diving into a subject learning.
Here the question asks whether "the two volume books are more advanced or complete than the dragon book.Is the dragon book supposed to replace the two-volume books? Are the two-volume books outdated or still very relevant?" The book which is being referred to as "two volume book" is different from the book marked as (1) in my question.
Now I am asking specifically which one among the two books which are referred by me are more easy to read? Which has a more lucid language.In this answer here they say that the automata book by Ullman in the newer editions have changed the elegance of the actual text. Many important portions have been removed and at portions it lacks clarity. Now my question is which among the two books are better or are they equivalent? I had difficulty in following the second edition of the automata book so I felt like asking it here, such that before I begin, I am guided properly.
To learn the basics an old book does not matter, because foundations should remain the same, but I want to read those foundations in an elegant manner, without being puzzled by unclear or in-depth explanation of the same.(which at times happen if a new co-author is introduced)
Although it is not quite what you are asking, I deprecated these two books when teaching compiler design as many otherwise capable students are finding them tough going. I started to focus on more practical based books. Its a matter of top-down versus bottom-up approaches to the material (not the parsing).
Review by Andrew Schulman
Copyright (C) Dr. Dobb's Journal, October, 1992 Is there any programmer who hasn't read the "Dragon Book,"the classic text on compiler design by Aho, Sethi, and Ullman? Or triedto read it? Or at least bought it, and placed it in a prominent positionon their bookshelf? Certainly, you've at least seen the book: the cover(by Jean Depoian) shows a dragon, representing "Complexity of CompilerDesign," about to be slain by a knight with "Data Flow Analysis"armor, a "Syntax Directed Translation" shield, and a "LALRParser Generator" sword.
You haven't tried to read the Dragon Book? Well, you should. Like everybook out of AT&T Bell Labs, it is beautifully written and organized.It is one of those few books that can make you pleased to be involved insoftware, because while reading the book you come to realize that our field--themessiness of daily practice aside--really does have a solid mathematicalfoundation. But in my more honest moments, I have to admit that, try asI might, I've never really understood much in the Dragon Book past aboutpage 200 (at least that's where the underlining stops in my copy). The problem,for me at least, is that it's one thing to read about a subject such asthe equivalence of languages and machines (a strikingly beautiful discovery)and another thing to really see this equivalence.
For me, and I suspect for a lot of programmers, the only way to illustratea topic like this is with some code. You say that regular expressions andfinite-state machines are equivalent? I can appreciate that this is an importantresult, but to understand it I need to see some C code that converts a regularexpression into a two-dimensional array that forms the program for one ofthese machines.
This is where Allen Holub's book, Compiler Design in C, comes in.If you have ever wanted to understand how your favorite compiler works,or if you have ever needed to write some form of language processor (perhapsas simple as a text-pattern searcher, or the macro or script language fora product), you will want Holub's book. It is approachable by programmersin a way that the Dragon Book just isn't.
Several good, readily understandable books on compiler design have beenavailable for years. Hendrix's A Small C Compiler (M&T Publishing,1990), which comes with complete C source code for an 8088-based C compiler,is the best example. You walk away from Hendrix's book seeing exactly howhis Small C compiler is put together and feeling that you could "doit yourself."
But real compilers, such as Borland C++ or Microsoft C, aren't built usingthe readily understandable, recursive-descent technique that Hendrix putsto such good use. These compilers are built, in an initially very nonintuitiveway, using compiler-compiler tools such as LEX and YACC. LEX takes a setof regular expressions and associated C code, and turns them into the functionyylex(), which tokenizes input (such as your .C file). The C code is essentially"event driven:" It is invoked whenever its associated patternis recognized in the input. YACC takes a grammar and associated C code,and turns them into the function yyparse(), which parses the tokens generatedfor example by yylex(). The main() routine for a C compiler might consistof little more than a call to yyparse().
What Holub does in this massive book is amazing. He presents the completeC source code for a LEX clone, two different YACC clones (one of which,Llama, builds a top-down LL(1) parser, while the other, occs--the "othercompiler-compiler system" -- builds a bottom-up, LALR(1) parser likethat built by (YACC), and a C compiler. The C compiler's source code consistslargely, not of .C files, but of a LEX file (c.lex) and a YACC file (c.y).The .C files do symbol-table management, manipulation of lvalues and rvalues,code generation, arithmetic operations, and the like.
DDJ readers may know Holub as this magazine's former C columnist. He bringsto Compiler Design in C the same attention to detail found in hisearlier The C Companion and On Command (a detailed presentationof the C source code for a command interpreter). Above all, Holub excelsat showing how things work. This has always been my favorite kind of reading,escape reading almost. Such "how it works" writing is quite differentfrom "how to" writing. The pleasure is not so much a "Hey,I could do that!" feeling (I know I couldn't), but rather that of seeinga little bit of how the software I use every day actually does its stuff.What's Borland C++ doing when it crunches through my .C files? Having workedthrough Holub's book, I have a much better idea.
The source code itself is presented in an almost ideal way, with each .Cfile broken up by just the right amount of text. The order in which codeis presented is crucial in a book like this, and Holub has made good useof Arachne, a C preprocessor he wrote. Arachne is a version of Knuth's WEBsystem (which I discussed in the August 1992 DDJ), allowing source codeand documentation to be put together in a single input file. Arachne isitself a compiler and, as Holub notes, it stands as "an example ofhow you can apply the techniques presented in this book to applicationsother than writing compilers for standard programming languages." Compilerdesign is a very general-purpose programming skill.
The chief advantage of Holub's book over the other books on this subjectis clearly its skillful presentation of a large amount of C code. But Holubdoes not just fling gobs of source code at you, as so many programming booksdo. Many authors discuss the implementation of their programs in lovingdetail, but forget to describe what the program does, or what output itproduces. Revealing the workings of a machine, without disclosing what themachine does, is a classic symptom of engineer's disease. Holub's book doesn'tsuffer from this at all. He walks the reader through each stage of the compilerprocess, always remembering to point out what the final goal is, what theprogram's output will look like, and so on. The C code generated by LEX,Llama, and occs is nicely commented and essentially self-documenting. Heconstantly shows the input to the program, its output, and exactly how theprogram turned the former into the latter.
Often, Holub shows the same thing from multiple angles. For example, hepresents a regular expression, a hand-written recognizer for that expression,a diagram of the equivalent finite-state machine (FSM) to recognize thatexpression, a two-dimensional state table representing the machine, andfinally the C version of the machine, both compressed and uncompressed,with the driver function that uses the tables. Holub walks carefully throughthe whole process, showing how a LEX tokenizer can take text such as "1 + 2 * 3" and turn it into a stream of tokens (input for a Yacc grammar)such as NUM PLUS NUM STAR NUM.
One of the points that becomes clear as you read this book is the tremendousgenerality of the finite-state machine as a way of programming. In essence,an FSM consists of a two-dimensional table "next," with the rowsholding states and the columns holding input. A generic "driver"steps through the table; see Example 1.
The goal of programs like LEX and YACC is to produce the table "next."The driver, which is the while (state = next(state, input)) loop, is genericand changes little between programs. In other words, the "next"table really is a program for a virtual machine.
Since LEX and YACC are themselves just compilers (they compile regular expressionsor grammars with associated C code into state tables), it's interestingto see how they themselves are written. Holub builds LEX by hand, usingrecursive descent. Once you have LEX, you can write LEX.LEX, which wouldbe a table-driven lexical analyzer. Holub leaves this as one of many excellentexercises in the book. (I'm waiting for The Compiler Design in C AnswerBook!) With the Llama parser, Holub takes a classic bootstrap approach:First, he uses a LEX file (with plenty of C actions) to implement a scaled-downversion of Llama; then he feeds the scaled-down version of Llama with aLlama input file that creates a Llama parser, "like a snake eatingits tail."
Even in a book close to a thousand pages long, many topics can't be coveredin such complete detail. Many efficiency considerations are avoided by thecode, though the text contains good pointers to the relevant literature.The chapter on optimization is a good introduction to this subject, thoughthe C compiler itself is nonoptimizing. Many programmers are interestedin compiler optimizations (this, after all, is what compiler benchmarksmeasure, for want of anything better), but popular topics such as peepholeoptimization and common-subexpression elimination probably can't be fullyunderstood without first digesting the material in Holub's book.
The code generated by the C compiler is, oddly enough, not 80x86 or 680x0assembler, much less binary object code, but something called C-Code. Thisis an assembly language-like form of C. (For example, there are no if ,while,or for statements; everything is reduced to its ultimate goto form.) Whileit would perhaps have been more satisfying to have the compiler generateactual assembler, C-Code is just as good for this book's purpose.
A disk is not included with the book, but is available separately from theauthor for $60. I had the usual problems of getting the subdirectories andmake files straight, but these seem unavoidable when working with largequantities of someone else's source code, even source code as well commentedand well organized as this. In any case, ready-to-run DOS.EXEs are provided.The highlight of this package is a fascinating full-screen "visible"C compiler that shows the compiler's operations. I do wish some screen shotsof this program had been included with the book, perhaps instead of theover 50 pages on curses. The full-screen C compiler -- actually it's occswhose operations can be made full-screen, and the C compiler just inheritsthis like any other occs program -- is built using curses, for which Holubprovides a really fairly irrelevant (in this context) description.
Even if you have the commercially supported MKS LEX/YACC, or Gnu FLEX/BISON,you will want to get Holub's book to see how these programs--or at leastvery similar programs--actually work. The Gnu software comes with sourcecode, but I wouldn't want to tackle it without first having read Holub'sbook. I had a strange experience after finishing this book. In preparationfor this review, I took my copy of the Dragon Book down from the shelf yetagain, just to see if it really is such hard going. Oddly enough, I foundthat I actually understand a lot of it past page 200. Clearly this is becauseI've worked through Holub's book. If you, too, have been frustrated tryingto read the Dragon Book, first read Compiler Design in C (whose cover,incidentally, shows four mice operating some sort of cheese-grater/mailboxcontraption). Then try to slay the dragon.
Example 1: Sample FSM. STATE next[NUM_STATES][NUM_INPUTS]; while (state = next(state, input)) if (state == ACCEPT) brake; else do_action(state);
Compiler Design in C
Allen J. Holub
Prentice-Hall, 1990
924 pages
ISBN 0-13-155045-4
Dr. Dobb's Electronic Review of Computer Books
Created 5/1/96 / Last modified 6/15/96
Back to top