GVN is super interesting, yes. Well worth the time.
I would add an SSA-construction algorithm (mem2reg/SROA does this in LLVM, but you can go with something simpler). Understanding SSA construction requires other concepts, such as dominators, and is an eye-opener for some of the tradeoffs made in IR design (e.g., caching reaching-definitions information).
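Since SSA construction leans on dominators, a minimal sketch may help. This is the classic iterative dataflow formulation on a toy CFG; the dict-of-successors representation and the node names are invented for illustration, not taken from any real compiler:

```python
def dominators(cfg, entry):
    """cfg: dict node -> list of successors. Returns node -> set of dominators."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    # Initialize: entry dominates only itself; everything else starts "full".
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # A node is dominated by itself plus whatever dominates
            # all of its predecessors.
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond CFG: entry -> a, b -> exit
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
print(sorted(dominators(cfg, "entry")["exit"]))  # ['entry', 'exit']
```

Production compilers use faster algorithms (Lengauer-Tarjan, Cooper-Harvey-Kennedy), but this fixpoint version shows the underlying lattice idea in a dozen lines.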
I would also try a simple static analysis, like range analysis, so students understand the concepts of abstract interpretation, abstraction, widening, etc. You can introduce SSI here if you have time.
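To make the widening idea concrete, here is a toy interval domain analyzing a counter loop (`i = 0; while i < 10: i = i + 1` in the abstract). The tuple representation, the join, and the widening rule are all simplified assumptions for teaching purposes:

```python
INF = float("inf")

def join(a, b):
    # Least upper bound of two intervals.
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Any bound that is still moving jumps straight to +/- infinity,
    # which guarantees the analysis terminates.
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

# Abstract loop: i enters as [0,0]; the body computes i = i + 1.
i = (0, 0)
while True:
    body = (i[0] + 1, i[1] + 1)   # transfer function for i = i + 1
    new = join((0, 0), body)      # merge loop entry with the back edge
    if new == i:
        break                     # fixpoint reached
    i = widen(i, new)
print(i)  # (0, inf): precise lower bound, widened upper bound
```

Without `widen` the loop above would climb the upper bound one step per iteration forever; with it, the analysis converges in two iterations at the cost of precision (a narrowing pass could recover the `< 10` bound).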
Alias analysis is important. But don’t show LLVM’s one 😅 I think gcc’s is probably a better example. Combine this with some simple optimizations that it enables, like store forwarding.
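As a flavor of how an alias oracle enables store forwarding, here is a hedged sketch over straight-line "IR" tuples. The tuple instruction format and the trivially conservative `may_alias` are invented; a real analysis (gcc's points-to machinery, for instance) is far richer:

```python
def may_alias(a, b):
    # Toy oracle: distinct named locations never alias.
    return a == b

def forward_stores(instrs):
    """instrs: list of ('store', loc, val) / ('load', dst, loc) tuples.
    Returns a rewritten list where provable loads become copies."""
    out = []
    known = {}  # loc -> last value stored there
    for ins in instrs:
        if ins[0] == "store":
            _, loc, val = ins
            # Kill every fact that may alias the stored location.
            known = {l: v for l, v in known.items() if not may_alias(l, loc)}
            known[loc] = val
            out.append(ins)
        else:
            _, dst, loc = ins
            if loc in known:
                out.append(("copy", dst, known[loc]))  # forwarded
            else:
                out.append(ins)
    return out

prog = [("store", "p", "x"), ("store", "q", "y"), ("load", "r", "p")]
print(forward_stores(prog)[-1])  # ('copy', 'r', 'x')
```

The interesting classroom discussion is in `may_alias`: make it always return True and the load survives; make it too aggressive and you miscompile.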
Linear-scan register allocation is simple and I think it’s important to see the last mile. As well as a simple instruction-selection algorithm; the BURG family of algorithms is not complicated.
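A minimal linear-scan allocator in the Poletto & Sarkar style fits in a slide; this sketch assumes live intervals have already been computed and, when out of registers, spills whichever interval lives longest. Everything here is illustrative, not production logic:

```python
def linear_scan(intervals, num_regs):
    """intervals: list of (name, start, end).
    Returns name -> register index, or 'spill'."""
    intervals = sorted(intervals, key=lambda iv: iv[1])  # by start point
    active = []                      # (end, name) of currently live intervals
    free = list(range(num_regs))
    assign = {}
    for name, start, end in intervals:
        # Expire intervals that ended before this one starts.
        for e, n in list(active):
            if e < start:
                active.remove((e, n))
                free.append(assign[n])
        if free:
            assign[name] = free.pop()
            active.append((end, name))
        else:
            # Spill heuristic: evict the interval with the furthest end.
            active.sort()
            spill_end, spill_name = active[-1]
            if spill_end > end:
                assign[name] = assign[spill_name]
                assign[spill_name] = "spill"
                active[-1] = (end, name)
            else:
                assign[name] = "spill"
    return assign

ivs = [("a", 0, 4), ("b", 1, 3), ("c", 2, 5)]
print(linear_scan(ivs, 2))  # with 2 registers, 'c' gets spilled
```

It is a nice contrast with graph coloring: one linear pass, no interference graph, and you can measure the compile-time difference yourself.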
In the end, more than learning many algorithms, I think the point is that they should realize how large real programs are and how fast the compiler must be to handle them. So it’s all about making the right tradeoffs of compilation speed vs. benefit in performance/code-size/etc. Spend some time looking at the output of the compiler, both IR & assembly, and discuss in class what could be improved. It’s fascinating.
If your students learn all of this, please send them in for internships 😊
Nuno
From: Wei Wu
Sent: 17 February 2021 10:22
To: llvm-dev <llvm...@lists.llvm.org>
Subject: [llvm-dev] [Education] Call for suggestion: Which algorithms/passes you want a new LLVM developer to know
Teaching dominators (and, relatedly, the terminology of natural loops) is critical for compiler developers, but I don’t think the construction algorithms themselves are particularly worth going through, except maybe as part of the compile-time tradeoff Nuno mentions.
So far, I haven’t seen any mention of loop algorithms, but I’m not sure which is best to include here.
As concepts, covering scalar evolution, dependence analysis, and MemorySSA could be useful, giving students a sense of the tools they have in their toolbox. Further, covering what InstCombine does at a high level (rather than walking through all of its bajillion patterns) can be important.
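For the high-level InstCombine idea, a two-pattern toy peephole is enough to show the shape of the pass without drowning in the real pattern list. The tuple expression "IR" and the two rewrites (`x + 0 -> x`, `x * 2 -> x << 1`) are invented for illustration:

```python
def simplify(e):
    """Recursively rewrite a tuple expression (op, lhs, rhs) by pattern."""
    if not isinstance(e, tuple):
        return e                       # leaf: variable or constant
    op, a, b = e
    a, b = simplify(a), simplify(b)    # canonicalize operands first
    if op == "add" and b == 0:
        return a                       # x + 0  ->  x
    if op == "mul" and b == 2:
        return ("shl", a, 1)           # x * 2  ->  x << 1
    return (op, a, b)

print(simplify(("mul", ("add", "x", 0), 2)))  # ('shl', 'x', 1)
```

The point for students is that each rule is local and trivially correct, but hundreds of them interacting is where the engineering (and the compile-time cost) lives.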
Echoing Nuno, I worry that a focus on the algorithms may be too narrow for the goal of bringing up new developers. You’ll especially want to cover the compile-time versus runtime-performance tradeoff, and how passes are designed to keep compile time in check (e.g., a very strong preference for maintaining dominator tree/loop info incrementally, because those are so commonly used that recalculating them constantly would measurably increase compile time). Another topic that would be useful is cost models: LLVM IR is not a perfect analogue for machine hardware, and some transformations that seem beneficial in IR land may be harmful in actual machine code. You could try collecting some examples where an optimization helps on one architecture but hurts on another.