Thanks for the positive feedback :)
As far as I know, we don't really document the optimizations we do or the motivation for doing them.
I know you were asking more about the "what?" than the "why?", but I've been meaning to write something up on the motivation so this thread is now my excuse to at least put something down in text:
Users often assume we do optimizations to improve the QoR (i.e., power, performance, and area) of the design. This is not really the motivation. Synthesis tools do a much better job than we could ever hope to, and it's not really worth duplicating that effort. The reasons we do optimizations are:
1. To make our Verilog friendlier to the tools
2. To make the Verilog friendlier to read
3. To make compilation run faster
4. To enable certain design patterns
(1) is the primary goal, and it is _related_ to QoR because our optimizations do help ensure that the Verilog matches certain patterns expected by the tools.
(2) is counterintuitive, so bear with me for a moment. It is often true that optimizations remove things users would like to see, which can be confusing and frustrating. In most cases, though, the optimizations make the Verilog _more clear_, and you really want them on. This is hard to illustrate with simple examples, but in complex generators there is *a lot* of garbage that is extremely distracting if it isn't cleaned up. I'm not going to pretend that we've found the right balance of when to optimize vs. when not to (and we need to do better), but that's the basic idea.
(3) is the least important of the goals but it is nice when things run faster.
(4) is somewhat subtle and needs more explanation than I can give off the top of my head. Basically, when writing a generator with lots of parameters, sometimes it's simpler to express logic unconditionally and rely on optimizations to remove it if it's not needed. For example, if you have a generator using TileLink, the version of the protocol you're implementing may or may not implement all of the channels. While at some point you need to implement or not implement a channel, there's often lots of glue logic where it's more convenient to not worry about that and just connect things through as if the entire protocol is being implemented. This then relies on optimizations to remove the unused channels; otherwise the Verilog ends up extremely confusing (with lots of lint warnings). I think this is another area where we could do better, perhaps by making these design patterns more explicit, but they are pretty common in rocket-chip and related projects.
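To make (4) a bit more concrete, here is a toy sketch in plain Python. It is not Chisel or FIRRTL, and it is not how the real compiler is implemented. The netlist is just a dict from each signal to the signals it reads, and the names `build_glue`, `eliminate_dead`, and the `_in`/`_buf`/`_out` signals are all made up for illustration. The "generator" writes glue for all five TileLink-style channels unconditionally, only the final port hookup depends on the parameter, and a simple dead-code pass cleans up the rest.

```python
# Toy model only: signal names and helper functions are hypothetical.
ALL_CHANNELS = ["a", "b", "c", "d", "e"]  # TileLink-style channel letters


def build_glue(implemented):
    """Build a toy netlist: each signal maps to the signals it reads.

    The glue (a buffer stage per channel) is written for every channel,
    whether or not the channel is implemented. Only the output hookup
    at the end looks at the `implemented` parameter.
    """
    netlist = {}
    for ch in ALL_CHANNELS:
        netlist[f"{ch}_in"] = []             # source, reads nothing
        netlist[f"{ch}_buf"] = [f"{ch}_in"]  # unconditional glue logic
    outputs = []
    for ch in implemented:
        netlist[f"{ch}_out"] = [f"{ch}_buf"]
        outputs.append(f"{ch}_out")
    return netlist, outputs


def eliminate_dead(netlist, outputs):
    """Keep only the signals that some output transitively reads."""
    live = set()
    stack = list(outputs)
    while stack:
        sig = stack.pop()
        if sig in live:
            continue
        live.add(sig)
        stack.extend(netlist[sig])
    return {sig: deps for sig, deps in netlist.items() if sig in live}


# A configuration that only uses channels A and D.
netlist, outputs = build_glue(["a", "d"])
pruned = eliminate_dead(netlist, outputs)
print(len(netlist), "signals before,", len(pruned), "after")
print(sorted(pruned))
```

With only A and D hooked up, the pass drops the B, C, and E glue: 12 signals go in and 6 come out (`a_buf`, `a_in`, `a_out`, `d_buf`, `d_in`, `d_out`). Without that cleanup step, the glue for the unused channels would land in the output as dangling logic, which is the confusing, lint-warning-heavy Verilog described above.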
I hope this explanation is useful to someone.