If you are really interested in the low-level details and can read (or are willing to learn to read) an HDL, I'd say take a look at the RISC-V project. You can look at either the rocket (in-order) or boom (out-of-order) cores. My recommendation would be to start with rocket since it is simpler; boom is also built with a lot of components picked out of rocket. Note that RISC-V does not have CAS directly and uses load-reserved/store-conditional (lr/sc) instructions instead. lr/sc gives a stronger guarantee than CAS: at the h/w level it checks bus accesses to the reserved address and not the data bits (which is what CAS checks), so it also notices a value that was changed and then changed back. The implementation is therefore different, but a lot of the same concerns still apply.
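To make the "bus access vs. data bits" difference concrete, here is a toy software model, not anything taken from rocket or boom. The `Memory` class, the `hart` ids and both scenario functions are made up for illustration. It assumes a single word of memory and a reservation that is cleared by any store to that word; real hardware is looser than this (the reservation granularity is implementation-defined and an sc is allowed to fail spuriously).

```python
class Memory:
    """Toy one-word memory with per-hart LR/SC reservations (illustrative only)."""

    def __init__(self, value=0):
        self.value = value
        self.reservations = set()  # ids of harts holding a valid reservation

    def store(self, value):
        # Assumption: any store to the word kills every outstanding reservation.
        self.value = value
        self.reservations.clear()

    def cas(self, expected, new):
        # CAS only compares data bits: it succeeds whenever the value matches.
        if self.value == expected:
            self.store(new)
            return True
        return False

    def load_reserved(self, hart):
        # LR returns the value and registers a reservation for this hart.
        self.reservations.add(hart)
        return self.value

    def store_conditional(self, hart, new):
        # SC succeeds only if nothing wrote the word since this hart's LR.
        if hart in self.reservations:
            self.store(new)
            return True
        return False


def aba_with_cas():
    mem = Memory(1)
    seen = mem.value         # hart 0 reads A
    mem.store(2)             # hart 1 writes B ...
    mem.store(1)             # ... and then writes A back
    return mem.cas(seen, 3)  # value matches again, so the CAS succeeds


def aba_with_lr_sc():
    mem = Memory(1)
    mem.load_reserved(hart=0)                    # hart 0 reserves the word
    mem.store(2)                                 # hart 1 writes B ...
    mem.store(1)                                 # ... and then writes A back
    return mem.store_conditional(hart=0, new=3)  # reservation is gone, SC fails


if __name__ == "__main__":
    print("CAS succeeded:", aba_with_cas())
    print("SC succeeded:", aba_with_lr_sc())
```

In this model the CAS goes through even though the word was modified in between, while the sc fails and the caller would have to retry from the lr. That retry loop is how a CAS-like operation is usually built on top of lr/sc.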
Here are some links that might help -
Follow the links to see related things like the AMOALU, arbiter etc.
For folks interested in hardware, you'll find a lot of other interesting things that are glossed over in most Computer Architecture/Digital Design courses where you write toy MIPS processors. Implementations of things like the TLB, BTB, DMA, decode units, FPU etc. are all written from scratch. Sadly documentation is sparse and it is easy to get lost, but you can always ask questions on the RISC-V mailing list. I'd like to stress that even though it might seem like it, rocket is not simply a throwaway toy implementation. Last I checked, the rocket core was twice as energy efficient as the Cortex-A5 (take this with a grain of salt).
The rocket and boom sources (and the standalone FPU) are all written in Chisel, a DSL on top of Scala. If you understand Verilog, it will not take you too long to understand Chisel. Chisel code is written at a much higher level than Verilog and is easier to read IMO. If Chisel seems like garbage to you and you want to read Verilog instead, the Chisel toolchain translates it to very clean and predictable Verilog.