Glauco asked "If you happen to recall the specific video and approximate timestamp where he discusses the model you’ve implemented, I’d greatly appreciate it."
I can't seem to find the specific video.
I based my code on gpt.py (which I refactored into multiple .py files). The original gpt.py is 225 lines.
I ignored the file bigram.py.
This repo uses a Shakespeare corpus to generate Shakespeare-like dialog.
I kept the model code in gpt.py and just replaced the Shakespeare corpus with a Metamath corpus that I generated with a Python script.
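
To make the corpus swap concrete, here is a minimal sketch of the idea. It assumes the training script tokenizes at the character level and builds its vocabulary from whatever text it is given, so changing the corpus changes the vocabulary but not the model code. The helper name build_codec and the sample string are made up for illustration and are not taken from gpt.py.

```python
# Sketch only: assumes character-level tokenization, with the vocabulary
# derived from the corpus text itself. build_codec is a hypothetical helper.

def build_codec(text):
    """Build a character-level vocabulary plus encode/decode functions."""
    chars = sorted(set(text))                      # unique characters in the corpus
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
    itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

    def encode(s):
        return [stoi[c] for c in s]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return chars, encode, decode


# Illustrative stand-in for the corpus. In practice the text would be read
# from the generated corpus file instead of being hard-coded.
sample = "|- ( ph -> ph )\n"

chars, encode, decode = build_codec(sample)
ids = encode(sample)

# Encoding and then decoding should give back the original text.
roundtrip = decode(ids)
```

Under that assumption, only the text passed in changes between the Shakespeare and Metamath runs. The vocabulary size follows from the corpus, so the model's embedding table would be sized from len(chars).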