I have been working for a while on a library that lets me write fairly high-level code from which efficient low-level code can be generated, in order to improve computation speed.
The repository is here:
I mainly have numerical applications in mind, such as optimization. The code is generated in such a way that most short-lived objects, such as small vectors, complex numbers and dual numbers, never have to be allocated on the heap in the first place and exist only as primitive numbers unpacked on the stack. I believe that avoiding these allocations plays an important role in improving the speed.
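To make the idea concrete, here is a minimal sketch in Python. It is not this library's API: the `Complex` class, `generate_square_plus_c` and the emitted `square_plus_c` function are hypothetical names made up for illustration. The complex number exists only while the code is being generated, and each of its fields holds a scalar expression rather than a value, so the emitted code contains nothing but primitive arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Generation-time complex number (hypothetical, for illustration only).

    Each field holds the source text of a scalar expression, not a value.
    """
    re: str
    im: str

    def __add__(self, other):
        return Complex(f"({self.re} + {other.re})",
                       f"({self.im} + {other.im})")

    def __mul__(self, other):
        return Complex(
            f"({self.re} * {other.re} - {self.im} * {other.im})",
            f"({self.re} * {other.im} + {self.im} * {other.re})",
        )


def generate_square_plus_c():
    """Emit source for f(z, c) = z*z + c using only scalar names."""
    z = Complex("z_re", "z_im")
    c = Complex("c_re", "c_im")
    result = z * z + c  # high-level code, evaluated at generation time
    return (
        "def square_plus_c(z_re, z_im, c_re, c_im):\n"
        f"    out_re = {result.re}\n"
        f"    out_im = {result.im}\n"
        "    return out_re, out_im\n"
    )


# Generate the low-level code and load it as an ordinary function.
source = generate_square_plus_c()
namespace = {}
exec(source, namespace)
square_plus_c = namespace["square_plus_c"]
```

Printing `source` shows that the `Complex` wrapper has disappeared from the generated function, leaving only scalar expressions. Python itself will not keep those scalars on the stack, so the sketch only demonstrates the unpacking step; the speed benefit would come from emitting the same kind of code for a compiled target.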