It's been on my TODO list for some time to dive deep into the DTL and find where I can remove excess work.
There's a LOT of defensive coding in there, both to ensure things are what we expect and to handle cases "just in case", and I have a gut feeling [nothing more] that we can remove some of these safety catches.
However, before doing that I would first need a credible template benchmark suite.
In my experience, it's actually quite rare for a slowdown to be unavoidably the DTL's fault... often a little more work in the view to put data into better structures can effect drastic speedups.
So, in closing... if you're interested in helping me help you... let's talk.
--
Curtis