Hi Luke,
Adding to Stan's information, Numba is designed with the limitations of LLVM in mind. Knowing that LLVM is tuned for C/C++, we designed Numba to emit LLVM IR in a similar fashion to what a C frontend would do. The bulk of the high-level optimization is done before reaching LLVM IR. For instance, array expressions are lowered as explicit loop nests, so the result looks like:
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < M; ++j) {
        ...
    }
}
That gives LLVM a familiar workload to perform its loop optimizations.
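To make the lowering concrete, here is a rough pure-Python sketch (not Numba's actual implementation) of what "array expression → explicit loop nest" means: an elementwise expression such as c = a * 2 + b over 2-D data becomes a double loop, mirroring the C-style nest above. Plain nested lists stand in for arrays so the sketch needs no dependencies:

```python
def array_expr(a, b):
    # High-level form (elementwise): c = a * 2 + b
    N, M = len(a), len(a[0])
    c = [[0] * M for _ in range(N)]
    # Lowered form: explicit loop nest over both dimensions,
    # which is the shape of workload LLVM's loop passes expect
    for i in range(N):
        for j in range(M):
            c[i][j] = a[i][j] * 2 + b[i][j]
    return c

print(array_expr([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[12, 24], [36, 48]]
```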
Type inference is used to statically resolve much of the dynamism early on. Even though this means the subset of Python we support is quite restrictive, real numeric use cases rarely require runtime polymorphism. Also, after Numba's type inference, each Python function is fully specialized to its call site. You can think of every Python function as a C++ templated function, with type inference supplying the template parameters for each invocation. For the remaining unsupported cases, our excuse is that users are expected to apply Numba's JIT strategically to compute-intensive code 😉.
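The template analogy can be sketched in plain Python. This is a hypothetical toy dispatcher, not Numba's machinery: each distinct tuple of argument types gets its own specialization entry, the way a C++ template is instantiated once per parameter set (where real Numba would run type inference and emit LLVM IR, the sketch just records the key):

```python
def specialize(func):
    # Toy per-type-signature dispatcher: one "specialization" per
    # distinct tuple of argument types, like C++ template instantiation.
    cache = {}

    def dispatcher(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # Numba would type-infer and compile here; we only record it.
            cache[key] = func
        return cache[key](*args)

    dispatcher.signatures = cache
    return dispatcher

@specialize
def add(x, y):
    return x + y

add(1, 2)        # instantiates the (int, int) specialization
add(1.0, 2.0)    # instantiates the (float, float) specialization
print(len(add.signatures))  # 2
```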