Up until now, XLR was using LLVM only to manipulate XL data structures, that is, instances of the class Tree or its subclasses. This meant that when you wrote 1+2, some runtime function would get pointers to two Integer objects with values 1 and 2, create a new Integer(3) object, and return that. Obviously, this had a significant runtime cost, in particular requiring a garbage collector to reclaim all the dynamically allocated objects generated by even the simplest computation.
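To make that cost concrete, here is a minimal C sketch of the two ways of evaluating 1+2. It is only an illustration: the Integer struct and the functions integer_new, boxed_add and native_add are hypothetical stand-ins, not XLR's actual Tree classes or runtime functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a boxed XL integer node (not XLR's real class). */
typedef struct Integer {
    long long value;
} Integer;

/* Allocate a boxed integer on the heap; returns NULL if allocation fails. */
static Integer *integer_new(long long value)
{
    Integer *node = malloc(sizeof *node);
    if (node != NULL)
        node->value = value;
    return node;
}

/* Boxed addition: reads two heap objects and allocates a third for the
   result, which something (a garbage collector, or the caller here)
   must reclaim later. */
static Integer *boxed_add(const Integer *left, const Integer *right)
{
    return integer_new(left->value + right->value);
}

/* Unboxed addition: once the types are known to be machine integers,
   the same operation needs no allocation at all. */
static long long native_add(long long left, long long right)
{
    return left + right;
}
```

In the boxed version every intermediate result costs an allocation, while the unboxed version can compile down to a single machine instruction.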
The recent refactoring work was intended to eliminate most of that cost. The primary objective was to identify the types in a program precisely enough for LLVM to do its job effectively. This refactoring work is now mostly done. Here are some encouraging numbers:
• To compute the 35th element of the Fibonacci sequence, the old implementation took approximately 8 s of CPU time on my machine.
• To compute the same result, the new implementation takes only 0.117 s. Finer-grained measurements showed that it runs about 80 to 90 times faster than an optimized build of the old implementation.
• For the 45th element of the Fibonacci sequence, XLR now takes 6.45 s, whereas optimized C code compiled with gcc takes 5.69 s, which is less than 15% faster. Unoptimized C code runs in 17.03 s, so XLR handily beats unoptimized C...
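For reference, the C side of such a comparison might look like the sketch below. This is an assumption on my part: it uses the naive doubly recursive definition with fib(0) = 0 and fib(1) = 1, and the helper cpu_seconds_for_fib is a hypothetical name for illustration. The exact code, indexing convention and compiler flags behind the numbers above may differ.

```c
#include <time.h>

/* Naive doubly recursive Fibonacci, assuming fib(0) = 0 and fib(1) = 1.
   Deliberately inefficient: the exponential call count is what makes it
   a useful stress test for function calls and integer arithmetic. */
unsigned long long fib(unsigned n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Hypothetical helper: stores fib(n) in *result and returns the CPU time
   the computation took, in seconds, as measured by clock(). */
double cpu_seconds_for_fib(unsigned n, unsigned long long *result)
{
    clock_t start = clock();
    *result = fib(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds_for_fib(45, &result) from a small main and building once with and once without optimization (for example gcc -O2 versus gcc -O0) gives the kind of optimized versus unoptimized comparison quoted above.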
So here we are: for simple code, XLR is now within 15% of optimized C code! The remaining difference is probably due to how LLVM optimizes JIT-ed code. Some fine-tuning might help, but I won't bother with the remaining performance tweaks. Someone more versed in fine-tuning LLVM may have suggestions. Also, this was measured with today's top-of-tree (ToT) LLVM, which is not necessarily release quality.
More details and links here: http://xlr.sourceforge.net/node/19.