Algorithmic differentiation using "source code transformation" will always have a speed benefit over a strategy using "operator overloading" (aka computing on-the-fly, e.g. ADOL-C).
CasADi is a bit of a special case.
You construct a symbolic representation of a computation, and perform "source code transformation" on that representation.
If you call plain CasADi Functions (from C++ and Matlab alike), there is some code (written in the CasADi C++ core) that loops over the symbolic computations. We call this the CasADi virtual machine. This has overhead.
To remove the overhead and get the speed of a C "source code transformation" tool, you have to go the extra step of code generation. You can do this with Function.generate(...)+compile+external(...), or by providing a 'jit' true option to the Function constructor (best paired with 'compiler' 'shell'), or by using the codegen API. Typical speedup factor: ~4.
CasADi C++ will typically be only marginally faster than CasADi Matlab.
If you compare the evaluation speed of Matlab CasADi including a mex step to that of C++ CasADi without codegen/jit, the Matlab variant will be faster.
Best regards,
Joris
The various tools only differ in how they implement the fundamental concepts of forward mode and adjoint/reverse mode.
In practice, implementation details have an impact on speed. You can find a section on the two implementation families (operator overloading and source code transformation) on that wiki link.
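To make the two modes concrete, here is a toy sketch of forward and reverse mode, both done in the operator-overloading style. The class and function names (Dual, Var, dsin, vsin, backward) are made up for illustration; this is not how ADOL-C or CasADi implement things internally.

```python
import math

class Dual:
    """Forward mode: carry (value, directional derivative) through each operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def dsin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

class Var:
    """Reverse mode: record parents and local partials, sweep backwards later."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.adj = val, parents, 0.0
    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))
    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

def vsin(a):
    return Var(math.sin(a.val), ((a, math.cos(a.val)),))

def backward(out):
    """Propagate adjoints from the output to the inputs in reverse topological order."""
    order, seen = [], set()
    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)
    visit(out)
    out.adj = 1.0
    for v in reversed(order):
        for parent, partial in v.parents:
            parent.adj += partial * v.adj

def f(x0, x1, sin):
    return x0 * x1 + sin(x0)

# Forward mode: one pass per input direction.
forward_grad = (f(Dual(1.0, 1.0), Dual(2.0, 0.0), dsin).dot,
                f(Dual(1.0, 0.0), Dual(2.0, 1.0), dsin).dot)

# Reverse mode: one pass gives the whole gradient.
a, b = Var(1.0), Var(2.0)
backward(f(a, b, vsin))
reverse_grad = (a.adj, b.adj)

print(forward_grad, reverse_grad)
```

Both modes give the same gradient. A source code transformation tool computes the same quantities, but generates the derivative code ahead of time instead of tracking it while the function runs.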
I'm confused about your mex usage. Are you manually writing mex files now with C++ CasADi in them?