Hi! First of all, thank you for the awesome library.
I'm solving a large-scale optimization problem (non-linear least squares, hundreds of thousands of variables) via nlpsol. Initially I used SX to construct my loss. Unfortunately, computing the gradient and Hessian of my function takes a long time, even longer than the actual NLP solve. As far as I understand how CasADi's AD works, a few big MX operations should be easier to differentiate than thousands of SX operations, so I decided to rewrite my loss via MX.
While my whole loss can easily be expressed as a handful (4-5) of CasADi matrix operations, there is one part that I struggle to write efficiently via MX: batch matrix-vector multiplication. I need to multiply several thousand small vectors (3x1) by the same number of small matrices (3x3). A sketch of the naive formulation is below.
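For concreteness, here is a minimal sketch of the naive MX formulation (shapes, names and N are made up for illustration); building one mtimes per block creates a very large number of MX nodes:

```python
import casadi as ca

N = 5000                      # several hundred thousand blocks in the real problem
A = ca.MX.sym("A", 3, 3 * N)  # N small 3x3 matrices stacked side by side
v = ca.MX.sym("v", 3, N)      # N small 3x1 vectors stacked as columns

# naive batch matrix-vector product: one mtimes node per block
out = ca.horzcat(*[ca.mtimes(A[:, 3 * i:3 * (i + 1)], v[:, i]) for i in range(N)])
```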
So I have two questions:
1. Is there a way to implement batch matrix-vector multiplication with vanilla CasADi operations? I tried to implement it with map (see the first sketch after this list), but it gives only a small speed-up for the symbolic Hessian (probably less than 2x compared to the fully SX version).
2. I can implement my own callback that performs the batch matrix-vector multiplication as one big matrix operation, with a Jacobian and Jacobian-of-Jacobian of the appropriate sparsity (a skeleton is sketched below). Are there any tricks to help CasADi take and compute the derivatives? Maybe I should implement forward/reverse operations for my Jacobian instead of a pure Jacobian-of-Jacobian? Maybe there are some useful flags, etc.
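For reference, question 1 refers to roughly this map-based attempt (a sketch with made-up names and a small N, not my actual code):

```python
import casadi as ca

# one small 3x3 * 3x1 product as a Function, mapped over the whole batch
A = ca.MX.sym("A", 3, 3)
v = ca.MX.sym("v", 3)
matvec = ca.Function("matvec", [A, v], [ca.mtimes(A, v)])

N = 5000                   # several hundred thousand in the real problem
batch = matvec.map(N)      # expects A of shape (3, 3*N) and v of shape (3, N)

A_all = ca.MX.sym("A_all", 3, 3 * N)
v_all = ca.MX.sym("v_all", 3, N)
out = batch(A_all, v_all)  # (3, N), column i is A_i @ v_i
```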
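And question 2 refers to a Callback along these lines (only eval is filled in; the commented-out derivative hooks are exactly what I am unsure about; names and shapes are illustrative):

```python
import casadi as ca
import numpy as np

class BatchMatVec(ca.Callback):
    """y[:, i] = A_i @ v_i, with A passed as (3 x 3N) and v as (3 x N)."""

    def __init__(self, name, N, opts={}):
        ca.Callback.__init__(self)
        self.N = N
        self.construct(name, opts)

    def get_n_in(self): return 2
    def get_n_out(self): return 1

    def get_sparsity_in(self, i):
        # input 0: the stacked 3x3 matrices, input 1: the stacked 3x1 vectors
        return ca.Sparsity.dense(3, 3 * self.N) if i == 0 else ca.Sparsity.dense(3, self.N)

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(3, self.N)

    def eval(self, arg):
        A = np.array(arg[0])                           # (3, 3N)
        v = np.array(arg[1])                           # (3, N)
        A_blocks = A.reshape(3, 3, self.N, order="F")  # A_blocks[:, :, i] = A_i
        y = np.einsum("ijk,jk->ik", A_blocks, v)       # y[:, i] = A_i @ v_i
        return [y]

    # This is what question 2 is about: should I provide the Jacobian (and its
    # Jacobian) here, or forward/reverse sensitivities instead?
    # def has_jacobian(self): return True
    # def get_jacobian(self, name, inames, onames, opts): ...
    # def has_forward(self, nfwd): ...   def get_forward(self, nfwd, name, inames, onames, opts): ...
    # def has_reverse(self, nadj): ...   def get_reverse(self, nadj, name, inames, onames, opts): ...
```

The idea is to then use it inside the MX graph, e.g. `f = BatchMatVec("batch_matvec", N)` followed by `out = f(A_all, v_all)`.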
Sorry for any mistakes; English is not my native language.