As Joel said previously, you need to use a tool that allows "parallelism" with distributed memory.
E.g. with threads, OpenMP, or parfor you have multiple CPUs accessing the same chunk of memory (shared memory); this is what happens in the usual concurrency tools.
Since CasADi is not thread safe, what you need is multiple CPUs where each one accesses its own memory (distributed memory).
Just to simplify: 3 CPUs running on 1 memory is shared memory; you need 3 CPUs running on 3 memories (distributed).
You can see the distributed case as having 3 computers, each with its own CasADi installation.
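To make the shared vs. distributed distinction concrete, here is a minimal sketch in plain Python (standard library only, no CasADi and no MPI). The `state` dictionary and the function names are made up for illustration; `state` stands in for the internal state of a library that is not thread safe. It assumes a POSIX system, since it uses the "fork" start method.

```python
import multiprocessing as mp
import threading

# Hypothetical module-level state, standing in for a library's internals.
state = {"calls": 0}


def touch_state():
    state["calls"] += 1


def calls_after_thread():
    """Run touch_state in a thread: it mutates the parent's own memory."""
    state["calls"] = 0
    t = threading.Thread(target=touch_state)
    t.start()
    t.join()
    return state["calls"]


def _child(queue):
    # Runs in a separate process, on that process's private copy of `state`.
    touch_state()
    queue.put(state["calls"])


def calls_after_process():
    """Run touch_state in a child process: the parent's memory is untouched."""
    state["calls"] = 0
    ctx = mp.get_context("fork")  # assumption: POSIX (Linux/macOS)
    queue = ctx.Queue()
    p = ctx.Process(target=_child, args=(queue,))
    p.start()
    seen_by_child = queue.get()
    p.join()
    return seen_by_child, state["calls"]


if __name__ == "__main__":
    print(calls_after_thread())   # 1: the thread changed shared memory
    print(calls_after_process())  # (1, 0): the child changed only its own copy
```

The thread and the caller see the same `state`, which is exactly the situation that breaks a non-thread-safe library. The child process works on its own copy, so nothing it does can corrupt the parent.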
That is why you need to use something like Open MPI, which gives you distributed memory (even on your own computer) by running separate processes, each with its own memory, and then lets you pass messages between the "fake computers".
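The "fake computers passing messages" pattern can be sketched as follows. This is not MPI: it uses Python's standard `multiprocessing` queues as a stand-in, just to show the shape of the idea. `solve_subproblem`, `worker` and `run_distributed` are hypothetical names, and `solve_subproblem` is a placeholder for whatever independent solver call you would make in each process. It again assumes a POSIX system ("fork" start method).

```python
import multiprocessing as mp


def solve_subproblem(x):
    # Hypothetical placeholder for one independent solver call.
    return x * x


def worker(inbox, outbox):
    # Each worker is a separate process with its own memory.
    # It only communicates through messages; None means "stop".
    for index, x in iter(inbox.get, None):
        outbox.put((index, solve_subproblem(x)))


def run_distributed(inputs, n_workers=3):
    ctx = mp.get_context("fork")  # assumption: POSIX (Linux/macOS)
    inboxes = [ctx.Queue() for _ in range(n_workers)]
    outbox = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(q, outbox)) for q in inboxes]
    for p in procs:
        p.start()

    # Send each input to a worker, round-robin, tagged with its position.
    for index, x in enumerate(inputs):
        inboxes[index % n_workers].put((index, x))
    for q in inboxes:
        q.put(None)

    # Collect one reply per input and restore the original order.
    results = [None] * len(inputs)
    for _ in inputs:
        index, value = outbox.get()
        results[index] = value

    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_distributed([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

With MPI the structure is similar: every rank runs the same program in its own process, and explicit send/receive calls take the place of the queues.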
I have played around in C++ with Open MPI.
An idea for your specific case would be:
1- Generate C code from your MATLAB code
Alternatively, there might be some MPI equivalent for MATLAB (not sure).