Firstly, can I check are you using the latest release 4.1.0? There have been many speed ups (inc ptrace) in this release.
Second, are the dimensions and sel the same for each ptrace? I made a dense ptrace function a while ago due to similar issues, in which the permutation array is reused. That is the perm is only calculated once, and this is the typically the most expensive bit. I have not got round to adding this to qutip yet, but this could be the motivation I need.
If your matrices are dense, then probably you would be better to maintain them as such, i.e. as ndarray. Use the scipy kron for the tensor. Then create the Qobj when you need it for cy_ode_rhs. There is also a discussion about how much we could do with a dense Qobj. QuTiP is optimised for sparse matrix operations.