Hello everyone,
While solving an eigenvalue problem, I have noticed a large difference in performance between solve_sparse() and solve_dense() when running on a cluster.
The sparse solver appears to benefit from MPI parallelization and scales reasonably well with increasing core count. However, when using solve_dense(), increasing the number of MPI ranks does not seem to reduce the runtime significantly. From the behavior, it appears that the dense eigensolve may effectively be running on a single process after the matrices are assembled.
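Here is a toy model of one way this pattern could arise. It is my own assumption, not Dedalus's actual scheduler. Suppose each rank receives whole subproblems and solves them independently. Then the speedup is capped by the number of subproblems, and a single large subproblem keeps one rank busy while the others sit idle. The function names and numbers are hypothetical and only illustrative.

```python
def assign_round_robin(n_subproblems, n_ranks):
    """Hypothetical scheduler: give subproblem i to rank i % n_ranks."""
    assignment = {rank: [] for rank in range(n_ranks)}
    for sp in range(n_subproblems):
        assignment[sp % n_ranks].append(sp)
    return assignment


def busy_ranks(assignment):
    """Count ranks that received at least one subproblem."""
    return sum(1 for work in assignment.values() if work)


def ideal_speedup(n_subproblems, n_ranks):
    """Speedup over a single rank, assuming every subproblem costs the same.

    Wall-clock time is set by the rank holding the most subproblems.
    """
    if n_subproblems < 1:
        raise ValueError("need at least one subproblem")
    assignment = assign_round_robin(n_subproblems, n_ranks)
    longest = max(len(work) for work in assignment.values())
    return n_subproblems / longest
```

For example, ideal_speedup(64, 100) gives 64.0, but ideal_speedup(1, 100) gives 1.0 because the other 99 ranks have nothing to do. This split of whole subproblems cannot make one individual solve faster.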
Is this the expected behavior of solve_dense()? Does the dense solver currently use a distributed eigensolver, or is the dense matrix gathered onto a single rank before diagonalization?
If the latter, is there any recommended way to parallelize dense eigenvalue solves in Dedalus, or any support for distributed dense eigensolver backends?
Thank you for any clarification.
--
You received this message because you are subscribed to the Google Groups "Dedalus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dedalus-user...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dedalus-users/a457d054-1d44-4111-b83d-2c1f71f5a34an%40googlegroups.com.
Thanks Calum and Keaton for your replies. I had misattributed the scaling.
For context on size: it's a fluid dynamics eigenvalue problem in a spherical shell, solved by subproblems in , at N_theta = N_r = 64.
With solve_sparse I don't run into this issue, but with solve_dense the single-subproblem solve becomes the wall-clock bottleneck and doesn't speed up with more cores: it appears to run on a single core even when I assign ~100 cores to one . Is there any recommended way to speed up solve_dense for a subproblem of this size?
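One way to check whether a dense solve of this kind can use more than one core is to time a dense generalized eigensolve of comparable size directly. This sketch assumes, as I understand it, that solve_dense() hands the assembled dense matrices to scipy.linalg.eig. The random pencil and the matrix size are placeholders, not the real subproblem matrices. threadpoolctl is an optional third-party package, used here only to cap BLAS/LAPACK threads. Because scipy.linalg.eig runs inside the calling process, any speedup can come only from threaded BLAS/LAPACK, not from extra MPI ranks.

```python
import time

import numpy as np
import scipy.linalg

try:
    # Optional: threadpoolctl caps BLAS/LAPACK threads inside this process.
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def make_pencil(n, seed=0):
    """Random complex pencil (A, B): a hypothetical stand-in for one subproblem."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A, B


def time_dense_eig(A, B, threads=None):
    """Time the dense generalized eigensolve A x = lam B x in this process.

    If threads is given and threadpoolctl is installed, BLAS/LAPACK threads
    are limited to that count for the duration of the solve.
    """
    start = time.perf_counter()
    if threads is not None and threadpool_limits is not None:
        with threadpool_limits(limits=threads):
            vals, vecs = scipy.linalg.eig(A, b=B)
    else:
        vals, vecs = scipy.linalg.eig(A, b=B)
    return time.perf_counter() - start, vals, vecs


if __name__ == "__main__":
    A, B = make_pencil(300)  # placeholder size; substitute your subproblem's
    for threads in (1, 4, 16):
        elapsed, _, _ = time_dense_eig(A, B, threads)
        print(f"threads={threads}: {elapsed:.2f} s")
```

If the timings barely change with the thread count, adding cores to a single subproblem is unlikely to help the dense path. Whether they change depends on the BLAS/LAPACK build.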
Thanks again.
On Tuesday, 9 June 2026 at 17:42:45 UTC+5:30 keaton...@gmail.com wrote: