Inquiry About ADDA Performance for Large Superellipsoids (r ≈ 3 µm)


wl...@mail.ustc.edu.cn

Apr 30, 2025, 2:43:03 AM
to adda-d...@googlegroups.com

Dear ADDA Developers,

Greetings from China! I am a graduate student currently using ADDA to simulate the scattering matrix of randomly oriented superellipsoidal particles. I greatly appreciate your team's development of such a powerful and flexible tool.

Recently, however, I encountered some difficulties. When simulating particles with a radius of approximately 3 µm at a wavelength of 355 nm, the computation has been running for more than 5 days on a server with 76 CPU cores, yet it has not completed. I am unsure whether this is due to limitations in ADDA's applicability to large particles of this shape, or if my simulation parameters may be suboptimal.

To help clarify the issue, I have attached the following files:

  • avg_params.dat — my orientation averaging configuration

  • run_adda_test.sh — the shell script used to launch the simulation

  • nohup.out — partial output from the running process

I would be truly grateful if you could kindly offer any insights or suggestions regarding the cause of this slowdown or potential improvements to my settings.

Thank you very much for your time and for making ADDA freely available to the scientific community.

Wishing you all the best in your work and life!


Wang Laibin,

University of Science and Technology of China.


input.rar

Maxim Yurkin

May 2, 2025, 11:02:28 AM
to adda-d...@googlegroups.com
Dear Laibin,

In short, you are indeed approaching the limitations of the DDA. I assume that the 5 days are not for a single run but for the whole script with several runs, all with orientation averaging; based on the output, only the second one is still in progress. The major issue is the large number of iterations of the iterative solver.

I have run the following command line (on 80 cores):
-shape superellipsoid 1 0.5 1 1 -m 1.5 0.001 -eq_rad 2.965913 -lambda 0.355 -eps 4 -dpl 15 -iter qmr2 -pol fcd -no_vol_cor
that is equivalent to yours, but without orientation averaging. It resulted in 20 401 iterations, which agrees with your
result for the default orientation. Other orientations are somewhat faster.

Then I also added `-int fcd`, which is recommended whenever you employ the FCD formulation, i.e. it should always be a
combination `-int fcd -pol fcd`. The command line is:
-shape superellipsoid 1 0.5 1 1 -m 1.5 0.001 -eq_rad 2.965913 -lambda 0.355 -eps 4 -dpl 15 -iter qmr2 -pol fcd -no_vol_cor -int fcd
and it leads to 11 998 iterations.

Finally, I also tried bcgs2 for the iterative solver (which is known to help sometimes when the convergence is slow):
-shape superellipsoid 1 0.5 1 1 -m 1.5 0.001 -eq_rad 2.965913 -lambda 0.355 -eps 4 -dpl 15 -iter bcgs2 -pol fcd -no_vol_cor -int fcd
It leads to 3407 iterations, but each of them is roughly 4 times slower (since it requires 4 matrix-vector products).
Thus, there is no improvement from this iterative solver.

Another idea is to relax the convergence threshold to `-eps 3`; for the full FCD case (the second one above) it leads to 6824 iterations. This shouldn't lead to significant additional errors (since you're probably aiming at about 1% accuracy), but you need to verify that at least for a few test cases.
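
For example, a single-orientation test run combining all of the above would be (the ADDA options are exactly those from the commands above, only with `-eps 3`; the `mpirun -np 80 ./adda_mpi` prefix is just an assumed launch line, adjust it to your cluster):

mpirun -np 80 ./adda_mpi -shape superellipsoid 1 0.5 1 1 -m 1.5 0.001 -eq_rad 2.965913 -lambda 0.355 -eps 3 -dpl 15 -iter qmr2 -pol fcd -int fcd -no_vol_cor

Once you are satisfied with the accuracy, you can add back the orientation averaging from your script.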

Thus, it seems that an acceleration by a factor of 3 is possible, but the runs will still be slow. On 80 cores of my cluster, this is roughly 1 hour per incident polarization. When doing orientation averaging, ADDA computes two such polarizations for each orientation. So extremely large runtimes are very possible.
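For example (the number of orientations below is only an illustrative value; take the real one from your avg_params.dat):

N_ORIENT=64   # illustrative number of orientations, not taken from your file
T_POL=1       # ~1 hour per incident polarization on 80 cores
echo "$(( 2 * N_ORIENT * T_POL )) hours"   # 128 hours, i.e. more than 5 days

so runtimes of several days are easy to reach.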

On the one hand, the speed is expected to increase significantly if you increase Im(m) or decrease Re(m), but I guess the refractive index is determined by the nature of your problem. On the other hand, your script implies simulations for an almost twice larger radius (5.6 µm); for those, the number of iterations will be at least twice as large (but potentially much more than that). In combination with the larger computational grid (the number of dipoles grows roughly as the cube of the radius), the total time will be at least 8 times longer. And I do not know of any ready-to-use ways to improve this performance.

Maybe other ADDA users can share their experience with particles of similar size and refractive index.

Bye,
Maxim.

Wang Laibin (王来彬)

May 6, 2025, 3:28:54 AM
to ADDA questions and answers

Dear ADDA Developers,

Thank you very much for your previous reply.

I have continued running ADDA on our system as per your suggestions. I would like to ask a few further questions regarding performance optimization:

  1. Since ADDA is written in C, but also includes some routines in Fortran and C++, I am curious whether there is any noticeable difference in execution speed depending on whether it is compiled primarily as a C, Fortran, or C++ project?

  2. I plan to run ADDA on another supercomputing platform based on Intel processors. In that case, would compiling ADDA with ICC and/or using Intel MPI provide a significant performance advantage compared to GCC and OpenMPI?

Thank you for your time and for developing such a powerful simulation tool.

Best regards,

Laibin,

University of Science and Technology of China.

Maxim Yurkin

May 6, 2025, 6:48:34 AM
to adda-d...@googlegroups.com
  1. Since ADDA is written in C, but also includes some routines in Fortran and C++, I am curious whether there is any noticeable difference in execution speed depending on whether it is compiled primarily as a C, Fortran, or C++ project?

I do not think that you can actually control that. The corresponding parts are compiled by the corresponding compilers (gcc, gfortran, and g++), and the only choice you have is which compiler is used for linking at the end. But the latter should not make any difference at all, since it internally invokes a dedicated linker (like ld) anyway; the difference is only in the supplied libraries (whether they are added automatically, e.g., the C standard libraries when gcc is invoked for linking, or need to be added manually, as is currently done for the Fortran libraries in the ADDA Makefile).
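
Schematically, the build looks like this (the file names are made up for illustration; this is not the actual ADDA Makefile):

gcc      -c c_part.c         -o c_part.o
gfortran -c fortran_part.f90 -o fortran_part.o
g++      -c cpp_part.cpp     -o cpp_part.o
# when gcc drives the link, the Fortran and C++ runtimes must be added manually
gcc c_part.o fortran_part.o cpp_part.o -lgfortran -lstdc++ -lm -o adda

Each object file is produced by the compiler for its own language, so the "project language" is never a single choice.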

  2. I plan to run ADDA on another supercomputing platform based on Intel processors. In that case, would compiling ADDA with ICC and/or using Intel MPI provide a significant performance advantage compared to GCC and OpenMPI?

I experimented with that a lot about 10-15 years ago. At first, there was indeed some speedup (up to 20%) from using the Intel compilers, but later (at least 10 years ago) gcc evolved significantly (and some optimizations were implemented in ADDA), so in the end I saw no significant difference. Sometimes gcc was even marginally faster. Still, the Makefiles include more or less up-to-date optimization flags for both gcc and the Intel compilers, so you can easily try them and benchmark the resulting code. If you do, please share the results with the group.
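
For example, something like the following should allow a quick benchmark (this assumes the COMPILER variable of the Makefile accepts gnu and intel; please check the compiling instructions for your ADDA version):

cd adda/src
make mpi COMPILER=gnu      # GNU toolchain (gcc/gfortran/g++)
# copy the resulting adda_mpi aside, clean, and rebuild with the Intel toolchain
make clean
make mpi COMPILER=intel
# then time an identical single-orientation run with each binary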

As for Intel MPI, on the one hand, I have never tried it explicitly (only MPICH/MPICH2, OpenMPI, and MS-MPI). On the other hand, switching the MPI implementation is generally easier than switching compilers, so you may try it as well.

Maxim.
