Hi Xin,
Parallelization in JEMRIS follows the master-slave principle: a single master process distributes the simulation workload to multiple single-threaded workers. Communication between master and slaves uses the Message Passing Interface (MPI) standard. That is all JEMRIS does; the key to using JEMRIS on a cluster or in the cloud is setting up your MPI environment accordingly. If you start parallel JEMRIS from the GUI, the call that starts the MPI environment can be found in jemris_sim.m (line 87 in version 2.9, look for the call to 'mpirun [...]'). I am not an expert on MPI, and I am sure you will find much more helpful resources on Google. Just a few thoughts on the process:
To get the expected performance you need to run at least one slave process per physical CPU (a few more can boost performance further, since the processors can then continue working while data is transferred). Thus, for a cluster with N nodes, each with K CPUs, you have to start N*K+1 MPI processes (the +1 is the master process, which only collects data). AFAIK the default for MPI is to start one process per node, which could be the cause of the poor performance you are seeing. Some information on configuring your MPI environment can be found here, for example:
https://www.ibm.com/docs/en/pessl/5.3.0?topic=ctopp-choosing-how-many-mpi-tasks-computational-threads-use
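To make the N*K+1 arithmetic concrete, here is a minimal Python sketch that computes the process count and assembles an 'mpirun' command line. Treat it as an illustration only: the helper names are made up for this example, the executable name 'pjemris' and the input file 'simu.xml' are assumptions about your installation, and '-np' is the usual process-count option of mpirun. How you tell MPI which nodes to use (host file, scheduler) depends on your site and is left out.

```python
import shlex


def mpi_process_count(nodes: int, cpus_per_node: int, extra_workers: int = 0) -> int:
    """Return N*K worker processes, plus optional extra workers, plus 1 master."""
    if nodes < 1 or cpus_per_node < 1 or extra_workers < 0:
        raise ValueError("nodes and cpus_per_node must be >= 1, extra_workers >= 0")
    return nodes * cpus_per_node + extra_workers + 1


def mpirun_command(nodes: int, cpus_per_node: int,
                   executable: str = "pjemris",   # assumed name of the parallel binary
                   sim_file: str = "simu.xml",    # assumed simulation input file
                   extra_workers: int = 0) -> list[str]:
    """Build the mpirun argument list; host selection is site-specific and omitted."""
    n = mpi_process_count(nodes, cpus_per_node, extra_workers)
    return ["mpirun", "-np", str(n), executable, sim_file]


if __name__ == "__main__":
    # Example: a cluster with 4 nodes and 16 CPUs each -> 4*16 + 1 = 65 processes.
    print(shlex.join(mpirun_command(nodes=4, cpus_per_node=16)))
    # prints: mpirun -np 65 pjemris simu.xml
```

Setting extra_workers to a small positive number corresponds to the "a few more" slave processes mentioned above; whether that actually helps is something you would have to measure on your own cluster.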
For running JEMRIS in the cloud, I used Amazon Web Services (AWS) for a demo at ISMRM 2019, which worked very well. Back then you could use AWS just like a local computer, except that you could choose the number of CPUs at startup (up to 64 cores at the time, which was way more than I needed).
Best regards,
Daniel