Observations on CPU usage on a single processor


Pete Griffin

Aug 3, 2016, 2:06:12 PM
to deal.II User Group
Hello All.

This is more of an observation with questions. I have been looking closely at CPU usage on a single quad-core processor with and without PETSc/MPI.

I was surprised to see all 4 cores (8 threads) being used in step-8, which does not have PETSc/MPI support. In the solver, the main thread was using 100% of its resources. The others were very consistently using 24-25%, independent of the #DOFs. Is the normal SolverCG<> of step-8 multi-threaded? Would the overall performance on a single processor improve if one were able to have multiple controlling threads instead of one? Is it even possible? This might be even more important for the 8- or 16-core CPUs I see available now. If I am correct, the existing code was probably optimal with single- or dual-core CPUs, but that does not appear to be true now. It may not be feasible, but I thought I'd mention it.

Also, it appears that the efficiency with one CPU decreases as the #DOFs increase with PETSc/MPI. Does anyone know why?

Thanks

Pete Griffin

=========================================================================================
 step-8 NO PETSc/MPI
=========================================================================================
Cycle 7 of step-8
Threads: 546 total,   2 running, 543 sleeping,   1 stopped,   0 zombie
%Cpu(s): 34.6 us,  0.1 sy,  0.0 ni, 65.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total, 11629396 free,  1940364 used,  2756080 buff/cache
KiB Swap: 16668668 total, 15488860 free,  1179808 used. 13757012 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 6752 pgriffin  20   0 2330772 1.260g  48340 R 99.9  8.1   0:41.49 step-8            
 6765 pgriffin  20   0 2330772 1.260g  48340 S 25.6  8.1   0:04.43 step-8            
 6760 pgriffin  20   0 2330772 1.260g  48340 S 25.2  8.1   0:04.43 step-8            
 6761 pgriffin  20   0 2330772 1.260g  48340 S 25.2  8.1   0:04.37 step-8            
 6763 pgriffin  20   0 2330772 1.260g  48340 S 25.2  8.1   0:04.34 step-8            
 6766 pgriffin  20   0 2330772 1.260g  48340 S 25.2  8.1   0:04.36 step-8            
 6762 pgriffin  20   0 2330772 1.260g  48340 S 24.9  8.1   0:04.33 step-8            
 6764 pgriffin  20   0 2330772 1.260g  48340 S 24.9  8.1   0:04.40 step-8            

 Cycle 8 of step-8
Threads: 547 total,   9 running, 537 sleeping,   1 stopped,   0 zombie
%Cpu(s): 36.2 us,  0.0 sy,  0.0 ni, 63.6 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16325840 total,  9097668 free,  4475188 used,  2752984 buff/cache
KiB Swap: 16668668 total, 15488876 free,  1179792 used. 11225328 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 6752 pgriffin  20   0 4807840 3.672g  48340 R 99.7 23.6   1:52.88 step-8            
 6761 pgriffin  20   0 4807840 3.672g  48340 R 27.2 23.6   0:12.15 step-8            
 6764 pgriffin  20   0 4807840 3.672g  48340 R 27.2 23.6   0:12.15 step-8            
 6760 pgriffin  20   0 4807840 3.672g  48340 R 26.9 23.6   0:12.26 step-8            
 6765 pgriffin  20   0 4807840 3.672g  48340 R 26.9 23.6   0:12.24 step-8            
 6766 pgriffin  20   0 4807840 3.672g  48340 R 26.9 23.6   0:11.93 step-8            
 6762 pgriffin  20   0 4807840 3.672g  48340 R 26.6 23.6   0:12.10 step-8            
 6763 pgriffin  20   0 4807840 3.672g  48340 R 25.6 23.6   0:12.17 step-8            

Cycle 9 of step-8
Threads: 543 total,   2 running, 540 sleeping,   1 stopped,   0 zombie
%Cpu(s): 34.0 us,  0.3 sy,  0.0 ni, 65.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  4041224 free,  9150464 used,  3134152 buff/cache
KiB Swap: 16668668 total, 15489424 free,  1179244 used.  6552112 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 6982 pgriffin  20   0 9311500 8.120g  48408 R 99.9 52.2   6:16.43 step-8            
 6991 pgriffin  20   0 9311500 8.120g  48408 S 24.6 52.2   0:48.27 step-8            
 6992 pgriffin  20   0 9311500 8.120g  48408 S 24.6 52.2   0:48.34 step-8            
 6993 pgriffin  20   0 9311500 8.120g  48408 S 24.3 52.2   0:48.15 step-8            
 6994 pgriffin  20   0 9311500 8.120g  48408 S 24.3 52.2   0:48.31 step-8            
 6995 pgriffin  20   0 9311500 8.120g  48408 S 24.3 52.2   0:48.30 step-8            
 6996 pgriffin  20   0 9311500 8.120g  48408 S 24.3 52.2   0:48.15 step-8            
 6990 pgriffin  20   0 9311500 8.120g  48408 S 23.9 52.2   0:48.20 step-8            


=========================================================================================
 step-17 OLD with PETSc/MPI uses max_couplings_between_dofs(), high memory usage/DOF
=========================================================================================
Cycle 6 of step-17 OLD
Threads: 540 total,   9 running, 530 sleeping,   1 stopped,   0 zombie
%Cpu(s): 24.3 us, 75.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  8039968 free,  3926700 used,  4359172 buff/cache
KiB Swap: 16668668 total, 15491740 free,  1176928 used. 11769984 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 7332 pgriffin  20   0 4043180 3.149g  54160 R 99.9 20.2   0:04.97 step-17           
 7338 pgriffin  20   0 4043180 3.149g  54160 R 99.9 20.2   0:04.90 step-17           
 7331 pgriffin  20   0 4043180 3.149g  54160 R 99.9 20.2   0:12.75 step-17           
 7334 pgriffin  20   0 4043180 3.149g  54160 R 99.9 20.2   0:04.86 step-17           
 7335 pgriffin  20   0 4043180 3.149g  54160 R 99.7 20.2   0:04.88 step-17           
 7337 pgriffin  20   0 4043180 3.149g  54160 R 99.7 20.2   0:04.91 step-17           
 7333 pgriffin  20   0 4043180 3.149g  54160 R 99.3 20.2   0:04.94 step-17           
 7336 pgriffin  20   0 4043180 3.149g  54160 R 99.3 20.2   0:04.84 step-17           

Cycle 7 of step-17 OLD
Threads: 542 total,   9 running, 532 sleeping,   1 stopped,   0 zombie
%Cpu(s): 21.0 us, 53.8 sy,  0.0 ni, 25.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  1494532 free, 10477460 used,  4353848 buff/cache
KiB Swap: 16668668 total, 15491740 free,  1176928 used.  5224572 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 7331 pgriffin  20   0 10.090g 9.382g  54224 R 99.9 60.3   0:55.10 step-17           
 7334 pgriffin  20   0 10.090g 9.382g  54224 R 71.8 60.3   0:21.78 step-17           
 7335 pgriffin  20   0 10.090g 9.382g  54224 R 71.4 60.3   0:21.67 step-17           
 7336 pgriffin  20   0 10.090g 9.382g  54224 R 71.4 60.3   0:21.55 step-17           
 7332 pgriffin  20   0 10.090g 9.382g  54224 R 71.1 60.3   0:21.78 step-17           
 7333 pgriffin  20   0 10.090g 9.382g  54224 R 71.1 60.3   0:21.79 step-17           
 7337 pgriffin  20   0 10.090g 9.382g  54224 R 70.8 60.3   0:21.89 step-17           
 7338 pgriffin  20   0 10.090g 9.382g  54224 R 69.8 60.3   0:21.50 step-17           
  
 
=========================================================================================
 step-17 NEW with PETSc/MPI uses DynamicSparsityPattern
=========================================================================================
Cycle 6 of step-17 NEW
top - 05:53:39 up 7 days,  9:00,  1 user,  load average: 1.84, 1.66, 1.62
Threads: 539 total,   9 running, 529 sleeping,   1 stopped,   0 zombie
%Cpu(s): 24.0 us, 76.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total, 13719720 free,  1130428 used,  1475692 buff/cache
KiB Swap: 16668668 total, 15388984 free,  1279684 used. 14687740 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 1710 pgriffin  20   0 1342740 598964  55076 R 99.9  3.7   0:14.40 step-17           
 1712 pgriffin  20   0 1342740 598964  55076 R 99.9  3.7   0:05.05 step-17           
 1711 pgriffin  20   0 1342740 598964  55076 R 99.7  3.7   0:05.11 step-17           
 1714 pgriffin  20   0 1342740 598964  55076 R 99.7  3.7   0:05.07 step-17           
 1715 pgriffin  20   0 1342740 598964  55076 R 99.7  3.7   0:05.07 step-17           
 1716 pgriffin  20   0 1342740 598964  55076 R 99.7  3.7   0:05.07 step-17           
 1717 pgriffin  20   0 1342740 598964  55076 R 99.0  3.7   0:04.97 step-17           
 1713 pgriffin  20   0 1342740 598964  55076 R 98.7  3.7   0:05.02 step-17           

Cycle 7 of step-17 NEW
Threads: 541 total,   9 running, 531 sleeping,   1 stopped,   0 zombie
%Cpu(s): 21.0 us, 54.0 sy,  0.0 ni, 25.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total, 12567900 free,  2284752 used,  1473188 buff/cache
KiB Swap: 16668668 total, 15389040 free,  1279628 used. 13536156 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 1773 pgriffin  20   0 2482828 1.657g  55080 R 99.9 10.6   1:03.73 step-17           
 1781 pgriffin  20   0 2482828 1.657g  55080 R 71.8 10.6   0:21.70 step-17           
 1776 pgriffin  20   0 2482828 1.657g  55080 R 71.4 10.6   0:21.80 step-17           
 1777 pgriffin  20   0 2482828 1.657g  55080 R 71.4 10.6   0:21.84 step-17           
 1775 pgriffin  20   0 2482828 1.657g  55080 R 71.1 10.6   0:21.71 step-17           
 1779 pgriffin  20   0 2482828 1.657g  55080 R 71.1 10.6   0:21.76 step-17           
 1780 pgriffin  20   0 2482828 1.657g  55080 R 71.1 10.6   0:21.66 step-17           
 1778 pgriffin  20   0 2482828 1.657g  55080 R 69.4 10.6   0:21.44 step-17           

Cycle 8 of step-17 NEW
Threads: 539 total,   9 running, 529 sleeping,   1 stopped,   0 zombie
%Cpu(s): 16.3 us, 20.5 sy,  0.0 ni, 63.1 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  9603772 free,  5828536 used,   893532 buff/cache
KiB Swap: 16668668 total, 15388100 free,  1280568 used.  9999864 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 1404 pgriffin  20   0 6046792 5.051g  55048 R 99.9 32.4   3:10.39 step-17           
 1407 pgriffin  20   0 6046792 5.051g  55048 R 28.2 32.4   0:40.49 step-17           
 1410 pgriffin  20   0 6046792 5.051g  55048 R 28.2 32.4   0:40.27 step-17           
 1406 pgriffin  20   0 6046792 5.051g  55048 R 27.9 32.4   0:40.51 step-17           
 1412 pgriffin  20   0 6046792 5.051g  55048 R 27.9 32.4   0:40.45 step-17           
 1408 pgriffin  20   0 6046792 5.051g  55048 R 27.6 32.4   0:40.60 step-17           
 1411 pgriffin  20   0 6046792 5.051g  55048 R 27.6 32.4   0:40.22 step-17           
 1409 pgriffin  20   0 6046792 5.051g  55048 R 26.9 32.4   0:39.94 step-17           

Bruno Turcksin

Aug 3, 2016, 2:42:21 PM
to deal.II User Group
Pete,


On Wednesday, August 3, 2016 at 2:06:12 PM UTC-4, Pete Griffin wrote:
> I was surprised to see all 4 cores (8 threads) being used in step-8, which does not have PETSc/MPI support. In the solver, the main thread was using 100% of its resources. The others were very consistently using 24-25%, independent of the #DOFs. Is the normal SolverCG<> of step-8 multi-threaded? Would the overall performance on a single processor improve if one were able to have multiple controlling threads instead of one? Is it even possible? This might be even more important for the 8- or 16-core CPUs I see available now. If I am correct, the existing code was probably optimal with single- or dual-core CPUs, but that does not appear to be true now. It may not be feasible, but I thought I'd mention it.
>
> Also, it appears that the efficiency with one CPU decreases as the #DOFs increase with PETSc/MPI. Does anyone know why?

Some parts of deal.II (including the solvers, but not the preconditioners, which may be what you are seeing here) are multithreaded. By default, deal.II will use as many threads as possible. I am not sure what you mean by controlling threads, but I have used multithreading on workstations with 64 cores and it works as expected. Since you have not done anything in your own code to use multithreading, you cannot expect to get 100% usage on all your threads. This is also true if you use MPI: deal.II will launch one process, but it will use 8 threads.
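If you want to control that, a minimal sketch (the limit of 4 is just an example value) is to cap the number of threads at the top of main():

#include <deal.II/base/multithread_info.h>

int main()
{
  // Cap deal.II's task-based parallelism at 4 threads; by default it
  // uses every core it can find. Call this before any other deal.II
  // code starts tasks.
  dealii::MultithreadInfo::set_thread_limit(4);

  // ... the usual step-8 driver code would follow here ...
  return 0;
}

Setting the DEAL_II_NUM_THREADS environment variable should, if I remember correctly, have the same effect without recompiling.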

Best,

Bruno

Wolfgang Bangerth

Aug 3, 2016, 2:44:05 PM
to dea...@googlegroups.com

Pete,

> I was surprised to see all 4 cores (8 threads) being used in step-8, which does
> not have PETSc/MPI support. In the solver, the main thread was using 100% of
> its resources. The others were very consistently using 24-25%, independent of
> the #DOFs. Is the normal SolverCG<> of step-8 multi-threaded?

Yes. Various parts of the library are multi-threaded and doing matrix-vector
multiplications is one of those parts.


> Would the overall
> performance on a single processor improve if one were able to have multiple
> controlling threads instead of one? Is it even possible? This might be even
> more important for the 8- or 16-core CPUs I see available now. If I am correct,
> the existing code was probably optimal with single- or dual-core CPUs, but that
> does not appear to be true now. It may not be feasible, but I thought I'd
> mention it.

It is possible to write some programs in a way so that there are multiple
controlling threads, but this is not the case for typical finite element
programs. If you look at a typical program, it almost always has the form
    do this (e.g., assemble the linear system)
    then
    do that (e.g., solve the linear system)
The "then" is a synchronization point. However, "do this" and "do that" may
well be parallelizable, and we often do that in the library if possible. An
example may be the loop over all cells in assembly. But it is difficult to do
this with all statements, and that is where the inefficiency comes from.
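As a generic illustration (plain C++17, not what deal.II actually does internally), the structure looks roughly like this; the parallel loop must finish completely before the next, here sequential, step can start:

#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main()
{
  std::vector<double> cell_contributions(1000000);

  // "do this": embarrassingly parallel work, e.g. per-cell assembly,
  // where all threads can be busy at the same time.
  std::for_each(std::execution::par,
                cell_contributions.begin(), cell_contributions.end(),
                [](double &v) { v = 1.0; });

  // Implicit synchronization point: for_each only returns once every
  // element has been processed.

  // "do that": a step that happens to be sequential, so only one
  // thread is busy and the average load over the whole run drops.
  const double total = std::accumulate(cell_contributions.begin(),
                                       cell_contributions.end(), 0.0);

  return (total > 0.0) ? 0 : 1;
}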


> Also, it appears that the efficiency with one CPU decreases as the #DOFs
> increase with PETSc/MPI. Does anyone know why?

How do you define "efficiency"?

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.tamu.edu/~bangerth/

Pete Griffin

Aug 3, 2016, 7:41:02 PM
to deal.II User Group
Thank you Bruno and Wolfgang for your quick responses.

Bruno, I assumed that the thread with 100% CPU usage was somehow feeding the others in step-8, because I saw no other reason, on the surface, for a difference in thread CPU usage. The one at 100% always had the same ID. Based on this perhaps erroneous conclusion, I thought it may have been a bottleneck. I think that using threads in my own code is not relevant to what I was seeing; one thing at a time, I will use threads in the future. I was looking at CPU usage only while the code was running in SolverCG<>. That is why the data provided are from high cycle numbers: I had plenty of time to catch the very consistent top -H output, and without modifying the deal.II code I could not get a closer look.

I just tested the step-8 program with PreconditionIdentity(), which showed 100% CPU usage on all 8 CPUs. The results follow. Assuming having no preconditioner only slows things down, maybe getting 3 times the CPU power will make up for it. I haven't checked solve times yet. The preconditioner for step-8 was PreconditionSSOR<> with relaxation parameter = 1.2. Is there an optimum preconditioner/relaxation parameter for 3d elasticity problems that you know of? Is their determination only by trial and error?
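For reference, the two variants I am comparing look roughly like this (a sketch modeled on the step-8 solve() function; solve_with_cg and its arguments are just names I made up for this snippet):

#include <deal.II/lac/precondition.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

// Sketch of a step-8-style solve, assuming system_matrix, solution and
// system_rhs are already assembled.
void solve_with_cg(const SparseMatrix<double> &system_matrix,
                   Vector<double>             &solution,
                   const Vector<double>       &system_rhs)
{
  SolverControl            solver_control(1000, 1e-12);
  SolverCG<Vector<double>> cg(solver_control);

  // Variant 1: SSOR with relaxation 1.2, as shipped in step-8. The SSOR
  // sweeps are inherently sequential, so the other threads mostly idle.
  PreconditionSSOR<SparseMatrix<double>> preconditioner;
  preconditioner.initialize(system_matrix, 1.2);
  cg.solve(system_matrix, solution, system_rhs, preconditioner);

  // Variant 2: no preconditioner. Every operation CG performs
  // (matrix-vector products, dot products, vector updates) is
  // multithreaded, which is why top shows all cores near 100%.
  // cg.solve(system_matrix, solution, system_rhs, PreconditionIdentity());
}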

Wolfgang, what I meant by efficiency was that the CPU usage in the threads for step-17 NEW and OLD decreased with larger #DOFs (or cycle numbers), not necessarily memory usage, since OLD had a significant amount of unused memory allocated. It was not as obvious in OLD since I didn't have enough memory to go to cycle 8. The results for step-8 showed no change in CPU% (it remained at 24%) even at cycle 9. All runs at the same cycle number had the same #DOFs. The PIDs here differ because I restarted each time; I wasn't fast enough!

Still learning!

Thanks again to both of you.

=========================================================================================
 step-8 NO PETSc/MPI with PreconditionIdentity()
=========================================================================================
Cycle 7 of step-8
Threads: 623 total,   9 running, 613 sleeping,   1 stopped,   0 zombie
%Cpu(s): 99.8 us,  0.2 sy,  0.0 ni,  0.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  9475772 free,  2345336 used,  4504732 buff/cache
KiB Swap: 16668668 total, 15506072 free,  1162596 used. 13269936 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 9841 pgriffin  20   0 2330760 1.260g  48784 R 99.7  8.1   0:13.58 step-8            
 9842 pgriffin  20   0 2330760 1.260g  48784 R 99.7  8.1   0:13.60 step-8            
 9843 pgriffin  20   0 2330760 1.260g  48784 R 99.7  8.1   0:13.63 step-8            
 9829 pgriffin  20   0 2330760 1.260g  48784 R 99.3  8.1   0:36.44 step-8            
 9838 pgriffin  20   0 2330760 1.260g  48784 R 99.3  8.1   0:13.63 step-8            
 9839 pgriffin  20   0 2330760 1.260g  48784 R 99.3  8.1   0:13.62 step-8            
 9840 pgriffin  20   0 2330760 1.260g  48784 R 99.0  8.1   0:13.62 step-8            
 9837 pgriffin  20   0 2330760 1.260g  48784 R 98.0  8.1   0:13.56 step-8            

Cycle 8 of step-8
 Threads: 625 total,   9 running, 615 sleeping,   1 stopped,   0 zombie
%Cpu(s): 99.6 us,  0.3 sy,  0.0 ni,  0.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  6939168 free,  4882364 used,  4504308 buff/cache
KiB Swap: 16668668 total, 15506072 free,  1162596 used. 10733500 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 9829 pgriffin  20   0 4807848 3.673g  48784 R 99.9 23.6   1:56.22 step-8            
 9840 pgriffin  20   0 4807848 3.673g  48784 R 99.9 23.6   0:49.47 step-8            
 9837 pgriffin  20   0 4807848 3.673g  48784 R 99.3 23.6   0:49.54 step-8            
 9839 pgriffin  20   0 4807848 3.673g  48784 R 99.3 23.6   0:49.69 step-8            
 9841 pgriffin  20   0 4807848 3.673g  48784 R 99.3 23.6   0:49.54 step-8            
 9842 pgriffin  20   0 4807848 3.673g  48784 R 99.3 23.6   0:49.28 step-8            
 9838 pgriffin  20   0 4807848 3.673g  48784 R 98.3 23.6   0:49.62 step-8            
 9843 pgriffin  20   0 4807848 3.673g  48784 R 98.0 23.6   0:49.45 step-8            

Cycle 9 of step-8
Threads: 613 total,   9 running, 603 sleeping,   1 stopped,   0 zombie
%Cpu(s): 99.5 us,  0.3 sy,  0.0 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16325840 total,  3390632 free,  9577768 used,  3357440 buff/cache
KiB Swap: 16668668 total, 15506152 free,  1162516 used.  6038460 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND           
 9959 pgriffin  20   0 9311500 8.122g  48888 R 99.9 52.2   3:59.23 step-8            
 9953 pgriffin  20   0 9311500 8.122g  48888 R 99.7 52.2   3:59.26 step-8            
 9962 pgriffin  20   0 9311500 8.122g  48888 R 99.7 52.2   3:59.67 step-8            
 9963 pgriffin  20   0 9311500 8.122g  48888 R 99.3 52.2   3:59.94 step-8            
 9945 pgriffin  20   0 9311500 8.122g  48888 R 99.0 52.2   7:20.13 step-8            
 9954 pgriffin  20   0 9311500 8.122g  48888 R 98.3 52.2   4:00.18 step-8            
 9961 pgriffin  20   0 9311500 8.122g  48888 R 98.3 52.2   4:00.52 step-8            
 9958 pgriffin  20   0 9311500 8.122g  48888 R 94.1 52.2   4:00.11 step-8            
 
=======================================================================================



Wolfgang Bangerth

Aug 5, 2016, 12:35:23 PM
to dea...@googlegroups.com

Pete,

> Bruno, I assumed that the thread with 100% CPU usage was somehow feeding the
> others in step-8,

It's more that for some functions, we split operations onto as many threads as
there are CPUs. But then the next function you call may not be parallelized,
and so everything runs on only one thread. On average, that one thread has a
load of 100% whereas the others have a lesser load.


> I just tested the step-8 program with PreconditionIdentity(), which showed 100%
> CPU usage on all 8 CPUs. The results follow. Assuming having no preconditioner
> only slows things down, maybe getting 3 times the CPU power will make up for
> it. I haven't checked solve times yet. The preconditioner for step-8 was
> PreconditionSSOR<> with relaxation parameter = 1.2. Is there an optimum
> preconditioner/relaxation parameter for 3d elasticity problems that you know
> of? Is their determination only by trial and error?

1.2 seems to be what a lot of people use.

As for thread use: if you use PreconditionIdentity, *all* major operations
that CG calls are parallelized. On the other hand, using PreconditionSSOR, you
will spend at least 50% of your time in the preconditioner, but SSOR is a
sequential method where you need to compute the update for one vector element
before you can move to the next. So it cannot be parallelized, and
consequently your average thread load will be less than 100%.
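To make the data dependence concrete, here is a plain (non-deal.II) sketch of one forward SOR sweep on a dense matrix (SSOR adds a matching backward sweep): updating x[i] uses the already-updated entries x[0..i-1], which is what prevents a simple parallel loop over i.

#include <cstddef>
#include <vector>

// One forward SOR sweep for A x = b with relaxation parameter omega.
// A is stored as a dense n x n matrix purely for illustration.
void sor_forward_sweep(const std::vector<std::vector<double>> &A,
                       const std::vector<double>              &b,
                       std::vector<double>                    &x,
                       const double                            omega)
{
  const std::size_t n = b.size();
  for (std::size_t i = 0; i < n; ++i)
    {
      double sum = b[i];
      for (std::size_t j = 0; j < n; ++j)
        if (j != i)
          sum -= A[i][j] * x[j]; // for j < i this is already the new value
      x[i] = (1.0 - omega) * x[i] + omega * sum / A[i][i];
    }
}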

Neither of these is a good preconditioner in the big scheme of things if you
envision going to large problems. For those, you ought to use variations of
the multigrid method.


> Wolfgang, what I meant by efficiency was that the CPU usage in the threads for
> step-17 NEW and OLD decreased with larger #DOFs or cycle numbers.

If the load decreased for both codes, I would attribute this to memory
traffic. If the problem is small enough, much of it will fit into the caches
of the processor/cores, and so you get high throughput. If the problem becomes
bigger, processors wait for data for longer. Waiting is, IIRC, still counted
as processor load, but it may make some operations that are not parallelized
take longer than those that are parallelized, and so overall lead to a lower
average thread load.

But that's only a theory that would require a lot more digging to verify.

Pete Griffin

Aug 5, 2016, 2:41:07 PM
to deal.II User Group
Wolfgang, thanks again for your response.

When I started to try to implement PETSc/MPI I didn't realize the normal SolverCG<> was running with all CPUs. As of now I have no need for PETSc/MPI. I did a little more looking and found that on a single processor it was no faster than the non-PETSc/MPI version. For the 3D elasticity problem I was working on I found, also by trial and error, that SSOR with a relaxation parameter of 1.2 was fastest, even at 25% CPU usage. Jacobi, which showed 80-90% CPU usage in the threads, was actually slower. Both were faster than PETSc/MPI. I guess PETSc/MPI is more for large problems on multiple processors. At least I got PETSc/MPI working, which may be useful in the future.
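A minimal way to compare the variants' solve times is deal.II's Timer class; a sketch (time_one_solve and the label argument are just illustrative names):

#include <deal.II/base/timer.h>

#include <functional>
#include <iostream>
#include <string>

// Sketch: time one candidate solve (passed in as a callable) and report
// wall-clock time and CPU time accumulated over all threads.
double time_one_solve(const std::function<void()> &do_solve,
                      const std::string           &label)
{
  dealii::Timer timer; // starts running on construction
  do_solve();
  timer.stop();
  std::cout << label << ": " << timer.wall_time() << " s wall, "
            << timer.cpu_time() << " s CPU" << std::endl;
  return timer.wall_time();
}

It would be called, e.g., as time_one_solve([&]{ cg.solve(system_matrix, solution, system_rhs, preconditioner); }, "SSOR 1.2");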

Pete Griffin

Wolfgang Bangerth

unread,
Aug 5, 2016, 4:37:26 PM8/5/16
to dea...@googlegroups.com

Pete,

> When I started to try to implement PETSc/MPI I didn't realize the normal
> SolverCG<> was running with all CPUs. As of now I have no need for PETSc/MPI.
> I did a little more looking and found that on a single processor it was no
> faster than the non-PETSc/MPI version. For the 3D elasticity problem I was
> working on I found, also by trial and error, that SSOR with a relaxation
> parameter of 1.2 was fastest, even at 25% CPU usage. Jacobi, which showed
> 80-90% CPU usage in the threads, was actually slower.

Yes, that's what you often find: the algorithms that are easy to parallelize
are just not good enough to really compete with the more complex algorithms
(which are then harder to parallelize efficiently).