The program is quite big, but all deal.II dependencies are encapsulated in the Solver class (https://github.com/okmechak/RiverSim/blob/master/source/river/solver.hpp),
which is an adaptation of step-6 from the tutorials (https://www.dealii.org/current/doxygen/deal.II/step_6.html).
It works very well, but I need as much performance as possible :)
The mesh, as you can see, is very irregular and very dense at some points (see picture above).
On my i5 laptop, running under WSL (Windows Subsystem for Linux), it takes about 8-10 seconds for a problem with 250,000 degrees of freedom (65,000 active cells).
On a cluster with 40 processors it takes even longer: 10-13 seconds.
Also, is the solver already multithreaded by default, or do I need to configure deal.II somehow?
The first thing to do if you want to speed up your code is to profile it. How do you know which part of the code is slow?
In that case, you will need to use MPI.