Speed of 3D fluid simulations


Pavan Kashyap

Jun 4, 2021, 6:12:38 AM
to Dedalus Users
Hello,

I am doing a 3D DNS of plane Poiseuille flow. Before implementing anything fancier or bigger, I wanted to try the constant-pressure-gradient-driven case in a small domain. Thanks to help from Oishi and Daniel, I was able to write the script. The initial condition is from this paper. I observe that it either progresses very slowly or gives an overflow error after a few time units of simulation.

What am I doing wrong here?

As a comparison: for the same case, it takes about 20 s to simulate 20 time units in another spectral solver called channelflow (without any errors). I would like to use Dedalus for my research and would appreciate all the help. Thank you.

*** I do not see any option to attach files except images, so I am pasting the entire script here.

---------------------------------------------------------------------- DNS code ------------------------------------------------------------
import numpy as np
import h5py as hp
import dedalus.public as de
from dedalus.extras import flow_tools
import time
import os
from mpi4py import MPI
from dedalus.tools import post

de.logging_setup.rootlogger.setLevel('DEBUG')

import logging
logger = logging.getLogger(__name__)


## Geometrical parameters
Lx=2*np.pi
Ly=np.pi
Lz=2

## Numerical parameters
Nx=32
Ny=32
Nz=32

## Total time of simulation 
Tf=20

## Control Parameters
Retau=60
Recl=Retau**2/2.0

## Name of case
csn3d="U"
csn2d="XYslice"

#Time interval to save data
dT3d=10
dT2d=5

## Time interval for computation of CFL and flow monitoring
dT=1.0

## Initial time step
dt=0.01

## Create bases 
xbasis=de.Fourier('x',Nx,interval=(0,Lx),dealias=3/2)
ybasis=de.Fourier('y',Ny,interval=(0,Ly),dealias=3/2)
zbasis=de.Chebyshev('z',Nz,interval=(-Lz/2.0,Lz/2.0))

## Create the domain
domain = de.Domain([xbasis, ybasis, zbasis], grid_dtype=np.float64)

## Declare the problem
problem=de.IVP(domain, variables=['u','uz','v','vz','w','wz','p'])

## Declare the problem parameters
problem.parameters['Re']=Recl
problem.substitutions['U']='1-z**2'
problem.substitutions['Uz']='-2*z'


## Write the equations
problem.add_equation("dx(u) + dy(v) + wz = 0")
problem.add_equation("dt(u) + U*dx(u) - (1.0/Re)*( dx(dx(u)) + dy(dy(u)) + dz(uz) ) + dx(p) + w*Uz  =  - u*dx(u) - v*dy(u) - w*uz ")
problem.add_equation("dt(v) + U*dx(v) - (1.0/Re)*( dx(dx(v)) + dy(dy(v)) + dz(vz) ) + dy(p)         =  - u*dx(v) - v*dy(v) - w*vz ")
problem.add_equation("dt(w) + U*dx(w) - (1.0/Re)*( dx(dx(w)) + dy(dy(w)) + dz(wz) ) + dz(p)         =  - u*dx(w) - v*dy(w) - w*wz ")
problem.add_equation("uz-dz(u)=0")
problem.add_equation("vz-dz(v)=0")
problem.add_equation("wz-dz(w)=0")

## Add the boundary conditions 
problem.add_bc("left(u) = 0")
problem.add_bc("left(v) = 0")
problem.add_bc("left(w) = 0")

problem.add_bc("right(u) = 0")
problem.add_bc("right(v) = 0")
problem.add_bc("right(w) = 0",condition="(nx != 0) or (ny != 0)")
problem.add_bc("integ_z(p) = 0",condition="(nx == 0) and (ny == 0)")

# Build solver and set up the integration limits
solver = problem.build_solver(de.timesteppers.SBDF2)
logger.info('Solver built')
solver.stop_sim_time = Tf

## Set up the initial condition
u = solver.state['u']
uz = solver.state['uz']
v = solver.state['v']
vz = solver.state['vz']
w = solver.state['w']
wz = solver.state['wz']


## Multiple seeds initial condition
nseeds=2
Amp=7
xc,yc,zc=domain.grids(scales=1)

for ns in range(0,nseeds):
    
    a=np.random.uniform(0,Lx)
    b=np.random.uniform(0,Ly)
    theta=np.deg2rad(np.random.uniform(15,60))
                
    xp=(xc-a)*np.cos(theta) + (yc-b)*np.sin(theta)
    yp=-(xc-a)*np.sin(theta) + (yc-b)*np.cos(theta)
                
    u['g']+=0
    v['g']+=-Amp*(1-zc**2)*xp*yp*np.exp(-xp**2-yp**2)*(1-5*zc**2)
    w['g']+=Amp*zc*(1-zc**2)**2*xp*np.exp(-xp**2-yp**2)*(1-2*yp**2)
    
    
u.differentiate('z', out=uz)
v.differentiate('z', out=vz)
w.differentiate('z', out=wz)


### Store full 3D field
Ustore = solver.evaluator.add_file_handler(csn3d, sim_dt=dT3d, max_writes=1) 
Ustore.add_system(solver.state)

### Save the XY slices
XYslice = solver.evaluator.add_file_handler(csn2d, sim_dt=dT2d, max_writes=1)

# Save the midplane streamwise velocity
XYslice.add_task("interp(u,z=0)", scales=1,name='u')
#Save Ub(x,y)
XYslice.add_task("integ_z(u)",scales=1,name='ub')
#Save dudz
XYslice.add_task("interp(dz(u),z=1)",scales=1,name='dudzt')
XYslice.add_task("interp(dz(u),z=-1)",scales=1,name='dudzb')
#Save dvdz
XYslice.add_task("interp(dz(v),z=1)",scales=1,name='dvdzt')
XYslice.add_task("interp(dz(v),z=-1)",scales=1,name='dvdzb')
## Save int(u**2), int(v**2), int(w**2)
XYslice.add_task("integ_z(u**2)",scales=1,name='iu2')
XYslice.add_task("integ_z(v**2)",scales=1,name='iv2')
XYslice.add_task("integ_z(w**2)",scales=1,name='iw2')


# CFL
CFL = flow_tools.CFL(solver, initial_dt=dt, safety=0.5, max_dt=0.2)
CFL.add_velocities(('U+u', 'v', 'w'))

# Flow properties
# flow = flow_tools.GlobalFlowProperty(solver)
# flow.add_property("sqrt(u*u + v*v + w*w)", name='L2')

# Main loop
try:
    logger.info('Starting loop')
    start_run_time = time.time()
    count=0
    while solver.ok:
        
        # Compute CFL and adjust
        dt = CFL.compute_dt()
        
        nsteps=int(dT/dt)
        
        for i in range(0,nsteps):
            
            #Time step
            solver.step(dt)
        
############-------------------------- Print properties of the state -----------------------############
        
        ## Viscosity of the flow
        nu=1.0/solver.problem.namespace['Re'].value
        
        ## Compute the bulk flow
        u.set_scales(1)
        ub=u['g'].mean()
        
        ## Compute the wall shear
        dudz=u.differentiate('z')
        dudz.set_scales(1)
        DuDz=0.5*( np.abs(-2+np.average(dudz['g'][:,:,0])) + np.abs(2+np.average(dudz['g'][:,:,-1])) )
        tauw=nu*DuDz
        
        ## L2 norm
        
        # L2=flow.grid_average('L2')
        
        ## Compute the Cf
        cf=2*tauw/(ub+2.0/3.0)**2
    
        ## Log the info
        logger.info(f"Niter={solver.iteration : d}")
        logger.info(f"T={solver.sim_time : 0.2f}")
        logger.info(f"dt={dt : 0.5f}")
        # logger.info(f"L2Norm={L2 : 0.4e}")
        logger.info(f"Retau={np.sqrt(tauw)/nu : 0.4f}")
        logger.info(f"Reb={(ub+2.0/3.0)/nu : 0.4f}")
        logger.info(f"Cf={cf:0.2e}")
        
        
except:
    logger.error('Exception raised, triggering end of main loop.')
    raise
finally:
    end_run_time = time.time()
    logger.info('Iterations: %i' %solver.iteration)
    logger.info('Sim end time: %f' %solver.sim_time)
    logger.info('Run time: %.2f sec' %(end_run_time-start_run_time))
    logger.info('Run time: %f cpu-hr' %((end_run_time-start_run_time)/60/60*domain.dist.comm_cart.size))
    
    ## Clean up the file system    
    post.merge_process_files("XYslice", cleanup=True)        
    post.merge_process_files("U", cleanup=True)
    
------------------------------------------------------------------ end of script --------------------------------------------------------

I also tried computing the CFL every time step, with conditions similar to those in the 3D Rayleigh-Benard example, but that slows the code down even more.

Thank you for taking time to help me out.

Regards,

Pavan





Keaton Burns

Jun 4, 2021, 10:10:22 AM
to dedalu...@googlegroups.com
Hi Pavan,

A few things at first glance:

1) The script currently doesn’t have dealiasing on the Chebyshev basis.  Adding this might help stability.

2) For the public release of Dedalus, the integral BC on the pressure could slow things down.  Try a Dirichlet condition for the pressure gauge instead, e.g. “right(p) = 0”.

3) You want to be careful with how often you’re computing auxiliary/analysis tasks.  For instance you’re not setting the “cadence” parameter on the CFL handler, so the CFLs are being recomputed every timestep even though you’re only using the new values every 100 iterations.

4) It’s slow to allocate operators in the main loop.  Any statistics you want to compute occasionally would be better handled through the flow_tools interface or by preallocating the operators.  Your operator computing “u.differentiate(‘z’)” probably isn’t necessary at all, since “uz” is a state variable.

Can you also say a little more about the channelflow comparison and/or include your script for comparison?  E.g. is that timing on a single core or multi-core/multithreaded, and is 32^3 the dealiased grid resolution there or the actual modal resolution?  Is other filtering employed?

Thanks,
-Keaton

Pavan Kashyap

Jun 4, 2021, 8:13:11 PM
to Dedalus Users
Hello Keaton,

Thank you for your quick reply and suggestions. To answer your points :

1.) I had not added dealiasing in the Chebyshev direction because I wanted to compare with channelflow, which by default runs without dealiasing in the Chebyshev direction (although it can be enabled). I have now incorporated dealiasing in Dedalus for the "z" direction as well. It does give stability. Thanks!!

2.) Thank you for the suggestion on the pressure boundary condition. I changed it to the Dirichlet condition. 

3.) CFL :  Thank you for pointing it out. My mistake, sorry. I have changed it by giving cadence=10. 

4.) Operators : Ooops... I completely missed it. Thank you. I have rectified this as well.

After the corrections : 

Dedalus:     Nx,Ny,Nz = (32,32,32)*3/2 ==> (48,48,48) ----> ~90 s
Channelflow: Nx,Ny,Nz = (48,49,48) -----------------------> ~60 s
Channelflow conditions:
a) No dealiasing in the Chebyshev direction.
b) The Nx,Nz values include the dealiasing.

Thus, both are running with a similar computational load and saving the same number of slices and 3D fields. Both are run on a single core without any MPI. I am attaching the code output from both simulations as well as the dedalus script.

They are really comparable now. Perhaps the remaining difference arises from the CFL settings?

Thank you very much for the help.

Regards,

Pavan
test_script1.py
dedalus_output.txt
channelflow_output.txt

Ben Brown

Jun 5, 2021, 3:08:32 AM
to dedalus-users
Pavan,
      I spent some time seeing if I could improve the performance of your script.  Running your original script on a single core on my Ubuntu laptop, I also had a run time of about 90 seconds, so I think my testing machine is comparable to yours. 

After various improvements (attached script, 'test_script2.py'), I've gotten your run time down to 45 seconds.

Some things I did for speed:
  • changed the nonlinear term to vorticity × u form -- substantially fewer transposes are required for this form, which is valid for incompressible flow.  Used a related approach for the viscous term (Kx, Ky, Kz are the curl of the vorticity, i.e. the negative Laplacian for incompressible flow).
  • changed the CFL to be calculated on 'u' rather than 'U+u'.  Since the U.grad advection terms are on the LHS, they are treated linearly implicitly and don't need to factor into the CFL.
  • added thresholding to the CFL calculation, so that CFL changes below a threshold are rejected
  • changed the in-the-loop calculation of tauw and other quantities to an evaluator running on a sim_dt=dT cadence.
  • added metadata marking variables with Dirichlet boundary conditions.  This didn't seem to make a difference.
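The CFL thresholding in the bullet above matters because every change of dt forces the implicit timestepper to rebuild and refactor its matrices; rejecting tiny changes keeps the factorizations cached. The logic amounts to something like this sketch (Dedalus v2's flow_tools.CFL accepts a threshold keyword for this, if I recall the API correctly -- the function below is just an illustration, not the Dedalus code):

```python
def threshold_dt(dt_old, dt_new, threshold=0.05):
    """Accept a newly computed CFL timestep only if it differs from the
    current one by more than `threshold` (relative); otherwise keep the
    old dt so the implicit solver can reuse its factorized matrices."""
    if abs(dt_new - dt_old) > threshold * dt_old:
        return dt_new
    return dt_old
```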
Some other improvements:
  • the original random generators did not have seed values; this made repeated runs non-reproducible.  I've set an explicit seed, and subsequent runs now reproduce the same results
  • We've found that best practice is to filter initial noise so that it's not present in the highest-wavenumber modes.  Lines 125-128 do this.  We've found that things like incompressibility end up being violated at moderate levels in some cases when this is not done.
  • the original script called u['g'].mean() to calculate a bulk flow.  This is a grid-point mean and is not correctly weighted (the Chebyshev grid is non-uniform).  ub is now calculated via vol_avg()
  • I factored the startup time out of the "equivalent run time" calculation, but this doesn't seem to make a difference in single-core runs.  It will matter a lot in parallel comparisons.
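To see why the grid-point mean mentioned above is biased, compare a naive average on Chebyshev points with a properly weighted Clenshaw-Curtis quadrature. This is a NumPy sketch of the idea behind vol_avg(), not Dedalus code:

```python
import numpy as np

def clenshaw_curtis(N):
    """Chebyshev (Gauss-Lobatto) nodes x_j = cos(j*pi/N) on [-1, 1]
    and the matching Clenshaw-Curtis quadrature weights."""
    j = np.arange(N + 1)
    x = np.cos(j * np.pi / N)
    w = np.empty(N + 1)
    for i in j:
        s = sum((1.0 if 2 * k == N else 2.0) / (4 * k**2 - 1)
                * np.cos(2 * k * i * np.pi / N)
                for k in range(1, N // 2 + 1))
        w[i] = (1.0 if i in (0, N) else 2.0) / N * (1.0 - s)
    return x, w

x, w = clenshaw_curtis(32)
f = 1 - x**2             # the laminar base profile U(z)
naive = np.mean(f)       # ~0.48: the nodes cluster at the walls, where f ~ 0
vol_avg = (w @ f) / 2.0  # 2/3, the correct volume average
```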
I haven't tried running your script in parallel yet; I also haven't looked to see if certain parts could be made either simpler or faster.  But hopefully this helps.
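On the noise-filtering point above: seeding the noise on a coarsened grid and expanding back to full size acts as a hard spectral low-pass. A NumPy sketch of the effect (illustrative only, not the Dedalus API):

```python
import numpy as np

def lowpass_1d(u, frac=0.25):
    """Zero all Fourier modes above frac * Nyquist -- the spectral effect
    of seeding noise on a coarsened grid and expanding back to full size."""
    c = np.fft.rfft(u)
    kmax = int(frac * (len(c) - 1))
    c[kmax + 1:] = 0.0
    return np.fft.irfft(c, n=len(u))

rng = np.random.RandomState(seed=23)
noise = rng.standard_normal(64)
smooth = lowpass_1d(noise)   # band-limited noise, safer as an initial condition
```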

I also ran a test on a Macbook Pro M1 laptop; there this script ran in about 38 seconds.

Lastly, it'll be interesting to test this on the d3 version of dedalus once the beta release for that comes out later this summer; we've spent a lot of time optimizing performance there, and this problem will be an interesting comparison point.

Best,
--Ben

test_script2.py

Pavan Kashyap

Jun 7, 2021, 4:55:00 AM
to Dedalus Users
Hello Ben,

Sorry for my late reply; over the weekend I was in a place without internet. Thank you so much for taking a look at the script and improving it!!

I ran the script on my Ubuntu desktop (single core) : ~50 sec

Thank you for all the explanations, it really helps. I am just amazed at the versatility of Dedalus. When Keaton made the comment about computing the terms in the main loop, I had not thought about how everything could be done with substitutions and the evaluators. Thank you for showing it!!

Lines 125-128 - Filtering : When I visualized the flow, there were indeed some oscillations present at early times, observable if the domain is a bit bigger. I had thought they were there because of the localized nature of the initial condition, i.e. projecting a Gaussian onto a truncated Fourier basis with insufficient resolution (Gibbs phenomenon?). Now those "waves" are a bit less visible. Would you suggest the same thing if the initial condition were a random field? Like:

-------------------------------------------------------------------------------
## Random initial condition
gshape = domain.dist.grid_layout.global_shape(scales=1)
slices = domain.dist.grid_layout.slices(scales=1)
rand = np.random.RandomState(seed=23)
noise = rand.standard_normal(gshape)[slices]
u['g']= noise

## Initialize v
rand = np.random.RandomState(seed=30)
noise = rand.standard_normal(gshape)[slices]
v['g']= noise

## Compute w
temp = - de.operators.differentiate(u,'x') - de.operators.differentiate(v,'y')
w=temp.evaluate()
    
u.differentiate('z', out=uz)
v.differentiate('z', out=vz)
w.differentiate('z', out=wz)
----------------------------------------------------------------------------------

I will test it a bit more and then run with a bigger domain on the cluster. I will update when I have more numbers.

Thank you for making such a wonderful tool opensource...!!

Regards,

Pavan

Pavan Kashyap

Jun 8, 2021, 6:53:28 AM
to Dedalus Users
Hello Ben,

I was looking at the script and noticed a few things. I missed it initially. 

1) The right-hand side should have another term, right? The nonlinear term gets replaced by the cross product between velocity and vorticity plus the gradient of the velocity dot product, right (see attached file)?

2) the symbol 'v' is used as both problem variable i.e y - velocity component  and parameter viscosity. How does Dedalus separate between the two definitions ?

Regards,

Pavan

NSeq.pdf

Geoffrey Vasil

Jun 8, 2021, 7:41:26 AM
to dedalu...@googlegroups.com
Hey Pavan, 

The missing term you’re wondering about has been absorbed into a new definition of the dynamic pressure. This means that if you *actually* want the *actual* pressure for some reason, you’d have to add the correction back in. Whatever Ben is calling “p” has a slightly different meaning from what you originally had, but the velocity solutions would be identical.
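Concretely, the rewrite relies on the vector identity (u·∇)u = ω×u + ∇(|u|²/2); the gradient piece is what gets absorbed into the redefined pressure. A small SymPy check of the identity (assuming SymPy is available; the test velocity field below is arbitrary, not from the script):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# an arbitrary smooth test velocity field
u = sp.Matrix([sp.sin(y) * z, x * sp.cos(z), sp.exp(x) * y])

def grad(f):
    return sp.Matrix([sp.diff(f, s) for s in (x, y, z)])

def curl(F):
    return sp.Matrix([
        sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y),
    ])

adv = sp.Matrix([u.dot(grad(u[i])) for i in range(3)])  # (u . grad) u
rot = curl(u).cross(u) + grad(u.dot(u) / 2)             # omega x u + grad(|u|^2 / 2)
assert sp.simplify(adv - rot) == sp.zeros(3, 1)         # identity holds term by term
```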

And I don’t have to look, but I’m almost sure that Ben used a Greek letter nu (ν), which looks *almost identical* to a Roman letter v in almost any font. Bad Ben! :P

-Geoff

Pavan Kashyap

Jun 8, 2021, 10:19:15 AM
to Dedalus Users
Hello Geoffrey,

Thank you for the explanation. Absorbing the velocity dot product into the pressure definition makes sense. Understood. 

You can type Greek letters in the script? I did not know that. I was viewing the code in Spyder and saw it as a Roman "v". Thanks for the clarification.

Regards,

Pavan

Jeffrey S. Oishi

Jun 8, 2021, 10:22:59 AM
to dedalus-users
Hi Pavan,

Python 3 supports Unicode, so you can use any valid Unicode character in identifiers. As you can see, this may or may not be the best choice.
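For example, a Greek ν and a Roman v are two perfectly legal, distinct identifiers (a minimal illustration, not taken from Ben's script):

```python
# Python 3 identifiers may contain Unicode letters
ν = 1.0e-3   # Greek small letter nu -- e.g. a viscosity
v = 2.0      # Roman v -- e.g. a velocity component; an entirely different name
assert ν != v and 'ν' != 'v'
```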

Jeff
