Speed of 3D fluid simulations


Pavan Kashyap

Jun 4, 2021, 6:12:38 AM
to Dedalus Users
Hello,

I am doing a 3D DNS of plane Poiseuille flow. Before implementing anything fancier or bigger, I wanted to try the constant-pressure-gradient-driven case in a small domain. Thanks to help from Oishi and Daniel, I was able to write the script. The initial condition is from this paper. I observe that it either progresses very slowly or gives an overflow error after a few time units of simulation.

What am I doing wrong here?

As a comparison: for the same case, it takes about 20 s to simulate 20 time units in another spectral solver called channelflow (without any errors). I would like to use Dedalus for my research and would appreciate all the help. Thank you.

*** I do not see any option to attach files except images, so I am pasting the entire script here.

---------------------------------------------------------------------- DNS code ------------------------------------------------------------
import numpy as np
import h5py as hp
import dedalus.public as de
from dedalus.extras import flow_tools
import time
import os
from mpi4py import MPI
from dedalus.tools import post

de.logging_setup.rootlogger.setLevel('DEBUG')

import logging
logger = logging.getLogger(__name__)


## Geometrical parameters
Lx=2*np.pi
Ly=np.pi
Lz=2

## Numerical parameters
Nx=32
Ny=32
Nz=32

## Total time of simulation 
Tf=20

## Control Parameters
Retau=60
Recl=Retau**2/2.0

## Name of case
csn3d="U"
csn2d="XYslice"

#Time interval to save data
dT3d=10
dT2d=5

## Time interval for computation of CFL and flow monitoring
dT=1.0

## Initial time step
dt=0.01

## Create bases 
xbasis=de.Fourier('x',Nx,interval=(0,Lx),dealias=3/2)
ybasis=de.Fourier('y',Ny,interval=(0,Ly),dealias=3/2)
zbasis=de.Chebyshev('z',Nz,interval=(-Lz/2.0,Lz/2.0))

## Create the domain
domain = de.Domain([xbasis, ybasis, zbasis], grid_dtype=np.float64)

## Declare the problem
problem=de.IVP(domain, variables=['u','uz','v','vz','w','wz','p'])

## Declare the problem parameters
problem.parameters['Re']=Recl
problem.substitutions['U']='1-z**2'
problem.substitutions['Uz']='-2*z'


## Write the equations
problem.add_equation("dx(u) + dy(v) + wz = 0")
problem.add_equation("dt(u) + U*dx(u) - (1.0/Re)*( dx(dx(u)) + dy(dy(u)) + dz(uz) ) + dx(p) + w*Uz  =  - u*dx(u) - v*dy(u) - w*uz ")
problem.add_equation("dt(v) + U*dx(v) - (1.0/Re)*( dx(dx(v)) + dy(dy(v)) + dz(vz) ) + dy(p)         =  - u*dx(v) - v*dy(v) - w*vz ")
problem.add_equation("dt(w) + U*dx(w) - (1.0/Re)*( dx(dx(w)) + dy(dy(w)) + dz(wz) ) + dz(p)         =  - u*dx(w) - v*dy(w) - w*wz ")
problem.add_equation("uz-dz(u)=0")
problem.add_equation("vz-dz(v)=0")
problem.add_equation("wz-dz(w)=0")

## Add the boundary conditions 
problem.add_bc("left(u) = 0")
problem.add_bc("left(v) = 0")
problem.add_bc("left(w) = 0")

problem.add_bc("right(u) = 0")
problem.add_bc("right(v) = 0")
problem.add_bc("right(w) = 0",condition="(nx != 0) or (ny != 0)")
problem.add_bc("integ_z(p) = 0",condition="(nx == 0) and (ny == 0)")

# Build solver and set up the integration limits
solver = problem.build_solver(de.timesteppers.SBDF2)
logger.info('Solver built')
solver.stop_sim_time = Tf

## Set up the initial condition
u = solver.state['u']
uz = solver.state['uz']
v = solver.state['v']
vz = solver.state['vz']
w = solver.state['w']
wz = solver.state['wz']


## Multiple seeds initial condition
nseeds=2
Amp=7
xc,yc,zc=domain.grids(scales=1)

for ns in range(0,nseeds):
    
    a=np.random.uniform(0,Lx)
    b=np.random.uniform(0,Ly)
    theta=np.deg2rad(np.random.uniform(15,60))
                
    xp=(xc-a)*np.cos(theta) + (yc-b)*np.sin(theta)
    yp=-(xc-a)*np.sin(theta) + (yc-b)*np.cos(theta)
                
    u['g']+=0
    v['g']+=-Amp*(1-zc**2)*xp*yp*np.exp(-xp**2-yp**2)*(1-5*zc**2)
    w['g']+=Amp*zc*(1-zc**2)**2*xp*np.exp(-xp**2-yp**2)*(1-2*yp**2)
    
    
u.differentiate('z', out=uz)
v.differentiate('z', out=vz)
w.differentiate('z', out=wz)


### Store full 3D field
Ustore = solver.evaluator.add_file_handler(csn3d, sim_dt=dT3d, max_writes=1) 
Ustore.add_system(solver.state)

### Save the XY slices
XYslice = solver.evaluator.add_file_handler(csn2d, sim_dt=dT2d, max_writes=1)

# Save the midplane streamwise velocity
XYslice.add_task("interp(u,z=0)", scales=1,name='u')
#Save Ub(x,y)
XYslice.add_task("integ_z(u)",scales=1,name='ub')
#Save dudz
XYslice.add_task("interp(dz(u),z=1)",scales=1,name='dudzt')
XYslice.add_task("interp(dz(u),z=-1)",scales=1,name='dudzb')
#Save dvdz
XYslice.add_task("interp(dz(v),z=1)",scales=1,name='dvdzt')
XYslice.add_task("interp(dz(v),z=-1)",scales=1,name='dvdzb')
## Save int(u**2), int(v**2), int(w**2)
XYslice.add_task("integ_z(u**2)",scales=1,name='iu2')
XYslice.add_task("integ_z(v**2)",scales=1,name='iv2')
XYslice.add_task("integ_z(w**2)",scales=1,name='iw2')


# CFL
CFL = flow_tools.CFL(solver, initial_dt=dt, safety=0.5, max_dt=0.2)
CFL.add_velocities(('U+u', 'v', 'w'))

# Flow properties
# flow = flow_tools.GlobalFlowProperty(solver)
# flow.add_property("sqrt(u*u + v*v + w*w)", name='L2')

# Main loop
try:
    logger.info('Starting loop')
    start_run_time = time.time()
    count=0
    while solver.ok:
        
        # Compute CFL and adjust
        dt = CFL.compute_dt()
        
        nsteps=int(dT/dt)
        
        for i in range(0,nsteps):
            
            #Time step
            solver.step(dt)
        
############-------------------------- Print properties of the state -----------------------############
        
        ## Viscosity of the flow
        nu=1.0/solver.problem.namespace['Re'].value
        
        ## Compute the bulk flow
        u.set_scales(1)
        ub=u['g'].mean()
        
        ## Compute the wall shear
        dudz=u.differentiate('z')
        dudz.set_scales(1)
        DuDz=0.5*( np.abs(-2+np.average(dudz['g'][:,:,0])) + np.abs(2+np.average(dudz['g'][:,:,-1])) )
        tauw=nu*DuDz
        
        ## L2 norm
        
        # L2=flow.grid_average('L2')
        
        ## Compute the Cf
        cf=2*tauw/(ub+2.0/3.0)**2
    
        ## Log the info
        logger.info(f"Niter={solver.iteration : d}")
        logger.info(f"T={solver.sim_time : 0.2f}")
        logger.info(f"dt={dt : 0.5f}")
        # logger.info(f"L2Norm={L2 : 0.4e}")
        logger.info(f"Retau={np.sqrt(tauw)/nu : 0.4f}")
        logger.info(f"Reb={(ub+2.0/3.0)/nu : 0.4f}")
        logger.info(f"Cf={cf:0.2e}")
        
        
except:
    logger.error('Exception raised, triggering end of main loop.')
    raise
finally:
    end_run_time = time.time()
    logger.info('Iterations: %i' %solver.iteration)
    logger.info('Sim end time: %f' %solver.sim_time)
    logger.info('Run time: %.2f sec' %(end_run_time-start_run_time))
    logger.info('Run time: %f cpu-hr' %((end_run_time-start_run_time)/60/60*domain.dist.comm_cart.size))
    
    ## Clean up the file system    
    post.merge_process_files("XYslice", cleanup=True)        
    post.merge_process_files("U", cleanup=True)
    
------------------------------------------------------------------ end of script --------------------------------------------------------

I also tried computing the CFL every time step, with conditions similar to those in the 3D Rayleigh-Benard example, but that slows the code down even more.

Thank you for taking time to help me out.

Regards,

Pavan





Keaton Burns

Jun 4, 2021, 10:10:22 AM
to dedalu...@googlegroups.com
Hi Pavan,

A few things at first glance:

1) The script currently doesn’t have dealiasing on the Chebyshev basis.  Adding this might help stability.

2) For the public release of Dedalus, the integral BC on the pressure could slow things down.  Try a Dirichlet condition for the pressure gauge instead, e.g. “right(p) = 0”.

3) You want to be careful with how often you’re computing auxiliary/analysis tasks.  For instance you’re not setting the “cadence” parameter on the CFL handler, so the CFLs are being recomputed every timestep even though you’re only using the new values every 100 iterations.

4) It’s slow to allocate operators in the main loop.  Any statistics you want to compute occasionally would be better handled through the flow_tools interface or by preallocating the operators.  Your operator computing “u.differentiate(‘z’)” probably isn’t necessary at all, since “uz” is a state variable.

Can you also say a little more about the channelflow comparison and/or include your script for comparison?  E.g. is that timing on a single core or multi-core/multithreaded, and is 32^3 the dealiased grid resolution there or the actual modal resolution?  Is other filtering employed?

Thanks,
-Keaton

Pavan Kashyap

Jun 4, 2021, 8:13:11 PM
to Dedalus Users
Hello Keaton,

Thank you for your quick reply and suggestions. To answer your points :

1.) I had not added dealiasing in the Chebyshev direction because I wanted to compare with channelflow, which by default runs without dealiasing in the Chebyshev direction (although it can be enabled). I have now incorporated dealiasing in Dedalus for the "z" direction as well. It does give stability. Thanks!!

2.) Thank you for the suggestion on the pressure boundary condition. I changed it to the Dirichlet condition. 

3.) CFL :  Thank you for pointing it out. My mistake, sorry. I have changed it by giving cadence=10. 

4.) Operators : Ooops... I completely missed it. Thank you. I have rectified this as well.

After the corrections : 

Dedalus:     Nx,Ny,Nz = (32,32,32)*3/2 ==> (48,48,48) ----> ~90 s
Channelflow: Nx,Ny,Nz = (48,49,48) -----------------------> ~60 s
Channelflow conditions:
a) No dealiasing in the Chebyshev direction.
b) The Nx,Nz values include the dealiasing.

Thus, both are running with a similar computational load and saving the same number of slices and 3D fields. Both are run on a single core without any MPI. I am attaching the code output from both simulations as well as the dedalus script.

They are really comparable now. Perhaps the remaining difference arises from the CFL settings?

Thank you very much for the help.

Regards,

Pavan
test_script1.py
dedalus_output.txt
channelflow_output.txt

Ben Brown

Jun 5, 2021, 3:08:32 AM
to dedalus-users
Pavan,
      I spent some time seeing if I could improve the performance of your script.  Running your original script on a single core on my Ubuntu laptop, I also had a run time of about 90 seconds, so I think my testing machine is comparable to yours. 

After various improvements (attached script, 'test_script2.py'), I've gotten your run time down to 45 seconds.

Some things I did for speed:
  • changed the nonlinear term to vorticity × u form -- substantially fewer transposes are required for this form, which is valid for incompressible flow.  Used a related approach for the viscous term (Kx, Ky, Kz are the curl of the vorticity, i.e. the negative Laplacian for incompressible flow).
  • changed the CFL to be calculated on 'u' rather than 'U+u'.  Since the U.grad advection terms are on the LHS, they are treated linearly implicitly and don't need to factor into the CFL.
  • added thresholding to the CFL calculation, so that CFL changes below a threshold are rejected
  • changed the in-the-loop calculation of tauw and other quantities to an evaluator running on a sim_dt=dT cadence.
  • added metadata marking variables with Dirichlet boundary conditions.  This didn't seem to make a difference.
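The CFL thresholding in the bullet above matters because every change of dt forces the implicit timestepper to rebuild and refactor its matrices; rejecting tiny changes keeps the factorizations cached. The logic amounts to something like this sketch (Dedalus v2's flow_tools.CFL accepts a threshold keyword for this, if I recall the API correctly -- the function below is just an illustration, not the Dedalus code):

```python
def threshold_dt(dt_old, dt_new, threshold=0.05):
    """Accept a newly computed CFL timestep only if it differs from the
    current one by more than `threshold` (relative); otherwise keep the
    old dt so the implicit solver can reuse its factorized matrices."""
    if abs(dt_new - dt_old) > threshold * dt_old:
        return dt_new
    return dt_old
```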
Some other improvements:
  • the original random generators did not have seed values; this made repeated runs non-reproducible.  I've set an explicit seed, and subsequent runs now reproduce the same results
  • We've found that best practice is to filter initial noise so that it's not present in the highest-wavenumber modes.  Lines 125-128 do this.  We've found that things like incompressibility end up being violated at moderate levels in some cases when this is not done.
  • the original script called u['g'].mean() to calculate a bulk flow.  This is a grid-point mean and is not correctly weighted (the Chebyshev grid is non-uniform).  ub is now calculated via vol_avg()
  • I factored the startup time out of the "equivalent run time" calculation, but this doesn't seem to make a difference in single-core runs.  It will matter a lot in parallel comparisons.
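To see why the grid-point mean mentioned above is biased, compare a naive average on Chebyshev points with a properly weighted Clenshaw-Curtis quadrature. This is a NumPy sketch of the idea behind vol_avg(), not Dedalus code:

```python
import numpy as np

def clenshaw_curtis(N):
    """Chebyshev (Gauss-Lobatto) nodes x_j = cos(j*pi/N) on [-1, 1]
    and the matching Clenshaw-Curtis quadrature weights."""
    j = np.arange(N + 1)
    x = np.cos(j * np.pi / N)
    w = np.empty(N + 1)
    for i in j:
        s = sum((1.0 if 2 * k == N else 2.0) / (4 * k**2 - 1)
                * np.cos(2 * k * i * np.pi / N)
                for k in range(1, N // 2 + 1))
        w[i] = (1.0 if i in (0, N) else 2.0) / N * (1.0 - s)
    return x, w

x, w = clenshaw_curtis(32)
f = 1 - x**2             # the laminar base profile U(z)
naive = np.mean(f)       # ~0.48: the nodes cluster at the walls, where f ~ 0
vol_avg = (w @ f) / 2.0  # 2/3, the correct volume average
```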
I haven't tried running your script in parallel yet; I also haven't looked to see if certain parts could be made either simpler or faster.  But hopefully this helps.
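On the noise-filtering point above: seeding the noise on a coarsened grid and expanding back to full size acts as a hard spectral low-pass. A NumPy sketch of the effect (illustrative only, not the Dedalus API):

```python
import numpy as np

def lowpass_1d(u, frac=0.25):
    """Zero all Fourier modes above frac * Nyquist -- the spectral effect
    of seeding noise on a coarsened grid and expanding back to full size."""
    c = np.fft.rfft(u)
    kmax = int(frac * (len(c) - 1))
    c[kmax + 1:] = 0.0
    return np.fft.irfft(c, n=len(u))

rng = np.random.RandomState(seed=23)
noise = rng.standard_normal(64)
smooth = lowpass_1d(noise)   # band-limited noise, safer as an initial condition
```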

I also ran a test on a Macbook Pro M1 laptop; there this script ran in about 38 seconds.

Lastly, it'll be interesting to test this on the d3 version of dedalus once the beta release for that comes out later this summer; we've spent a lot of time optimizing performance there, and this problem will be an interesting comparison point.

Best,
--Ben

test_script2.py

Pavan Kashyap

Jun 7, 2021, 4:55:00 AM
to Dedalus Users
Hello Ben,

Sorry for my late reply; over the weekend I was in a place without internet. Thank you so much for taking a look at the script and improving it!!

I ran the script on my Ubuntu desktop (single core) : ~50 sec

Thank you for all the explanations, it really helps. I am just amazed at the versatility of Dedalus. When Keaton made the comment about computing the terms in the main loop, I had not thought about how everything could be done with substitutions and the evaluators. Thank you for showing it!!

Lines 125-128 - Filtering : When I visualized the flow, there were indeed some oscillations present at early times, observable if the domain is a bit bigger. I had thought they were there because of the localized nature of the initial condition, i.e. projecting a Gaussian onto a truncated Fourier basis with insufficient resolution (Gibbs phenomenon?). Now those "waves" are a bit less visible. Would you suggest the same thing if the initial condition were a random field? Like:

-------------------------------------------------------------------------------
## Random initial condition
gshape = domain.dist.grid_layout.global_shape(scales=1)
slices = domain.dist.grid_layout.slices(scales=1)
rand = np.random.RandomState(seed=23)
noise = rand.standard_normal(gshape)[slices]
u['g']= noise

## Initialize v
rand = np.random.RandomState(seed=30)
noise = rand.standard_normal(gshape)[slices]
v['g']= noise

## Compute w
temp = - de.operators.differentiate(u,'x') - de.operators.differentiate(v,'y')
w=temp.evaluate()
    
u.differentiate('z', out=uz)
v.differentiate('z', out=vz)
w.differentiate('z', out=wz)
----------------------------------------------------------------------------------

I will test it a bit more and then run with a bigger domain on the cluster. I will update when I have more numbers.

Thank you for making such a wonderful tool opensource...!!

Regards,

Pavan

Pavan Kashyap

Jun 8, 2021, 6:53:28 AM
to Dedalus Users
Hello Ben,

I was looking at the script and noticed a few things. I missed it initially. 

1) The right-hand side should have another term, right? The nonlinear term gets replaced by the cross product between velocity and vorticity plus the gradient of the velocity dot product, right (see attached file)?

2) the symbol 'v' is used as both problem variable i.e y - velocity component  and parameter viscosity. How does Dedalus separate between the two definitions ?

Regards,

Pavan

NSeq.pdf

Geoffrey Vasil

Jun 8, 2021, 7:41:26 AM
to dedalu...@googlegroups.com
Hey Pavan, 

The missing term you’re wondering about has been absorbed into a new definition of the dynamic pressure. This means that if you *actually* want the *actual* pressure for some reason, you’d have to add the correction back in. Whatever Ben is calling “p” has a slightly different meaning from what you originally had, but the velocity solutions would be identical.
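Concretely, the rewrite relies on the vector identity (u·∇)u = ω×u + ∇(|u|²/2); the gradient piece is what gets absorbed into the redefined pressure. A small SymPy check of the identity (assuming SymPy is available; the test velocity field below is arbitrary, not from the script):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# an arbitrary smooth test velocity field
u = sp.Matrix([sp.sin(y) * z, x * sp.cos(z), sp.exp(x) * y])

def grad(f):
    return sp.Matrix([sp.diff(f, s) for s in (x, y, z)])

def curl(F):
    return sp.Matrix([
        sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y),
    ])

adv = sp.Matrix([u.dot(grad(u[i])) for i in range(3)])  # (u . grad) u
rot = curl(u).cross(u) + grad(u.dot(u) / 2)             # omega x u + grad(|u|^2 / 2)
assert sp.simplify(adv - rot) == sp.zeros(3, 1)         # identity holds term by term
```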

And I don’t have to look, but I’m almost sure that Ben used a Greek letter nu (ν), which looks *almost identical* to a Roman letter v in almost any font. Bad Ben! :P

-Geoff

Pavan Kashyap

Jun 8, 2021, 10:19:15 AM
to Dedalus Users
Hello Geoffrey,

Thank you for the explanation. Absorbing the velocity dot product into the pressure definition makes sense. Understood. 

You can type Greek letters in the script? I did not know that. I was viewing the code in Spyder and saw it as a Roman "v". Thanks for the clarification.

Regards,

Pavan

Jeffrey S. Oishi

Jun 8, 2021, 10:22:59 AM
to dedalus-users
Hi Pavan,

Python 3 supports Unicode, so you can use any valid Unicode character in identifiers. As you can see, this may or may not be the best choice.
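For example, a Greek ν and a Roman v are two perfectly legal, distinct identifiers (a minimal illustration, not taken from Ben's script):

```python
# Python 3 identifiers may contain Unicode letters
ν = 1.0e-3   # Greek small letter nu -- e.g. a viscosity
v = 2.0      # Roman v -- e.g. a velocity component; an entirely different name
assert ν != v and 'ν' != 'v'
```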

Jeff
