Restarting a CLM simulation


Leonardo Sandoval Pabon

Jan 30, 2025, 12:06:37 PM
to ParFlow
Hi everyone!

I'm trying to restart a simulation of PF+CLM. The simulations run successfully, and while the PF variables (subsurface flow and surface flow) show very good continuity between the first and second simulations, the SWE exhibits a rather strong discontinuity. I checked the spatial distribution of SWE in my domain, and there is a clear error at the initial time step of the second year (see Figure). It seems as if the initial conditions were somehow shifted.
ded70007-9ff5-4de6-9aa0-87a5c2d8e531.png

I am using 40 cores and dividing the domain into 8 parts in the x-direction and 5 parts in the y-direction. As a result, I have a total of 40 clm.rst.xxxx.p files, which I copy according to the instructions from the short courses. I’m not sure what I’m doing wrong, so any guidance would be greatly appreciated.

I am running this on a node of the Galileo100 Supercomputer. Each node has 2 × Intel Cascade Lake 8260 CPUs (24 cores each, 2.4 GHz) and 384 GB RAM. If it helps, I am attaching my Python script along with the drv_clmin.dat files for the first and second years.

Thank you! I'm happy to share more inputs if needed.


drv_clmin.dat
drv_clmin_2010.dat
08_calibration.py

Reed M. Maxwell

Jan 30, 2025, 7:22:00 PM
to ParFlow

Hi Leonardo,

 

Do you see shifts in your other CLM variables after the restart? Are you restarting partway through the year or at the end of the water year? Can you include the code you are using to copy over the restart files? (Probably best as a snippet in the email body, since .py and similar files are blocked when attached.)

 

I’m actively restarting the CONUS2.1 simulations and just checked the SWE (I’ve checked movies of the SWE from CONUS2.0 as well); the current version is running on 70x48 processors and things seem to be working smoothly.  The portion of my script that copies the restart files during an automated restart process is below.  Hope this helps

 

Reed

 

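## Restart process below
import shutil

# Copy each process's restart file (clm.rst.00000.<rank>) to the restart step name.
# nproc and istep are assumed to be defined earlier in the script.
for ii in range(nproc):
    rst_from = f'clm.rst.00000.{ii:d}'
    rst_to = f'clm.rst.{istep:05d}.{ii:d}'
    shutil.copy(rst_from, rst_to)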

 

Figure: upper Columbia River basin near the US-CA border; this is the first CLM step after a restart.

 

 


Leonardo Sandoval Pabon

Jan 31, 2025, 4:47:28 AM
to ParFlow
Yes, after inspecting the other variables I see that they have the same behavior/problem (See figure). In the first simulation, I run 8,760 hours (1 year) with dump intervals of 24 hours (ParFlow) and 1 hour (CLM). Then, I run the second simulation also for 1 year. 
f06497cc-aecb-458a-bb5b-e2cb497b3723.png

To restart the simulation, I first set Runname.Solver.CLM.WriteLastRST = True. Then, once the first simulation is complete, I rename the restart files by running this line from an SH file:

for i in {0..39}; do mv clm.rst.00000.$i clm.rst.08760.$i; done

Since I identified this issue in the CLM results, I tried a different method: I set Runname.Solver.CLM.WriteLastRST = False and Calibration.Solver.CLM.DailyRST = True. In this second approach I do not copy the restart files; I simply delete every restart file other than clm.rst.08760.*. However, the problem persists.

The lines where I restart my simulation in the SLURM/SH file look like this (when using the first approach):

# RUN 2009
cp ../../Data/PFB/IC_Dynamic.pfb ./ip_solid.pfb                     # Copy Initial condition for pressure
cp ../../Data/DAT/drv_clmin_2009.dat ./drv_clmin.dat    
cp ../../Data/DAT/drv_vegm_2009.dat ./drv_vegm.dat
python3 08_calibration.py 0 8760 365 2009                           # Run first simulation
python3 ../../Codes/PY/simplify_dynamic.py 2009 365 0               # Aggregate CLM outputs and move them to a scratch folder
rm *.dist                                                           # Undist all files

# RUN 2010
cp Calibration.out.press.08760.pfb ./ip_solid.pfb                   # Copy the pressure file from the previous simulation
cp ../../Data/DAT/drv_clmin_2010.dat ./drv_clmin.dat        
cp ../../Data/DAT/drv_vegm_2010.dat ./drv_vegm.dat
for i in {0..39}; do mv clm.rst.00000.$i clm.rst.08760.$i; done     # Rename the restart files
python3 08_calibration.py 8760 17520 365 2010                       # Run the second simulation        
rm *.dist                                                           # Undist all files    
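
The rename step in the script above could also be done from Python with a check that every rank's restart file is present before anything is copied. This is only a minimal sketch: the function name `stage_clm_restarts` and its parameters are hypothetical, and the file naming (clm.rst.<5-digit step>.<rank>) is taken from the commands shown in this thread.

```python
import shutil
from pathlib import Path


def stage_clm_restarts(workdir, nproc, istep, src_step=0):
    """Copy clm.rst.<src_step>.<rank> to clm.rst.<istep>.<rank> for every rank.

    Raises FileNotFoundError before copying anything if a rank's file is missing.
    """
    workdir = Path(workdir)
    sources = [workdir / f"clm.rst.{src_step:05d}.{rank}" for rank in range(nproc)]
    missing = [p.name for p in sources if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing CLM restart files: {missing}")
    targets = []
    for rank, src in enumerate(sources):
        dst = workdir / f"clm.rst.{istep:05d}.{rank}"
        shutil.copy(src, dst)  # copy, not move, so the originals stay available
        targets.append(dst)
    return targets
```

For the 8 x 5 decomposition described here, the call would be `stage_clm_restarts(".", nproc=40, istep=8760)`.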


Thank you,
Leonardo


Reed M. Maxwell

Jan 31, 2025, 4:51:40 AM
to ParFlow

What’s in simplify_dynamic.py? How are you writing your output?

 

 

 


Leonardo Sandoval Pabon

Jan 31, 2025, 5:06:22 AM
to ParFlow

Simplify_dynamic is a rather long script that essentially moves files to a scratch folder to clean up the working folder. The main tasks performed by this script are:

  1. Copying all Runname.out.press.* and Runname.out.satur.* files to a scratch folder. Note that the pressure and saturation files from the last time step of the first simulation remain in the working folder.

  2. Aggregating all CLM variables by day. The script computes the sum of each variable per day and saves the aggregated values in a .npy file, which is stored in the scratch folder. Note that the CLM file from the last hour of the first simulation remains in the working folder.

  3. Making a copy of the clm.rst.xxx files in the scratch folder. This is just a copy; the original files remain in the working folder.
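
The daily aggregation in step 2 might look roughly like the sketch below. It is not the actual simplify_dynamic.py code: the function name `daily_sum` is hypothetical, and it assumes the hourly CLM output for one variable has already been read into a NumPy array of shape (hours, ny, nx).

```python
import numpy as np


def daily_sum(hourly, hours_per_day=24):
    """Sum an hourly stack of shape (nt, ny, nx) into daily totals (nt // 24, ny, nx)."""
    hourly = np.asarray(hourly)
    nt = hourly.shape[0]
    if nt % hours_per_day != 0:
        raise ValueError(f"{nt} time steps is not a whole number of days")
    ndays = nt // hours_per_day
    # Group the time axis into (day, hour-of-day) and sum over the hours.
    return hourly.reshape(ndays, hours_per_day, *hourly.shape[1:]).sum(axis=1)
```

A year of 8,760 hourly fields would give 365 daily fields, which could then be written with `np.save`. A sum suits fluxes; for state variables such as SWE, a daily mean or end-of-day value may be easier to interpret.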

Thanks, 
Leonardo

Nick Jadallah

Feb 1, 2025, 6:38:33 AM
to ParFlow
Hey Leonardo, 

I just skimmed this thread and thought I'd ask something quickly because something caught my eye. In your drv_clmin.dat file, you have startcode and clm_ic set to (2) and not (1).

###############
! IC Source: (1) restart file, (2) drv_clmin.dat (this file)
!
startcode      2                                     1=restart file,2=defined
clm_ic         2                                     1=restart file,2=defined
###############

When these values are set to (2), CLM ignores the .rst files no matter what you do with them and starts from the defaults defined in your .dat file. If you change them to (1), CLM knows to look for the .rst files and restarts from the "initial conditions" saved there rather than from those in the .dat file.

You may want to try changing those codes to (1) and then running again to see if that helps! 
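
A quick way to confirm which setting a given drv_clmin.dat carries before launching a restart is to read the two keys programmatically. The sketch below is an assumption-laden helper, not part of ParFlow: the function names are hypothetical, and the parsing assumes the "key value description" line layout shown in the excerpt above.

```python
def read_clm_start_settings(text, keys=("startcode", "clm_ic")):
    """Return {key: int} for the requested keys found in drv_clmin.dat-style text."""
    found = {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, separators, and comment lines (the excerpt uses '!').
        if len(tokens) < 2 or tokens[0].startswith("!"):
            continue
        if tokens[0] in keys:
            found[tokens[0]] = int(tokens[1])
    return found


def uses_restart_files(settings):
    """True only when both startcode and clm_ic are 1 (restart file)."""
    return settings.get("startcode") == 1 and settings.get("clm_ic") == 1
```

Calling `uses_restart_files(read_clm_start_settings(open("drv_clmin.dat").read()))` in the run script would flag a restart that is about to start from the defaults instead of the .rst files.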

Also, @Reed, please correct me if I've misunderstood anything! 

Best, 
Nick 

Leonardo Sandoval Pabon

Feb 1, 2025, 8:31:09 AM
to ParFlow
Thanks, Nick!

For the first simulation, I set these codes to 2, as seen in the drv_clmin.dat file. However, for the second simulation (i.e., when restarting the simulation), I use the drv_clmin_2010.dat file, where these codes are set to 1. So, in theory, for the second simulation, I am using the information stored in the clm.rst.000* files, as you suggested.

I believe I may have found the cause of my issue. I'm still running a test, but I’ve already had one successful restart. If I confirm that this is indeed the solution, I will post an update here for future reference for the ParFlow-CLM community.

Thanks!
Leonardo


Leonardo Sandoval Pabon

Feb 7, 2025, 3:13:06 AM
to ParFlow
After some testing, I discovered the reason for this weird behavior. In my simulations, I used one drv_vegm.dat file for the first year and a different drv_vegm.dat file for the second year (i.e., after restarting). Apparently, changing this file causes the odd behavior in the simulations. When I restarted the simulation using the same drv_vegm.dat file before and after restarting, there were no issues.

Out of curiosity, I checked my second drv_vegm.dat file to see if it contained a corrupted line or something similar, but it seems perfectly fine. In fact, it is possible to run an entire year with this file, and the results of the simulation at the end of the year are normal. At this point, I am not sure if this is a ParFlow-CLM restriction.
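
To see whether the discontinuity coincides with the cells whose entries differ between the two drv_vegm.dat files, the files can be compared row by row. The sketch below is only a diagnostic aid and does not establish the cause: the function names are hypothetical, and it assumes each file has two header lines followed by whitespace-separated numeric rows, one per grid cell.

```python
import numpy as np


def load_vegm_rows(path, header_lines=2):
    """Load the numeric rows of a drv_vegm.dat-style file as a 2-D float array."""
    return np.loadtxt(path, skiprows=header_lines, ndmin=2)


def changed_cells(path_a, path_b, header_lines=2, tol=1e-9):
    """Return the row indices at which the two files differ by more than tol."""
    a = load_vegm_rows(path_a, header_lines)
    b = load_vegm_rows(path_b, header_lines)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    differs = np.any(np.abs(a - b) > tol, axis=1)
    return np.flatnonzero(differs)
```

For example, `changed_cells("drv_vegm_2009.dat", "drv_vegm_2010.dat")` lists the rows that changed between the two years; mapping those rows back to the grid would show whether they match the shifted pattern in the SWE figure.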

Thank you,
Leonardo
