NaN and Inf detected Error After 2nd Model Day Run

Danielle Manalaysay

Mar 8, 2016, 6:45:10 AM
to HYCOM.org Forum
Hi All, 

I am trying to run a Philippine-wide (Southeast Asia) nested model at 1/25° resolution. I have seen this error (NaN and Inf detected) before, but that time it was caused by a bad restart file. This time I have checked my input files (relax, forcing, archive, restart) with hycom_sea_ok and hycom_NaN, and they seem to be okay.

I have also tried clipping my minimum depth from 1 m to 10 m, but I got the same error.

I have read here in the forum that this kind of error can be caused by a baclin that is too large. I tried reducing baclin to 5 sec, but that is so small that the model runs very slowly (i.e. 1 model day takes 23 hours on 4 threads, 30 cores).

What other factors or inputs should I check? Any help would be appreciated.

Attached here is the log file.

Thanks!
012y115b.log

Alan

Mar 8, 2016, 9:56:10 AM
to HYCOM.org Forum
It is counter-intuitive, but you should always make batrop as large as possible.  The program topo_batrop can help you set its value.

An example script:

ajax 107> cat depth_GOMl0.04_03i_batrop.csh
#
set echo
#
# --- external gravity wave speed and barotropic cfl.
#
cd ~/hycom/GOMl0.04/topo
#
setenv FOR051  depth_GOMl0.04_03i.b
setenv FOR051A depth_GOMl0.04_03i.a
setenv FOR061  depth_GOMl0.04_03i_batrop.b
setenv FOR061A depth_GOMl0.04_03i_batrop.a
#
../../ALL/topo/src/topo_batrop

The result:

ajax 108> cat depth_GOMl0.04_03i_batrop.b
541 x 385 depth values (idm,jdm = 541,385).                                   
0.04 deg. resolution                                                          
lon: -98.00 to -76.4000.  lat: 18.09165N to 31.96065N.                        
Merged with depth_GLBb0.08_07 near open boundaries.                           
Fix the open boundaries                                                       
gsp:  min,max =      4.429   280.650
cfl:  min,max =     10.586   671.960

What we actually use for GOMl0.04:

 120.0    'baclin' = baroclinic time step (seconds), int. divisor of 86400
   7.5    'batrop' = barotropic time step (seconds), int. div. of baclin/2

It is possible that 'batrop'=8.0 would work, but typically ~75% of the cfl min is what we use.  If it is too big, the model will blow up in the 1st 1-2 time steps.
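The rule of thumb above can be sketched in a few lines of Python. This is only an illustration and not part of HYCOM: the function name `pick_batrop` and its `fraction` argument are made up here, and it assumes the constraints stated in the blkdat comments quoted above (baclin is an integer divisor of 86400, batrop an integer divisor of baclin/2).

```python
import math

def pick_batrop(cfl_min, baclin, fraction=0.75):
    """Largest batrop <= fraction*cfl_min that is an integer divisor of baclin/2.

    Hypothetical helper for illustration; not a HYCOM utility.
    """
    if 86400 % baclin != 0:
        raise ValueError("baclin must be an integer divisor of 86400")
    target = fraction * cfl_min       # about 75% of the cfl minimum
    half = baclin / 2.0
    n = math.ceil(half / target)      # fewest sub-steps that keep batrop <= target
    return half / n

# GOMl0.04 values from the topo_batrop output above: cfl min = 10.586 s, baclin = 120 s
print(pick_batrop(10.586, 120))       # 7.5
```

With the GOMl0.04 numbers this reproduces the 7.5 s quoted above. The sketch only narrows the choice; whether a given value is stable still has to be tested in the model.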

Your domain has a similar resolution, so I would expect similar time steps.  Most likely baclin=120 is conservative; my guess is that 240 would work in most cases.

Getting a NaN/Inf within a day of starting generally indicates a problem with the initial state or the forcing (or the nesting boundaries).  When starting from an interpolated initial state (e.g. starting nesting) a smaller baclin may be needed because there will be lots of gravity waves as the model adjusts.  This should only be needed for the 1st run.

I suggest checking that the boundaries are working as expected, and perhaps plot the SSH to see if there are any "hot spots" in the initial gravity waves.  If you have an archive file at the NaN time, hycom_NaN on this file can tell you where the problem is located.

Alan.

Danielle Manalaysay

Mar 9, 2016, 9:17:47 AM
to HYCOM.org Forum
Hi Sir Alan,

Thank you so much for your reply.
I have tried using topo_batrop, but probably because my max depth (along the Mindanao trench) is around 9000+ m, the cfl range is much wider than for the GOM model.

bathymetery from 2-minute ETOPO2v2
i/jdm =   775   638
plon,plat range =   99.04000 130.00000 0.32000 25.00610
Close interior seas
Merged with 2x depth_SEAa0.08_01 near open boundaries.
gsp:  min,max =      3.131   310.510
cfl:  min,max =      9.980  1004.351 

Nevertheless, I tried this, but the model keeps blowing up immediately. Right now I will try running the model with the max depth clipped to 5000 m, and hopefully it will work.

 
> I suggest checking that the boundaries are working as expected,
This may seem trivial, but what specific checks can be done for this?


Thank you for your usual support!

Dani

Alan

Mar 9, 2016, 10:23:12 AM
to HYCOM.org Forum
It does not matter what the maximum cfl value is, only its minimum value (9.98 seconds).  So 'batrop'=7.5 will likely work (and if not the model will blow up in the first few *time steps*).

If you still get a model blow up after 1-2 days with 'batrop'=7.5 and 'baclin'=120.0 (or 60.0 at the smallest), then there is something wrong with the model setup.  Either the restart file, or the forcing files, or the nesting files (archive files or  the nest rmu file) are bad.
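As a rough sanity check on the values above, the following Python sketch tests a baclin/batrop pair against the divisibility rules from the blkdat comments quoted earlier in the thread and reports how close batrop is to the cfl minimum. The helper name `check_timesteps` is hypothetical, and passing this check says nothing about the restart, forcing, or nesting files.

```python
def check_timesteps(baclin, batrop, cfl_min):
    """Return (ok, fraction_of_cfl_min) for a baclin/batrop pair.

    Hypothetical helper; the rules checked are the ones in the
    blkdat comments: baclin divides 86400, batrop divides baclin/2.
    """
    steps_per_day = 86400.0 / baclin
    substeps = (baclin / 2.0) / batrop
    ok = (abs(steps_per_day - round(steps_per_day)) < 1e-9
          and abs(substeps - round(substeps)) < 1e-9
          and batrop < cfl_min)
    return ok, batrop / cfl_min

# cfl minimum of 9.980 s from the SEA topo_batrop output above
for baclin in (120.0, 60.0):
    ok, frac = check_timesteps(baclin, 7.5, 9.980)
    print(baclin, ok, round(frac, 3))
```

Both pairs pass, with batrop at roughly 75% of the cfl minimum, which is in line with the rule of thumb given earlier.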

The first thing to do is to plot the results.  Your initial restart should be from the outer model interpolated to the nested grid.  Write an archv file every hour ('diagfq'=.04166667) and see if the 1st hour archive looks like the nested archive that went into the restart.  Then look at the closest hourly archv file to the blow up and see if the location of the problem is obvious.

If everything else is good, a common problem is a bad rmu file or not getting both nesting relaxation and barotropic open boundaries set up correctly.
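The 'diagfq' value quoted above is simply one hour expressed in days. A minimal Python sketch of the conversion (the helper name `diagfq_days` is made up for illustration):

```python
def diagfq_days(hours):
    """Convert an archive interval in hours to days, the unit used for 'diagfq' above."""
    return hours / 24.0

print(f"{diagfq_days(1):.8f}")   # 0.04166667

# Number of baroclinic steps between hourly archives at baclin = 120 s
print(3600 / 120.0)              # 30.0
```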

Alan.

Danielle Manalaysay

Mar 10, 2016, 9:05:02 AM
to HYCOM.org Forum

Hi Alan, 

Thank you again for your input. It seems that my model blows up because of a bad restart file, which in turn comes from bad relax files.

I generated the relax files following the GOM model (using PHC and relaxi). Attached is a plot; there seems to be something odd in the upper layers: they do not look isopycnal but instead have square edges?

Best regards, 
Dani
011_relax_166_2.ps

Alan

Mar 10, 2016, 1:46:15 PM
to HYCOM.org Forum
The relax files often have discontinuities, but these get balanced out (via gravity waves) over the first day or two of the run.

For nesting, you first run for 1 model day (say) from climatology with no nested boundary and in fact no forcing at all.  This should be for the same month as the outer model started from (typically January).

Then use this restart as a "template" to convert one of the nested 3-D archives to a restart file.  At this point almost all the fields in the restart are from the archive file.  I would not recommend starting a nested run from climatology.

Alan.

Danielle Manalaysay

Mar 11, 2016, 8:43:18 AM
to HYCOM.org Forum
Hi Alan, 
 
 
> For nesting, you first run for 1 model day (say) from climatology with no nested boundary and in fact no forcing at all.  This should be for the same month as the outer model started from (typically January).
>
> Then use this restart as a "template" to convert one of the nested 3-D archives to a restart file.  At this point almost all the fields in the restart are from the archive file.  I would not recommend starting a nested run from climatology.

Yes, I understand. I usually run 1 model day from climatology with no nested boundary and no forcing to create a restart template. Then I use this template to create a restart from an archive (interpolated to the subregion from the GLB archive), using archv2restart.

As you suggested, I plotted the hourly archives and compared them with the global archive, and they are quite similar. But there is an inconsistency in the topography: there is a displaced seamount along 114.80E, and I think that is where the blow-up is coming from. I have attached the plots. Should I base the interpolated archive on the coarser bathymetry or on the finer one? Or is it enough to base just the boundaries on the outer, coarser model?

Again thank you so much for your usual support.

Best regards, 
Dani
 
GLB0.08_152_archv_orig.ps
SEA0.04_archv_152_02_output.ps
SEA0.04_archv_152_06_output.ps
SEAa0.04_archv_152_00_input.ps

Alan

Mar 11, 2016, 9:19:22 AM
to HYCOM.org Forum
Usually it is enough to base just the nesting relaxation zone on the coarser grid bathymetry, but if a seamount goes away entirely (or substantially) on the finer grid then the interpolation has to fill in the new water where the seamount existed before.  The existing "new water" technique is to simply project the original bottom water all the way to the bottom.  This implicitly assumes that the discrepancy either a) has a small extent in depth (e.g. in shallow water) or b) is in deep water.
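To make the "new water" idea concrete, here is a toy Python sketch of one reading of that description: the bottom layer of the coarse-grid column is stretched until the column reaches the fine-grid depth. This is not the actual isubaregion code, and the function name and the layer values are invented for illustration.

```python
def fill_new_water(thickness, new_depth):
    """Extend the bottom layer so the column reaches new_depth.

    thickness: layer thicknesses (m), top to bottom, summing to the old depth.
    Illustration only; not the actual isubaregion code.
    """
    old_depth = sum(thickness)
    filled = list(thickness)
    if new_depth > old_depth:
        filled[-1] += new_depth - old_depth   # bottom water projected downward
    return filled

# Seamount top at 500 m on the coarse grid, 3000 m of water on the fine grid
print(fill_new_water([50.0, 150.0, 300.0], 3000.0))   # [50.0, 150.0, 2800.0]
```

In this made-up case, water taken from above 500 m ends up filling 2800 m of the column, which is the kind of large depth discrepancy the assumption above does not cover.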

The isubaregion program has a 'smooth' option, which you could try for the nested archive that goes into the restart.  However, I doubt that is enough.

I suggest first seeing what is going on with this seamount.  Is it just wrong in the 0.08 bathymetry or does the 0.04 bathymetry also have problems?  If you can live with putting the 0.08 seamount into your 0.04 bathymetry, that would fix the problem (this just requires updating the merge you are already doing for near the nested boundaries).  If not, make all the files needed to interpolate the SEAa0.08 archive to SEAa0.04 available and I'll see if I can modify isubaregion to handle this case better.

Alan.

Danielle Manalaysay

Mar 14, 2016, 9:11:20 AM
to HYCOM.org Forum
Hi Alan, 

Thank you again for your support. I decided to place the seamount in the finer bathymetry, since it is not actually part of the PH area, and to make the relaxation buffer zone bigger; hopefully it will work. I would just like to clarify some of your inputs:

In creating a "template" restart file, is it okay to use a very small time step, say baclin=20 batrop=3? I observed that using a large time step causes the model to blow up even when it is run from climatology with no nested boundary and no forcing. You mentioned earlier that
> If you still get a model blow up after 1-2 days with 'batrop'=7.5 and 'baclin'=120.0 (or 60.0 at the smallest), then there is something wrong with the model setup. 
Or does this apply only when running a nested model?

Another question is how big the relaxation zones should be. I understand that this may vary depending on the domain size and resolution used, but how is it calculated?
And how long should the e-folding time be (in IAS there is a comment that the minimum should be at least 8x the time step)?

Best regards, 
Dani

Alan

Mar 14, 2016, 9:56:05 AM
to HYCOM.org Forum
Never use a small batrop; always make this as large as practical (as I outlined in a post above).

Otherwise, whatever baclin works is ok for producing the template restart.  Note that it should not be as unstable when you put back the seamount.

The minimum size for the nesting relaxation zone is 5 grid points on the outer grid, and I like to have 10 outer grid points.
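For the grids in this thread, the width on the nested grid follows from the resolution ratio. A small Python sketch, assuming a 0.08 degree outer grid and a 0.04 degree nested grid as in the file names above (the function name is made up):

```python
def zone_width_nested_points(outer_points, outer_res_deg=0.08, nested_res_deg=0.04):
    """Width of the relaxation zone expressed in nested-grid points."""
    return round(outer_points * outer_res_deg / nested_res_deg)

# Minimum (5) and preferred (10) widths in outer grid points
print(zone_width_nested_points(5), zone_width_nested_points(10))   # 10 20
```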

You get to nearly full strength in about 5 e-folding periods, so setting the strongest relaxation e-folding time to 1/5 to 1/10 of the 3-D nesting archive sampling period often makes sense.  With daily 3-D archives we commonly use a 0.1-10 day e-folding time across the relaxation zone.
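The "5 e-folding periods" figure can be checked with a short Python sketch. It assumes simple exponential (Newtonian) relaxation toward the nesting field, which is an assumption made for illustration rather than a statement about the exact HYCOM formulation; the function name is also made up.

```python
import math

def relaxed_fraction(n_efold):
    """Fraction of the way to the target after n_efold e-folding periods,
    assuming simple exponential (Newtonian) relaxation."""
    return 1.0 - math.exp(-n_efold)

print(round(relaxed_fraction(5), 4))           # 0.9933

# Strongest e-folding time for daily archives: 1/10 to 1/5 of the sampling period
sampling_days = 1.0
print(sampling_days / 10, sampling_days / 5)   # 0.1 0.2
```

Under this assumption, five e-folding periods close about 99% of the gap, and with daily archives the strongest e-folding time comes out at 0.1 to 0.2 days.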

Alan.