Re: [WRF-Chem Run] wrfoutput for d01 forms, but when change it to max_dom=2, it hangs after writing for 2 mins. (WRF-Chem)


Stacy Walters

Jul 12, 2018, 10:08:42 AM
to Komal Shukla, wrf-chem-run
Komal,

Just a "shot in the dark": are you possibly overrunning a disk allocation quota?

Stacy
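
A quick way to rule the disk-space possibility in or out from the run directory is sketched below. This is only a sketch: `shutil.disk_usage` reports free space on the filesystem, not a per-user quota, so on a cluster with quotas a site-specific tool is still needed. The `needed_gb` threshold is a hypothetical figure, chosen only because the single-domain run in this thread wrote about 23 GB.

```python
import shutil


def free_space_report(path=".", needed_gb=25.0):
    """Return (free_gib, enough) for the filesystem that holds `path`.

    needed_gb is a hypothetical estimate, not a value from WRF itself.
    Per-user quotas are NOT visible here; only filesystem free space is.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 1024**3
    return free_gib, free_gib >= needed_gb


if __name__ == "__main__":
    free_gib, enough = free_space_report(".")
    print(f"free space: {free_gib:.1f} GiB, enough for the estimate: {enough}")
```

If the reported free space is ample but the job still stalls once the output reaches about 1 GB, a per-user or per-project quota remains a candidate and has to be checked with whatever tool the cluster provides.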

On Thu, Jul 12, 2018 at 9:21 AM, Komal Shukla <komalsh...@gmail.com> wrote:

Hi Members,

I am running WRF-Chem with radm2sorg (chem_opt=11).
When I run it for one domain, my final wrfout_d01 forms, but when I change to max_dom=2, it stops after 2 minutes or so while creating wrfout_d01 and wrfout_d02 (they contain some data for the first hour; after around 9-10 minutes it stops). There is no error in rsl.error.0000 or rsl.out.0000, and Slurm also shows the job as running, but it doesn't write to the wrfout_d01 and wrfout_d02 files anymore.

Anthro and mozart also run fine, and I have linked my wrfchemi* files for both domains in the working directory.

When I ran it for just one domain (12 km), it successfully created 23 GB of wrfout output; for two nested domains, d01 (12 km) and d02 (4 km), it stops after writing 1 GB or so. I am running on 32 CPUs; I have also tried 48 and 64 CPUs, and changed my time_step to 72. Same results.

My namelist.input is:

&time_control
run_days = 1,
run_hours = 00,
run_minutes = 00,
run_seconds = 00,
start_year = 2011, 2011, 2011,
start_month = 12, 12, 01,
start_day = 01, 01, 12,
start_hour = 00, 00, 00,
start_minute = 00, 00, 00,
start_second = 00, 00, 00,
end_year = 2011, 2011, 2011,
end_month = 12, 12, 12,
end_day = 02, 02, 31,
end_hour = 00, 00, 18,
end_minute = 00, 00, 00,
end_second = 00, 00, 00,
interval_seconds = 21600,
input_from_file = .true.,.true.,.false.,
history_interval = 60, 60, 1440,
frames_per_outfile = 24, 24, 1,
restart =.false.,
restart_interval = 720,
write_hist_at_0h_rst = .true.
io_form_history = 2,
io_form_restart = 2,
io_form_input = 2,
io_form_boundary = 2,
/

&domains
time_step = 60,
time_step_fract_num = 0,
time_step_fract_den = 3,
max_dom = 2,
e_we = 257, 70, 202,
e_sn = 149, 82, 106,
e_vert = 51, 51, 51,
P_top_requested = 5000,
num_metgrid_levels = 38,
num_metgrid_soil_levels = 4,
dx = 12000, 4000, 10000,
dy = 12000, 4000, 10000,
grid_id = 1, 2, 3,
parent_id = 1, 1, 1,
i_parent_start = 1, 70, 32,
j_parent_start = 1, 52, 76,
parent_grid_ratio = 1, 3, 3,
parent_time_step_ratio = 1, 3, 3,
feedback = 1,
smooth_option = 0,
zap_close_levels = 50,
interp_type = 1,
t_extrap_type = 2,
force_sfc_in_vinterp = 0,
use_levels_below_ground = .true.,
use_surface = .true.,
lagrange_order = 1,
sfcp_to_sfcp = .true.,
/

&physics
mp_physics = 2, 2, 10,
progn = 1, 1, 1
ra_lw_physics = 1, 1, 1,
ra_sw_physics = 2, 2, 2,
radt = 30, 30, 30,
sf_sfclay_physics = 2, 2, 2,
sf_surface_physics = 2, 2, 2,
bl_pbl_physics = 2, 2, 2,
bldt = 1, 1, 1,
cu_physics = 5, 5, 3,
cudt = 1, 1, 1,
sst_update = 0,
surface_input_source = 1,
num_soil_layers = 4,
sf_urban_physics = 1, 1, 0,
mp_zero_out = 2,
mp_zero_out_thresh = 1.e-8,
maxiens = 1,
maxens = 3,
maxens2 = 3,
maxens3 = 16,
ensdim = 144,
cu_rad_feedback = .false.,.false.,.true.,
cu_diag = 1, 1, 1,
isfflx = 1,
ifsnow = 1,
icloud = 1,
num_land_cat = 20,
/

&fdda
grid_fdda = 1, 1, 1,
gfdda_inname = 'wrffdda_d<domain>',
gfdda_interval_m = 360, 360, 360,
gfdda_end_h = 100000, 100000, 100000,
io_form_gfdda = 2,
fgdt = 0,
if_no_pbl_nudging_uv = 0, 0,0,
if_no_pbl_nudging_t = 1, 1, 0,
if_no_pbl_nudging_q = 1, 1, 0,
if_zfac_uv = 0, 0, 0,
if_zfac_t = 0, 0, 0,
if_zfac_q = 0, 0, 0,
guv = 0.0003, 0.0003, 0.0003,
gt = 0.0003, 0.0003 ,0.0003,
gq = 0.00001, 0.00001,0.0003,
if_ramping = 0,
dtramp_min = 60,
/

&dynamics
rk_ord = 3,
w_damping = 1,
diff_opt = 1,
km_opt = 4,
base_temp = 290.,
damp_opt = 0,
zdamp = 5000., 5000., 5000.,
dampcoef = 0.01, 0.01, 0.01,
diff_6th_opt = 0,
diff_6th_factor = 0.12,
diff_6th_factor = 0.12,
khdif = 0, 0, 0,
kvdif = 0, 0, 0,
non_hydrostatic = .true., .true., .true.,
moist_adv_opt = 2, 2, 2,
scalar_adv_opt = 2, 2, 2,
chem_adv_opt = 2, 2, 2,
tke_adv_opt = 2, 2, 2,
time_step_sound = 4, 4, 4,
h_mom_adv_order = 5, 5, 5,
v_mom_adv_order = 3, 3, 3,
h_sca_adv_order = 5, 5, 5,
v_sca_adv_order = 3, 3, 3,
/

&bdy_control
spec_bdy_width = 5,
spec_zone = 1,
relax_zone = 4,
specified = .true., .true.,.false.,
nested = .false., .false., .true.,
/

&grib2
/

&chem
kemit = 1,
kemit_aircraft = 0,
chem_opt = 11, 11, 9,
bioemdt = 0, 0, 1,
photdt = 30, 30, 10,
chemdt = 0, 0, 1,
io_style_emissions = 2,
emiss_opt = 3, 3, 3,
emiss_inpt_opt = 1, 1, 101,
aircraft_emiss_opt = 0, 0, 0,
chem_in_opt = 0, 0, 0,
phot_opt = 2, 2, 2,
gas_drydep_opt = 1, 1, 1,
aer_drydep_opt = 1, 1, 1,
bio_emiss_opt = 0, 0, 3,
gas_bc_opt = 1, 1, 1,
gas_ic_opt = 1, 1, 1,
aer_bc_opt = 1, 1, 1,
aer_ic_opt = 1, 1, 1,
gaschem_onoff = 1, 1, 1,
wetscav_onoff = 1, 1, 1,
cldchem_onoff = 1, 1, 1,
vertmix_onoff = 1, 1, 1,
biomass_burn_opt = 0, 0, 0,
plumerisefire_frq = 0, 0, 0,
dust_opt = 0,
seas_opt = 2,
aer_op_opt = 1, 1, 5,
opt_pars_out = 1,
aer_ra_feedback = 1, 1, 1,
scale_fire_emiss = .false., .false., .false.,
have_bcs_chem = .true., .true., .false.,
chemdiag = 1, 1, 1,
chem_conv_tr = 1, 1, 1,
ne_area =100,
/

&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/


and the job script is (mywrf3.91.-nokpp.sh):

#!/bin/bash
#SBATCH --ntasks 32
#SBATCH --nodes 2
#SBATCH --time 01:00:00
#SBATCH --mail-type ALL

set -e
module purge; module load bluebear

module load apps/jasper
export JASPER=$JASPER_ROOT
export JASPERLIB=$JASPER_ROOT/lib
export JASPERINC=$JASPER_ROOT/include

### Jian: only 'iomkl' works for parallel run!!!
module load apps/iomkl/2017.01
module load apps/netcdf/v4.2.1.1_intel-2013.0.079

export NETCDF=$NETCDF_ROOT
export EM_CORE=1
export NMM_CORE=0
export WRF_CHEM=1

ulimit -s unlimited

#./ideal.exe
#mpirun -np 96 ./wrf.exe
#./real.exe
#mpirun -np 8 ./real.exe
#./wrf.exe
mpirun -np 32 ./wrf3.9.1-noKPP.exe


The tail of my rsl.error.0000 is:

Timing for processing lateral boundary for domain 1: 1.50548 elapsed seconds
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 64 JS 1 JE 19
WRF NUMBER OF TILES = 1
D01 3-D analysis nudging reads new data at time = 0.000 min.
D01 3-D analysis nudging bracketing times = 0.00 360.00 min.
photolysis_driver: called for domain 1
Timing for Writing wrfout_d02_2011-12-01_00:00:00 for domain 2: 1.88863 elapsed seconds
mediation_integrate: med_read_wrf_chem_emissions: Read emissions for time 2011-12-01_00:00:00
mediation_integrate: med_read_wrf_chem_emissions: Open file wrfchemi_d02_2011-12-01_00:00:00
d02 2011-12-01_00:00:00 Input data processed for aux input 10 for domain 2
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 18 JS 1 JE 11
WRF NUMBER OF TILES = 1
D02 3-D analysis nudging reads new data at time = 0.000 min.
D02 3-D analysis nudging bracketing times = 0.00 360.00 min.
photolysis_driver: called for domain 2
Timing for main: time 2011-12-01_00:00:24 on domain 2: 3.75538 elapsed seconds
Timing for main: time 2011-12-01_00:00:48 on domain 2: 0.41202 elapsed seconds
Timing for main: time 2011-12-01_00:01:12 on domain 2: 0.50112 elapsed seconds
Timing for main: time 2011-12-01_00:01:12 on domain 1: 36.51680 elapsed seconds
Timing for main: time 2011-12-01_00:01:36 on domain 2: 0.53409 elapsed seconds
Timing for main: time 2011-12-01_00:02:00 on domain 2: 0.48982 elapsed seconds
Timing for main: time 2011-12-01_00:02:24 on domain 2: 0.50282 elapsed seconds
Timing for main: time 2011-12-01_00:02:24 on domain 1: 4.84786 elapsed seconds
Timing for main: time 2011-12-01_00:02:48 on domain 2: 0.53858 elapsed seconds
Timing for main: time 2011-12-01_00:03:12 on domain 2: 0.49145 elapsed seconds
Timing for main: time 2011-12-01_00:03:36 on domain 2: 0.50420 elapsed seconds
Timing for main: time 2011-12-01_00:03:36 on domain 1: 5.15013 elapsed seconds
Timing for main: time 2011-12-01_00:04:00 on domain 2: 0.53763 elapsed seconds
Timing for main: time 2011-12-01_00:04:24 on domain 2: 0.49445 elapsed seconds
Timing for main: time 2011-12-01_00:04:48 on domain 2: 0.50212 elapsed seconds
Timing for main: time 2011-12-01_00:04:48 on domain 1: 5.17132 elapsed seconds
Timing for main: time 2011-12-01_00:05:12 on domain 2: 0.53809 elapsed seconds
Timing for main: time 2011-12-01_00:05:36 on domain 2: 0.49610 elapsed seconds
Timing for main: time 2011-12-01_00:06:00 on domain 2: 0.50337 elapsed seconds
Timing for main: time 2011-12-01_00:06:00 on domain 1: 5.15969 elapsed seconds
Timing for main: time 2011-12-01_00:06:24 on domain 2: 0.54184 elapsed seconds
Timing for main: time 2011-12-01_00:06:48 on domain 2: 0.49752 elapsed seconds
Timing for main: time 2011-12-01_00:07:12 on domain 2: 0.51206 elapsed seconds
Timing for main: time 2011-12-01_00:07:12 on domain 1: 5.19401 elapsed seconds
Timing for main: time 2011-12-01_00:07:36 on domain 2: 0.53909 elapsed seconds
Timing for main: time 2011-12-01_00:08:00 on domain 2: 0.50182 elapsed seconds
Timing for main: time 2011-12-01_00:08:24 on domain 2: 0.50560 elapsed seconds
Timing for main: time 2011-12-01_00:08:24 on domain 1: 5.18936 elapsed seconds
Timing for main: time 2011-12-01_00:08:48 on domain 2: 0.54307 elapsed seconds
Timing for main: time 2011-12-01_00:09:12 on domain 2: 0.49556 elapsed seconds
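
The excerpt above is rank 0's log only. In an MPI run each task writes its own rsl.error.NNNN and rsl.out.NNNN, so a failure on another rank would not appear in rsl.error.0000. A minimal sketch for scanning all of them follows; the search patterns are illustrative guesses, not an exhaustive list of messages WRF or the compiler runtime can emit.

```python
import glob
import os
import re

# Illustrative patterns only (an assumption); extend them for your own system.
SUSPECT = re.compile(
    r"fatal|forrtl|sigsegv|segmentation fault|\bcfl\b|\bnan\b", re.IGNORECASE
)


def scan_rsl(run_dir="."):
    """Return {filename: [suspect lines]} over rsl.error.* and rsl.out.* in run_dir."""
    hits = {}
    for pattern in ("rsl.error.*", "rsl.out.*"):
        for path in sorted(glob.glob(os.path.join(run_dir, pattern))):
            with open(path, errors="replace") as fh:
                lines = [ln.rstrip() for ln in fh if SUSPECT.search(ln)]
            if lines:
                hits[os.path.basename(path)] = lines
    return hits


if __name__ == "__main__":
    for name, lines in scan_rsl(".").items():
        print(name)
        for ln in lines:
            print("   ", ln)
```

An empty result does not prove the run is healthy; it only means none of the listed patterns occurred in any rank's log.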

slurm.out (job)

JasPer v1.900.1 built with Intel 2013_sp1.1.106
Intel compilers v2013 SP1 update 1 (sp1.1.106)
GCCcore/5.4.0
binutils/2.26-GCCcore-5.4.0
icc/2017.1.132
ifort/2017.1.132
iccifort/2017.1.132
numactl/2.0.11-iccifort-2017.1.132
hwloc/1.11.5-iccifort-2017.1.132
OpenMPI/2.0.1-iccifort-2017.1.132
iompi/2017.01
imkl/2017.1.132-iompi-2017.01
apps/iomkl/2017.01
NetCDF v4.2.1.1 built with intel 2013.0.079
starting wrf task 23 of 32
starting wrf task 24 of 32
starting wrf task 22 of 32
starting wrf task 20 of 32
starting wrf task 25 of 32
starting wrf task 28 of 32
starting wrf task 10 of 32
starting wrf task 11 of 32
starting wrf task 18 of 32
starting wrf task 27 of 32
starting wrf task 3 of 32
starting wrf task 6 of 32
starting wrf task 17 of 32
starting wrf task 31 of 32
starting wrf task 2 of 32
starting wrf task 4 of 32
starting wrf task 5 of 32
starting wrf task 7 of 32
starting wrf task 15 of 32
starting wrf task 16 of 32
starting wrf task 26 of 32
starting wrf task 0 of 32
starting wrf task 1 of 32
starting wrf task 8 of 32
starting wrf task 9 of 32
starting wrf task 12 of 32
starting wrf task 13 of 32
starting wrf task 14 of 32
starting wrf task 19 of 32
starting wrf task 30 of 32
starting wrf task 29 of 32
starting wrf task 21 of 32
~
-----------------------------------------------------

Please help me figure out the problem if you can. I already changed my time_step to 72 and reran; same results. ncview of wrfout_d01 and wrfout_d02 shows variables only at the 0th hour.
Komal
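
Because the run was tried on 32, 48 and 64 CPUs, it can help to see how small the per-task patches of the 70 x 82 nest become. The sketch below assumes a near-square factorization of the task count and divides the unstaggered extent (e_we - 1, e_sn - 1) among the tasks; WRF's real decomposition routine may differ in detail, and the `MIN_POINTS` threshold of 10 is a commonly quoted rule of thumb, used here as an assumption. For 32 tasks the result matches the tile extents printed in rsl.error.0000 above (64 x 19 for d01, 18 x 11 for d02).

```python
import math

MIN_POINTS = 10  # rule-of-thumb minimum patch side (assumption, not a WRF constant)


def near_square_factors(ntasks):
    """Return (nx, ny) with nx * ny == ntasks, nx <= ny, as close to square as possible."""
    nx = math.isqrt(ntasks)
    while ntasks % nx:
        nx -= 1
    return nx, ntasks // nx


def patch_size(e_we, e_sn, ntasks):
    """Largest patch (points in x, points in y) when the domain is split over ntasks."""
    nx, ny = near_square_factors(ntasks)
    return math.ceil((e_we - 1) / nx), math.ceil((e_sn - 1) / ny)


if __name__ == "__main__":
    domains = {"d01": (257, 149), "d02": (70, 82)}  # e_we, e_sn from the namelist
    for ntasks in (32, 48, 64):
        for name, (e_we, e_sn) in domains.items():
            px, py = patch_size(e_we, e_sn, ntasks)
            flag = "  <-- small patch" if min(px, py) < MIN_POINTS else ""
            print(f"{ntasks:3d} tasks  {name}: {px} x {py}{flag}")
```

Under these assumptions the nest's patches are already only 11 points tall at 32 tasks and drop below 10 points wide at 64 tasks, so adding CPUs is unlikely to help a domain this small. Whether patch size has anything to do with the hang reported here is not established.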




Rajmal Jat

Aug 11, 2018, 6:54:38 AM
to wrf-chem-run, komalsh...@gmail.com

Dear Komal,

I am running into a similar problem. If you have found a solution, please post it in a reply so that I can benefit as well.

Regards

Stacy Walters

Aug 11, 2018, 8:58:44 AM
to Rajmal Jat, wrf-chem-run, Komal Shukla
Rajmal,

Please set the following namelist variables as follows:

frames_per_outfile = 1,1
debug_level           = 300

and try a short, say two-hour, simulation.

Stacy
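
Both variables belong to the &time_control group of namelist.input, and debug_level is not present in the namelist posted earlier in the thread, so it has to be added rather than edited. A small sketch for applying such changes to a copy of the file is given below; `set_in_group` is a hypothetical helper written for this illustration, not part of WRF, and it only handles simple one-line `key = value` entries.

```python
import re


def set_in_group(text, group, key, value):
    """Set `key = value,` inside the `&group ... /` block of a namelist string.

    An existing one-line entry is replaced; a missing key is added directly
    after the group header.
    """
    block = re.search(rf"(?ms)^[ \t]*&{re.escape(group)}\b.*?^[ \t]*/[ \t]*$", text)
    if block is None:
        raise ValueError(f"group &{group} not found")
    body = block.group(0)
    line = re.compile(rf"(?m)^([ \t]*){re.escape(key)}[ \t]*=.*$")
    if line.search(body):
        new_body = line.sub(lambda m: f"{m.group(1)}{key} = {value},", body, count=1)
    else:
        header, rest = body.split("\n", 1)
        new_body = f"{header}\n{key} = {value},\n{rest}"
    return text[: block.start()] + new_body + text[block.end():]


if __name__ == "__main__":
    sample = "&time_control\nrun_days = 1,\nframes_per_outfile = 24, 24, 1,\n/\n"
    sample = set_in_group(sample, "time_control", "frames_per_outfile", "1, 1")
    sample = set_in_group(sample, "time_control", "debug_level", "300")
    print(sample)
```

For the short test run, the same helper could set run_days to 0 and run_hours to 2, on the assumption that wrf.exe takes the run length from the run_* entries. With frames_per_outfile = 1, 1 each output time goes to its own file, which makes it easier to see exactly which write was the last to complete.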