Question about long startup time

Chuyang Liu

Apr 16, 2026, 2:07:13 PM
to Amanzi-ATS Users
Hi there,

I have an Exodus mesh file of about 30 MB that contains roughly 20,000 side sets and labeled sets in total. My goal is to use these regions to track different variables in the final transient run. For the steady-state and cyclic steady-state runs, however, I did not include these ~20K regions in the XML, either under <ParameterList name="regions" type="ParameterList"> or under <ParameterList name="observations" type="ParameterList">.

Even so, the steady-state run on NERSC took about 2 hours before the simulation actually started. It appeared to stall around the mesh creation / partitioning stage, at the log line "MESH_PartitionWithZoltan: Using partitioning method RCB for ZOLTAN". I would really appreciate any suggestions on what might be causing this startup cost and whether there are ways to speed it up.

Also, in steady-state runs, is there a good way to identify which cells or regions are causing very small time steps, and what are the best practices for diagnosing and fixing that?

Best,
Chuyang

Coon, Ethan

Apr 16, 2026, 2:59:44 PM
to Chuyang Liu, Amanzi-ATS Users
Yes, this is probably a structural problem in the code. Labeled sets in the mesh are read on rank 0, then created on all ranks; many will be empty on most ranks (maybe even all but one). But the construction of each labeled set is probably sending multiple MPI_Bcast or other related messages, so the cost grows with the number of sets even when most of them are empty locally.

This is something we would like to clean up at some point, but is deep in our code stack.

Ethan
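
A rough back-of-the-envelope check on the numbers reported in this thread, sketched in Python. It assumes (this is an assumption, not a measurement) that the whole 2-hour stall comes from labeled set / side set construction and that the cost scales linearly with the number of sets. The function name is made up for illustration.

```python
def per_set_startup_cost(stall_seconds, n_sets):
    """Average startup time attributed to each set, in seconds.

    Assumes the entire stall is due to set construction and that
    every set costs the same; both are simplifications.
    """
    return stall_seconds / n_sets


stall_seconds = 2 * 3600   # "about 2 hours" before the run started
n_sets = 20_000            # "roughly 20,000" side sets and labeled sets

cost = per_set_startup_cost(stall_seconds, n_sets)
print(f"{cost:.2f} s per set")
```

Under those assumptions each set accounts for a few tenths of a second of startup, which is in line with several collective messages per set on a large job rather than with the cost of reading a 30 MB file.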

Chuyang Liu

Apr 16, 2026, 3:15:21 PM
to Amanzi-ATS Users
Thanks, Ethan — that is very helpful.

Would it be reasonable to use a simplified mesh without these ~20K extra labeled sets and side sets for the spin-up runs, and then switch to the full mesh only for the final transient run? I am wondering whether that would be a practical way to avoid the long startup cost.

Also, for steady-state runs, is there a recommended way to identify which cells or regions are causing very small time steps, and what are the best practices for diagnosing and fixing that?

Chuyang

Coon, Ethan

Apr 16, 2026, 3:54:22 PM
to Chuyang Liu, Amanzi-ATS Users
> Would it be reasonable to use a simplified mesh…

Yes, that sounds like a good idea to me. It won’t fix the problem when you go to the transient run, but that will be just the last run or two, so the total startup cost should be much lower.


> Also, for steady-state runs, is there a recommended way to identify which cells or regions are causing very small time steps…

Yes, you’ll want to use debug cells.  This gets covered pretty carefully in the debugging movie here:  https://www.youtube.com/watch?v=pY5yQga7z_o&list=PLisa2eqmVBFZ1mpoYqNmtFUy7yR9RaFks&index=6

This gives you the cell that is causing the problem, and also the porosity, permeability, etc. It does not tell you the region, but it does tell you the coordinates. You can then find the matching cell in Watershed Workflow by minimizing the distance to that coordinate, and from that cell back out the region number.

Ethan
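
The "minimize the distance to that coordinate" step can be sketched in plain Python as below. This is only an illustration: the centroid list, the cell-to-region list, and the reported coordinate are made-up placeholder data, and `nearest_cell` is a hypothetical helper, not a Watershed Workflow or ATS function. In practice the centroids and region labels would come from whatever mesh object was used to build the Exodus file.

```python
import math


def nearest_cell(centroids, target):
    """Return (index, distance) of the centroid closest to target.

    centroids: sequence of (x, y, z) tuples, one per cell
    target:    (x, y, z) coordinate reported for the problem cell
    """
    best = min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], target))
    return best, math.dist(centroids[best], target)


# Placeholder data, for illustration only.
centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, -1.0)]
cell_region = [101, 102, 103]    # hypothetical region ID for each cell
reported = (9.7, 9.9, -1.2)      # coordinate taken from the debug output

idx, dist = nearest_cell(centroids, reported)
print(f"cell {idx}, region {cell_region[idx]}, distance {dist:.3f}")
```

A brute-force search like this is linear in the number of cells, which is fine for a one-off lookup. If the distance to the closest centroid is not small compared with the cell size, the coordinate and the centroids are probably in different coordinate systems or units.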



Chuyang Liu

Apr 16, 2026, 4:26:19 PM
to Amanzi-ATS Users
Thanks, Ethan! I’ll try the simplified mesh for spin-up and use debug cells to track down the problematic cells.
Chuyang