Xyce Parallel Zoltan Load Balancing Fails at Initialization

John Mayega

Oct 3, 2023, 10:36:52 AM
to xyce-users
Hello,
    I am using a netlist that simulates in serial Xyce with no issues. However, when I run it in parallel Xyce, I get an error at the Zoltan load balancing step and the simulation fails. Are there any settings that could improve this behavior?

Thank You.

Sim Console Output:
----------------------------------------------------------------------------------------------
***** Setting up topology...

***** Device Count Summary ...
       C level 1 (Capacitor)                   1004
       D level 1,2 (Diode)                     1728
       I level 1 (Independent Current Source)     5
       M level 14 (BSIM4)                     64150
       R level 1 (Resistor)                    1162
       V level 1 (Independent Voltage Source)    95
       --------------------------------------------
       Total Devices                          68144
***** Setting up matrix structure...
***** Number of Unknowns = 33450
***** Initializing...

***** Beginning Transient Calculation...


Analyzed Singleton Problem:
---------------------------
Singletons Detected!
Num Singletons:      62
---------------------------


ConstructedSingleton Problem:
---------------------------
RatioOfDimensions:   0.998146
RatioOfNonzeros:     0.815433
---------------------------

ZOLTAN Load balancing method = 10 (HYPERGRAPH)
function OneStep::rejectStep:
   Maximum number of failures at time 0
*** Xyce Abort ***
function OneStep::rejectStep:
   Maximum number of failures at time 0

*** Xyce Abort ***
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------


xyce-users

Oct 3, 2023, 12:11:46 PM
to xyce-users

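The load balancing worked; it is the solvers that are failing, on the initial solve. The ZOLTAN line in your output is informational, and the abort comes from OneStep::rejectStep at time 0.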

This is probably happening because the iterative linear solver (which is the default for parallel runs above 10,000 unknowns; your circuit has 33,450) is not robust enough to solve this problem.

The easiest way to fix this is to force Xyce to use the serial direct solver, KLU. The device evaluations will still run in parallel, but the linear solve will be serial. For a problem of this size (roughly 68k devices) that is usually the best choice anyway. You will also be using exactly the same solver that was used for the serial calculation.

You can set this with the following option in the netlist: .options linsol type=klu
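
For illustration, here is roughly how that line might sit in a netlist. This is only a sketch: the comment lines are placeholders, and putting the option next to the other control statements is an assumed convention. Only the .options line itself comes from the advice above.

```
* Hypothetical netlist excerpt (comment lines are placeholders, not from the original netlist)
* Force the serial direct solver (KLU) for the linear solve.
* Device evaluations are still distributed across the MPI processes.
.options linsol type=klu
```

With this in place, the parallel run should use the same linear solver as the serial run that already works.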

thanks,
Eric