Xyce 7.4 Parallel Regression Test Suite 1 FAILED[sh] & 1 ERROR[sh]


Ricardo Cervantes

Dec 10, 2021, 9:11:01 PM
to xyce-users

Hi Xyce team,

I just compiled and ran the regression test suite on the latest version of Xyce 7.4 Parallel and got the following issues:

CommandLine/command_line..........................FAILED[sh]      (Time:  13s =   2.67cs +   0vs)
SENS/bsim3Inv.....................................EXITED WITH ERROR[sh] (Time:   6s =   6.77cs +   0vs)

The Xyce 7.4 Serial build shows no issues.

Sincerely,
Ricardo.

Happy to help in any capacity




xyce-users

Dec 12, 2021, 5:10:05 PM
to xyce-users

These are probably not much to worry about, though at this point it's anybody's guess as to why they're failing.  On what system is this, and is it repeatable?

The console output of run_xyce_regression is generally useless for diagnosing things --- it's strictly a report of what passed and what failed.  To understand *why* a failure is happening you have to dig through the output files that are placed in the target directories.  If you've run run_xyce_regression the way we suggest in the Running the Test Suite page, then you'll find that information in files in the  Xyce_Test/Netlists/CommandLine and Xyce_Test/Netlists/SENS subdirectories.  Those will say what actually happened.

CommandLine simply checks some of the outputs of things like "Xyce -v", "Xyce -capabilities", "Xyce -h" and so forth.  If, for some reason, your build is outputting extra text for some of those outputs (e.g. system warnings from openmpi or some such), then when the test checks the output against its gold standard it gets confused.  This is usually not a sign of real problems.
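As a sketch of why extra text breaks this kind of check, consider a gold-standard comparison (the file names and contents below are invented for illustration; the real test scripts differ in detail):

```shell
# Hypothetical sketch: how stray warning text breaks a gold-standard
# comparison. File names and contents are made up for illustration.
printf 'Xyce Release 7.4\n' > gold.txt        # expected "Xyce -v" output
printf 'UCX  WARN  version mismatch\nXyce Release 7.4\n' > actual.txt

if diff -q gold.txt actual.txt > /dev/null; then
  result=PASS
else
  result=FAIL   # the extra warning line makes the outputs differ
fi
echo "comparison: $result"
```

Even though the Xyce output itself is correct, the one extra warning line is enough to make the comparison report a failure.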
SENS/bsim3Inv crashing is a little more concerning if it is repeatable, and would probably be worth understanding.  You'd need to look through the files "bsim3Inv*out" and "bsim3Inv*err" in the Xyce_Test/Netlists/SENS directory and see which ones show error messages.  The "EXITED WITH ERROR" message usually means Xyce didn't complete properly, and those output files *should* contain some evidence of why.
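A quick way to see which .err files actually mention the abort is a grep over the directory. The sketch below fabricates two small files so it is self-contained (the real files live under Xyce_Test/Netlists/SENS):

```shell
# Self-contained sketch: locate which .err files mention the abort.
# The directory name and file contents are fabricated for illustration.
mkdir -p demo_SENS
printf 'function OneStep::rejectStep:\n*** Xyce Abort ***\n' \
  > demo_SENS/bsim3Inv_oldFD.cir.err
printf '' > demo_SENS/bsim3Inv_other.cir.err   # a clean run, empty .err

# -l prints only the names of files containing the pattern
grep -l 'Xyce Abort' demo_SENS/*.err
```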

Ricardo Cervantes

Dec 14, 2021, 5:21:17 AM
to xyce-users
Xyce team,

Here are the files showing the error:
bsim3Inv_oldFD.cir.out
bsim3Inv_oldFD.cir.err

/****************************** bsim3Inv_oldFD.cir.out ******************************/
[1639463005.280847] [alienix:137586:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
[1639463005.284594] [alienix:137589:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
[1639463005.285896] [alienix:137588:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
[1639463005.290768] [alienix:137587:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)

*****
***** Welcome to the Xyce(TM) Parallel Electronic Simulator
*****
***** This is version Xyce Release 7.4-opensource
***** Date: Tue Dec 14 01:23:25 EST 2021


***** Executing netlist bsim3Inv_oldFD.cir

***** Reading and parsing netlist...
***** Setting up topology...

***** Device Count Summary ...
       C level 1 (Capacitor)                  1
       M level 9 (BSIM3)                      4
       R level 1 (Resistor)                   2
       V level 1 (Independent Voltage Source) 2
       ----------------------------------------
       Total Devices                          9
***** Setting up matrix structure...
***** Number of Unknowns = 8
***** Initializing...

Netlist warning: User set FORCEFD=true, so numerical derivatives will be used
 for sensitivity calculations.  This also means that the full linear system
 must be evaluated at every step.  The default behavior is to only re-evaluate
 the nonlinear portions of the problem for efficiency.  The simulation may be
 slower as a result.
***** Beginning DC Operating Point Calculation...

***** Beginning Transient Calculation...

function OneStep::rejectStep:
   Maximum number of local error test failures.  
*** Xyce Abort ***
/****************************** bsim3Inv_oldFD.cir.out ******************************/


/****************************** bsim3Inv_oldFD.cir.err ******************************/
function OneStep::rejectStep:
   Maximum number of local error test failures.  


*** Xyce Abort ***
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
/****************************** bsim3Inv_oldFD.cir.err ******************************/

Sincerely,
Ricardo.

xyce-users

Dec 14, 2021, 11:03:20 AM
to xyce-users
Are you attempting to run the test suite in parallel with more than 2 processors?   This test is known to fail when run with more than 2 processors.

Our web site describing the test suite states:

> Also note that the Xyce regression test suite is not generally suitable for running on large numbers of processors, and contains many circuits that are extremely small. Some are so small that they don't work at all in parallel (which is one of the
> reasons that we have the "serial" and "parallel" tags system). In addition some tests are just barely large enough to work properly on two processors. So, we suggest running the parallel build over the test suite on two processors, which is how we do
> our own daily testing of the code.

If you really want to run on 4 or 8 processors, you must add additional tags to the taglist produced by "suggestXyceTagList.sh".  If you're running more than 2 processors, add the tag "-notmorethan2procs" and if you're running more than 4 processors, add the tag "-notmorethan4procs".  This will cause run_xyce_regression to skip the tests that are known to work only on smaller numbers of processors.  So, for example, if for some reason you really wanted to run on 8 processors:

XYCE_BINARY="`pwd`/src/Xyce"
EXECSTRING="mpirun -np 8 $XYCE_BINARY"
eval `$HOME/Xyce_Regression-7.4/TestScripts/suggestXyceTagList.sh "$XYCE_BINARY"`
$HOME/Xyce_Regression-7.4/TestScripts/run_xyce_regression \
--output=`pwd`/Xyce_Test --xyce_test="$HOME/Xyce_Regression-7.4" \
--taglist="${TAGLIST}-notmorethan4procs-notmorethan2procs" \
--resultfile=`pwd`/parallel_results \
"${EXECSTRING}"


There is no real benefit to running the test suite on more than 2 processors.  The tests are so small that running them on more processors usually makes the test suite slower because there is too little work to spread around and too much communication added by spreading it around.


These system warnings are why the CommandLine test is failing.  That extra text is confusing the test script that examines the standard output (most tests do not even scan the standard output, and look only at the simulation results):
> [1639463005.280847] [alienix:137586:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
> [1639463005.284594] [alienix:137589:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
> [1639463005.285896] [alienix:137588:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
> [1639463005.290768] [alienix:137587:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
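If you just want to eyeball the Xyce output with those warnings stripped, filtering them out by hand is straightforward (illustrative only; the test harness itself does no such filtering):

```shell
# Sketch: strip UCX warning lines from captured output before reading it.
# The sample text below imitates the warnings quoted above.
printf '%s\n' \
  '[1639463005.280847] [alienix:137586:0]    ucp_context.c:1533 UCX  WARN  UCP version is incompatible' \
  'Xyce Release 7.4-opensource' > raw_output.txt

grep -v 'UCX  WARN' raw_output.txt > filtered_output.txt
cat filtered_output.txt
```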

Ricardo Cervantes

Dec 14, 2021, 7:12:01 PM
to xyce-users
I see I should have double-checked the guide, apologies.

I ran it again with "-np 2".
bsim3Inv_oldFD.cir.err is OK now.
The other test is still showing an error: CommandLine/command_line

CommandLine/command_line..........................FAILED[sh]      (Time:  12s =   1.15cs +   0vs)

$ cat junk.err

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

$ cat syntax.err
Simulation aborted due to error.  There are 0 MSG_FATAL errors and 0 MSG_ERROR
 errors



*** Xyce Abort ***
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

I guess we are now OK?

Cheers,
Ricardo.

xyce-users

Dec 14, 2021, 7:25:10 PM
to xyce-users
Yes, you're good.

The syntax.err and junk.err files are a red herring --- those are *supposed* to contain errors.

CommandLine runs Xyce in multiple ways.  First it runs "Xyce -v", "Xyce -capabilities", and "Xyce -license" and checks that it doesn't crash (without looking at anything else).  Then it runs "Xyce -syntax" over a good netlist and makes sure it doesn't fail.  Then it runs "Xyce -syntax" over a BAD netlist and makes sure it DOES fail.  It dumps the output of both of those "-syntax" checks into the same file, with the second clobbering the output of the first (which is why you see errors).  It does a bunch more tests, too, including passing "-junk" to Xyce and expecting it to fail (which is why junk.err has errors in it).  And most importantly, it runs "Xyce -h" and compares the help output to a gold standard.  This is what's actually failing.
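The "this is supposed to fail" pattern described above can be sketched as a small shell helper (hypothetical; the real test scripts are more involved):

```shell
# Hypothetical sketch of the "expected failure" pattern: run a command
# that *should* fail (like "Xyce -syntax" on a bad netlist, or
# "Xyce -junk") and treat a nonzero exit code as success.
expect_failure() {
  if "$@" > cmd.out 2> cmd.err; then
    echo "UNEXPECTED PASS: $*"
    return 1
  else
    echo "expected failure: $*"   # cmd.err may well contain error text
    return 0
  fi
}

# 'false' stands in here for a Xyce invocation that is meant to fail.
expect_failure false
```

Under this pattern an error-filled .err file is the sign of a *passing* check, which is exactly why junk.err and syntax.err look alarming but aren't.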

The reason your test is failing is that there is something amiss in your OpenMPI install that is throwing warnings about "UCP"  that we're not expecting to see in that gold standard output for "-h".  That is not a Xyce issue, it's something up with your system and your MPI install.  Since it's just a warning and everything else is working, you can safely ignore this failure.