non-conservation after UMIST update

Gary Ferland

Nov 6, 2021, 8:23:17 AM
to cloudy-dev
I reintegrated Gargi Shaw's updates to the UMIST chemistry, and we now have a failure in tsuite/programs/collcool.cpp.  The input deck below reproduces the problem in a single model very quickly.  There must be a hot reaction in the new chemistry.  I am working with Gargi to understand what happened.

the script is
========
coronal 4.0
stop zone 1
set dr 0
hden 0
set dr 0
element helium abundance -9
metals -9
element Hydrogen abundance 5.000000
set eden 0
===========
the error is
PROBLEM non-conservation of nuclei F nzone 0 atoms 0.000000000000e+00 moles 3.021015664183e-22 sum 3.021015664183e-22 tot gas 3.019999863772e-22 rel err 3.364e-04
atos cannot load symbols for the file ExecName-not-set for architecture x86_64.

 PROBLEM DISASTER
 An assert has been thrown, this is bad.
 Failed: lgElemsConserved()
 It happened in the file mole_species.cpp at line number 975
 This is iteration 1, nzone 0, fzone 0.00, lgSearch=T.
=================

The abundances are tiny and the error is small.  I wonder if the criteria are too strict for species with small abundances?  It should not be hard to identify the reaction that is driving this.

thanks
Gary

Robin Williams

Nov 6, 2021, 2:31:29 PM
to cloud...@googlegroups.com
A relative error of 3e-4 is enough to be suspicious.

It looks like the diagnostic that checks the reactions conserve everything is still enabled when compiled in debug mode, so the problem appears to be algorithmic rather than in the data.

"atoms 0.000000000000e+00" suggests to me that the solver has gone below a "too low to care" heuristic and clamped negative outputs, rather than taking the time to converge them to sensible values.

OK, it runs through fine with the following change to the source:

--- a/source/mole_solve.cpp
+++ b/source/mole_solve.cpp
@@ -140,7 +140,7 @@ double mole_solve()
                                                        newMolsTmp = abs(newmols[mol]);
                                                        iworst = mol;
                                                }
-                                               if( newmols[mol] < -SMALLABUND)
+                                               if( newmols[mol] < 0)//-SMALLABUND)
                                                {
                                                        ++nBad;
                                                }

  Robin




Robin Williams

Nov 6, 2021, 2:38:37 PM
to cloud...@googlegroups.com
P.S. While it's probably worth seeing what happens with the test suite with the threshold set to zero here, it may need to be something a little more subtle (e.g. scaled relative to the total abundance of the least abundant element in the molecule); a sketch of the idea follows.
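
To make that concrete, here's a minimal sketch of the relative test I have in mind.  badNegativeAbundance, minElemAbund, and relTol are names I've just invented, not existing Cloudy identifiers -- only newmols and nBad come from the diff above, and the tolerance value would need tuning:

// flag a species only when its negative excursion is large relative to the
// total gas-phase abundance of its least abundant constituent element
static bool badNegativeAbundance( double newmol, double minElemAbund,
                                  double relTol = 1e-12 /* assumed value */ )
{
    return newmol < -relTol*minElemAbund;
}

so the test in the loop would become something like
    if( badNegativeAbundance( newmols[mol], minElemAbund ) )
        ++nBad;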

Gary J. Ferland

Nov 6, 2021, 2:42:58 PM
to cloud...@googlegroups.com
wow!  Thanks!  As ever, a miracle

I will try the test suite now with the change.
best,
Gary



--
Gary J. Ferland
Physics, Univ of Kentucky
Lexington KY 40506 USA
Tel: 859 257-8795
https://pa.as.uky.edu/users/gary

Gary J. Ferland

Nov 6, 2021, 4:20:50 PM
to cloud...@googlegroups.com
test suite ran fine in auto and slow, now checking programs, which is where the crash occurred

Gary J. Ferland

Nov 6, 2021, 10:13:51 PM
to cloud...@googlegroups.com
everything I tried passed - tsuite auto, slow, & programs

do you recommend posting?

Gary J. Ferland

Nov 6, 2021, 10:26:36 PM
to cloud...@googlegroups.com
I am now running with FLT_IS_DBL

Marios improved the makefile documentation at https://gitlab.nublado.org/cloudy/cloudy/-/wikis/MakefileDescription - a real time saver.  Thanks.

Robin Williams

Nov 7, 2021, 6:45:39 AM
to cloud...@googlegroups.com
I think the change I suggested will generally improve the quality of the solution, unless the code gets trapped bashing its head against a numerical wall.  If everything works OK in practice, it's worth making the change.

This led me to wonder again about other cases of negative populations.  I tried running the problem from the Q&A where atom_leveln generates -ve values.  Printing the matrix, it looks like levels 8 and 11 in "Ca 1" are essentially decoupled from the rest of the atom when this crashes:
DEBUG CADIAG       522.92974       1108.3665       3735.6191       1197.9489       5700.5829       4737.2109        4523.364   8.3498699e-14   2.1800017e+08        86003501   6.2165053e-13        52343413        52903316        10400641        53003219
DEBUG CAPOPS    0.0017379473   1.2833655e-07   3.7452746e-07   5.8998924e-07  -8.5090887e-05  -0.00014084687  -0.00019502307  -1.1175061e-05   4.1644486e-09   1.4816506e-11  -2.4985778e-07  -9.2374405e-09  -1.2343215e-08  -8.7321066e-09  -1.5227635e-08
 -- CADIAG is the diagonal of amat[][], without the conservation constraint, CAPOPS the predicted populations

Using either
  species "Ca" levels=7
  species "Ca" levels=all
the crash goes away; adding in bogus coupling terms to the higher levels also prevents the crash.  With levels=all, at the same stage in the solution as the default case crashes,
DEBUG CADIAG       522.93494       1503.8349       4140.3657       1611.3346       7673.1449       6683.0953       6478.9437       996.96366   2.1800425e+08        86003501   1.8313503e-12        52343413        52903316        10400641        53003219        22000926        22600954 etc...
DEBUG CAPOPS    0.0013013374   9.6093542e-08   2.8043806e-07   4.4181226e-07   7.6715361e-08   1.2697692e-07   1.7575469e-07   5.7120155e-08    3.118249e-09   1.1094825e-11   1.6425583e-07   8.3281159e-12   1.1127265e-11   7.8725493e-12   1.3723136e-11   1.0765004e-12    etc...
the coupling of level 11 still looks dubiously low, but 8 looks more sensible.  Most of the low-lying levels between this and the ground state have significant changes in their diagonal coefficients.  The correct populations are low enough that these dodgy rates may not have a major effect on the model output, but they don't leave you feeling particularly happy either.

So yes, it would be good if the numerical methods coped with the initial setup -- which is ill-conditioned, but looks like it should have a realizable solution.

But it also looks like the truncated solvers are being asked to solve rather unphysical models.  It would seem better if level 8 in the default model were coupled to *something* with a realistic sink rate (either the continuum, or an aggregate state), rather than being left to blow in the wind.
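
For illustration only, a guard of roughly this shape is what I mean -- rateMatrix, nLevels, and MIN_SINK_RATE are made-up names rather than Cloudy identifiers, and the floor value is a placeholder that would need physical justification:

// give any effectively orphaned excited level a tiny artificial sink to
// ground, so that no row of the rate matrix is numerically empty
const double MIN_SINK_RATE = 1e-30;  // assumed floor, needs tuning
for( long i = 1; i < nLevels; ++i )
{
    double totalLoss = 0.0;
    for( long j = 0; j < nLevels; ++j )
        if( j != i )
            totalLoss += rateMatrix[i][j];  // total rate out of level i
    if( totalLoss < MIN_SINK_RATE )
        rateMatrix[i][0] += MIN_SINK_RATE;  // tie the orphan to ground
}

Coupling to the continuum or to an aggregate state with a physically motivated rate would of course be better than an arbitrary floor.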

   Robin

Gary J. Ferland

Nov 7, 2021, 8:05:18 PM
to cloud...@googlegroups.com
thanks, Robin, for looking at Ca I.  

I had a look at the atomic data.  There is little real data.  NIST has energy levels, but there are no collision strengths - so this is a "baseline" model where we used g-bar for the electron collision strengths.  Even the As are very limited - mostly E1 transitions connecting higher levels.  There are few transitions within the lowest fifteen levels, our default model size.

There are other problems.  Ca I has a very low ionization potential and so will only be present where the gas is cool, since ionizing radiation must be blocked.  To do a proper model we would need neutral collisions at low temperatures.  Currently, we only monitor one of its lines, "Ca 1" 4226.73 in blr_n13_p18_Z20.in.  That is the 1-9 transition.

So I propose that we drop this species, that is, comment it out in our Stout masterlist.

thanks,
Gary

Robin Williams

Nov 8, 2021, 2:52:30 PM
to cloud...@googlegroups.com
I was looking at it as an example of a numerical problem, although if the data quality is that poor I guess leaving it enabled is a bit of an elephant trap for other reasons.  


look like they might potentially be of use for additional data & references?

This looks like it's been an issue for a while -- I noticed this chunk of code in atom_leveln.cpp which appears to be a previous attempt to mitigate (or at least bury) a similar problem, and which may no longer be required if Ca I is switched out:

// >>chng 20 jul 28 from 1e-10 to 1e-7 to pass Morriset PN sims which failed with
// nearly all Ca atomic, Ca 6 - 9 about -1e-10
double poplimit = 1.0e-10;
/* identifies special case where nearly all atomic */
if( pops[0]/abund > 0.99 )
    poplimit = 1.0e-7;

  Robin


Gary J. Ferland

Nov 8, 2021, 3:22:01 PM
to cloud...@googlegroups.com
this is not the only problem her sim has - the head of master does not even reach the Ca I crash; there is an insanity in the two-electron ionization potential setup.

we have nothing in the test suite that increases the resolution of the continuum while leaving all the elements included.  The crash appears readily once those two criteria are met.  I will change one of the test cases to use high resolution and all elements.

so, a lot of negative progress in the past week!


Robin Williams

Nov 8, 2021, 5:47:56 PM
to cloud...@googlegroups.com
I figured out a way of phrasing the question of what to do here that (finally!) turned up some useful literature:

You can look at this problem as trying to find the equilibrium state of a continuous-time Markov chain; the discrete-time algorithms described here will need to be adapted to continuous time, and may be slower than LU, but may be a useful fallback when things break down due to nearly-singular matrices.

  Robin

Robin Williams

Nov 9, 2021, 2:09:12 AM
to cloud...@googlegroups.com
To clarify how this relates to the matrix system Ax = b in atom_leveln.cpp: this is for the case b = 0, and is in effect solving for the equilibrium of the time-dependent system x = (I - A dt) x [where the conservation sum corresponds to the Markov property of (I - A dt)].
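
Spelling that correspondence out -- assuming the sign convention that the columns of A sum to zero for a conservative system, which is my reading rather than something stated above -- with $M = I - A\,dt$,
$$\sum_i M_{ij} = \sum_i \left( \delta_{ij} - A_{ij}\,dt \right) = 1 - dt \sum_i A_{ij} = 1 \quad\Longleftrightarrow\quad \sum_i A_{ij} = 0,$$
so M is column-stochastic exactly when A conserves, and the equilibrium populations are the stationary vector of a Markov matrix.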

https://arxiv.org/pdf/2101.11657.pdf sets out the algorithm in a way which seems a bit clearer to implement.  As presented there, I don't think it accesses the diagonal of the matrix, so the dt factors in the off-diagonal elements cancel and it can be applied to the rate matrix directly.

The case b != 0 can easily be moved into an equivalent form by adding an extra row to the matrix for "the environment" -- actually, I think it might be possible to look at the algorithm as combining the top levels into a macro-state until there's only one state which encompasses everything, so the answer is obvious, and then working backwards to pull the resolved levels back out of the macro-state one by one.
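
For reference, here is a minimal sketch of the GTH (Grassmann-Taksar-Heyman) scheme as I read it -- my own illustrative code under the assumptions above (dense matrix of rates, irreducible chain), not what would go into Cloudy:

#include <vector>

// stationary populations of a continuous-time Markov chain via GTH;
// q[i][j] (i != j) holds the non-negative rate from state i to state j;
// the diagonal is never read, so a rate matrix can be passed in as-is
std::vector<double> gthStationary( std::vector<std::vector<double>> q )
{
    const size_t n = q.size();
    // reduction: fold states n-1, ..., 1 one by one into a macro-state
    for( size_t k = n-1; k > 0; --k )
    {
        double s = 0.0;  // total rate out of state k into states below it
        for( size_t j = 0; j < k; ++j )
            s += q[k][j];
        for( size_t i = 0; i < k; ++i )
            q[i][k] /= s;  // scale inflow to k by its total outflow
        for( size_t i = 0; i < k; ++i )
            for( size_t j = 0; j < k; ++j )
                q[i][j] += q[i][k]*q[k][j];  // reroute flow through k
    }
    // back substitution: pull each level back out of the macro-state;
    // only additions and multiplications of non-negative numbers occur,
    // so nothing cancels even when the system is nearly singular
    std::vector<double> x( n, 0.0 );
    x[0] = 1.0;
    double tot = 1.0;
    for( size_t k = 1; k < n; ++k )
    {
        for( size_t i = 0; i < k; ++i )
            x[k] += x[i]*q[i][k];
        tot += x[k];
    }
    for( size_t k = 0; k < n; ++k )
        x[k] /= tot;  // normalize to unit total population
    return x;
}

The work has the same O(n^3) shape as LU, with a modest constant-factor penalty from the extra divisions.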

  Robin

Robin Williams

Nov 9, 2021, 5:11:45 PM
to cloud...@googlegroups.com
I put together an implementation of the GTH algorithm, which seems to work at least for the simple tests I've thrown at it so far.  It looks like it might be a bit more expensive than LU, but not much.

Is it OK to use your new branch to explore this numerical approach?  I'd do the usual thing of putting flags in the code so I can swap approaches without causing disruption.

  Robin

Gary J. Ferland

Nov 9, 2021, 6:51:52 PM
to cloud...@googlegroups.com
the test suite now fails one of the programs - we should not leave it as is.  We could post your fix from a few days ago straight to master, though that would violate how we had set things up.  The fix has been tested every way I know of - GCC on Linux, LLVM on Mac, and float as double.

The existing branch cannot be merged to master without removing the three files I added to demonstrate the problems.  Marios created a script to remove files from the git history.

We do want to push forward and we take your suggestions very seriously.  Maybe the simplest would be to remove the extra debugging files, reintegrate what we have, then keep going?

Robin Williams

Nov 10, 2021, 2:48:48 AM
to cloud...@googlegroups.com
If you're not going to commit directly to master, I think the simplest approach is probably to create an additional branch, cherry-pick just the fix onto it (not a big job to do by hand in this case!), and merge that into master.

Git is designed to make this kind of operation cheap in terms of repository space, etc. (hence e.g. the "git cherry-pick" command).

This may seem a bit admin-heavy, but a formal pull request provides somewhere to put explanations which aren't that obvious in a commit log (and discussion which wouldn't appear there at all).  Keeping merges tightly focused is also useful, compared to all-or-nothing blobs, if some problem arises in the future.
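
Concretely, the sequence would be something like this (the branch name and commit hash are placeholders, not real ones):

    git checkout -b conservation-fix master
    git cherry-pick <sha-of-the-mole_solve-fix>
    git checkout master
    git merge --no-ff conservation-fix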

  Robin

Robin Williams

Nov 10, 2021, 3:30:45 AM
to cloud...@googlegroups.com
P.S. I bolted my modified code into my checked-out version.  It seems to get reasonably sensible answers (agrees with the old implementation the first time through, stays positive when the matrix gets nastier).  "ca1.in" runs to completion when I use the values from the new solver for the populations.  I'd enabled it just for Ca 1 with no external sinks & sources; it's worth doing more extensive cross-testing.

Robin Williams

Nov 11, 2021, 3:31:01 AM
to cloud...@googlegroups.com
For the case which fails, here's a comparison between the old and new approaches, with the associated error (A x - b)/diag(A) for each level.

This seems to confirm that the results of both schemes are accurate as far as the matrix system can tell, but the GTH scheme's answer is positive everywhere and significantly different from just zeroing the -ve elements & rescaling.

DEBUG Ca 1:   0; old    0.0012909787 e=  -7.209623e-20; new    0.0011333871 e=  1.8966657e-19
DEBUG Ca 1:   1; old   9.5330706e-08 e=  1.6819235e-23; new   8.3693551e-08 e= -4.3193543e-23
DEBUG Ca 1:   2; old   2.7820576e-07 e=  2.3129905e-23; new   2.4424478e-07 e=  5.3239793e-24
DEBUG Ca 1:   3; old   4.3825466e-07 e= -1.0888271e-22; new   3.8475629e-07 e=  -2.865043e-23
DEBUG Ca 1:   4; old  -1.7437251e-06 e=  2.2823005e-22; new   2.7831359e-05 e=   6.086079e-22
DEBUG Ca 1:   5; old  -2.8863045e-06 e= -3.4330037e-23; new   4.6067915e-05 e=  2.9295214e-21
DEBUG Ca 1:   6; old  -3.9965105e-06 e= -2.3010177e-21; new   6.3787762e-05 e= -1.2272095e-20
DEBUG Ca 1:   7; old   2.0098328e-05 e= -3.0827211e-20; new   3.1211825e-05 e=     6.0074e-22
DEBUG Ca 1:   8; old   3.0934276e-09 e=  5.0917522e-25; new   2.7158085e-09 e= -1.5743982e-28
DEBUG Ca 1:   9; old   1.1005969e-11 e= -2.5598614e-28; new   9.6624543e-12 e= -3.9907387e-28
DEBUG Ca 1:  10; old   4.4853241e-07 e=  1.1961099e-22; new   6.9671627e-07 e= -3.9428692e-22
DEBUG Ca 1:  11; old  -1.8929826e-10 e=  3.3141299e-26; new   3.0213638e-09 e= -5.3026078e-25
DEBUG Ca 1:  12; old  -2.5294335e-10 e= -3.2790384e-26; new   4.0371945e-09 e= -5.2464823e-25
DEBUG Ca 1:  13; old   -1.789427e-10 e= -6.2546107e-26; new   2.8560802e-09 e=   3.335775e-25
DEBUG Ca 1:  14; old  -3.1205231e-10 e=  6.5457288e-26; new   4.9806247e-09 e=              0

  Robin

Robin Williams

Nov 13, 2021, 4:34:09 AM
to cloud...@googlegroups.com
I posted my revised code to the branch.  There are logical flags -- set false unless modified at compile time -- which either 
 a) compare the old solver with the new one without changing anything (as above), or
 b) enable the GTH solver to replace the standard one for Ca 1 with no sources
The second version allows ca1.in to run through (*).  It would be useful to extend the scope of this and test it more thoroughly.

(*) Except the top of the branch now fails on the smoke test:

 /home/rjrw/cloudy/cloudy/data/stout/o/o_1/o_1.coll:27:15: PROBLEM ERROR: duplicate collisional transition found
  CS ELECTRON 1 7 6.03E-02 1.09E-01 1.22E-01 1.37E-01 1.56E-01 1.94E-01 3.28E-01
              ^
 [Stop in errorAbort at parser.cpp:1472, something went wrong]

  Robin

P.S. An analysis like the one presented here https://people.maths.ox.ac.uk/trefethen/publication/PDF/1992_54.pdf might be informative.  I think the idea is essentially to perturb the matrix and see how much the solutions move around, to give an idea of the error bars on the answer.  IIRC Ryan did something similar to look at sensitivities to atomic data.
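
A minimal sketch of that experiment, with the solver passed in as a callable so either scheme can be probed -- all the names here are illustrative, not Cloudy routines:

#include <functional>
#include <random>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// jitter every off-diagonal rate by a relative amount eps and re-solve;
// the spread of the returned populations over many calls gives an idea
// of how sensitive the answer is to the input data
std::vector<double> perturbedSolve( Matrix rates, double eps, std::mt19937& rng,
    const std::function<std::vector<double>(const Matrix&)>& solve )
{
    std::uniform_real_distribution<double> u( -1.0, 1.0 );
    for( size_t i = 0; i < rates.size(); ++i )
        for( size_t j = 0; j < rates.size(); ++j )
            if( i != j )
                rates[i][j] *= 1.0 + eps*u( rng );
    return solve( rates );
}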

Robin Williams

Nov 14, 2021, 4:45:10 PM
to cloud...@googlegroups.com
I noticed the comment in the source about Christophe Morisset's problematic Ca 1 calculations from July 2020.

I just checked: these all run to completion with the modified code and with the broadened poplimit value removed.

  Robin

Gary Ferland

Nov 14, 2021, 5:24:10 PM
to cloudy-dev
Very impressive - a great update to the code!  Thanks for sorting this.
Gary