molecular network bug in cooling sims

Marios Chatzikos

Jan 7, 2021, 3:29:58 PM
to cloudy-dev
Hello,

I have come across a bug in the molecular network in the context of cooling
calculations.  The following sim:

coronal 3.4e7 K init time
cmb
hden 5.88e-2 linear
init "honly.ini"
set dr 0
set nend 1
stop zone 1
iterate 600
stop time when temperature falls below 0.5 K
cosmic rays background
# no molecules
# set dynamics population equilibrium

fails with a non-conservation of H nuclei:

PROBLEM non-conservation of nuclei H  nzone 0  atoms 5.879937296098e-02
  moles 2.732783132850e-08  sum 5.879940028881e-02  tot gas 5.880000000000e-02
  rel err -1.020e-05

but runs to completion if the 'no molecules' command is activated.  Note
that this is an H-only sim, about as simple as it can be.

It seems that I am running up against a bug we've seen before.  The usual
treatment has been to relax the relative error, but that is the wrong thing
to do -- after all, the tolerance has already been relaxed from 1e-9 (if
memory serves) to 1e-5.
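
For reference, the conservation check that trips is of this general shape -- a minimal sketch with assumed names, mirroring the numbers in the PROBLEM line above, not Cloudy's actual coding:

#include <cmath>
#include <cstdio>

// Sketch only: compare the H nuclei recovered from atoms plus molecules
// against the total gas-phase density, and complain when the relative
// error exceeds the tolerance (now 1e-5; once 1e-9, if memory serves).
bool lgNucleiConserved( double atoms, double moles, double totgas, double tol )
{
	double sum = atoms + moles;
	double relerr = ( sum - totgas ) / totgas;
	if( fabs( relerr ) > tol )
	{
		printf( "PROBLEM non-conservation of nuclei H rel err %.3e\n", relerr );
		return false;
	}
	return true;
}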

A steady-state sim at the last recorded temperature of about 10,000 K runs
gracefully to completion after 600 iterations.  In light of that, the last
command of the deck was activated to force the molecules to their
equilibrium state.  However, Cloudy does not force the molecular network
into equilibrium; if there is a special reason for that, do let me know.

Assuming this is an oversight, the attached patch effects the desired
change, keeping the code parallel to what is done in ion_solver.cpp:815-816
and iso_level.cpp:355-358.  The change does prevent the crash at the
prescribed temperature, even though the simulation fails later on for a
different reason, which I'm now looking into.  As to why it works: the
first change in mole_solve.cpp activates the conservation constraints in
the matrix that is to be inverted, as is usual for steady-state models.
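
To make the intent concrete, the refactored condition amounts to something of this shape (a sketch with assumed flag names; not the literal diff):

// Sketch only.  Time-dependent / advective terms are employed when the
// dynamics machinery is active AND an equilibrium solution has not been
// forced (e.g. via 'set dynamics population equilibrium').  Otherwise,
// the steady-state solve is used, which carries the conservation
// constraints in the matrix that is inverted.
bool lgUseTimeDependent = dynamics.lgTimeDependentStatic && !dynamics.lgEquilibrium;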

Unless I hear any strong objections, I will commit these changes (to branch
newdyna), and press on.

Marios

dyn.lgEq.patch

Francisco Guzman Fulgencio

Jan 7, 2021, 4:16:22 PM
to cloudy-dev
Hi Marios,

I thought the way to proceed was to request feedback through git? Has this changed?

Also, is it normal that cosmic-ray heating allows the gas to cool below 0.5 K? I did not know that this was possible.



==========================
Francisco Guzmán
Assistant Professor of Physics
Rogers Hall, 106B,
Dahlonega Campus
University of North Georgia

From: cloud...@googlegroups.com <cloud...@googlegroups.com> on behalf of Marios Chatzikos <mchat...@gmail.com>
Sent: Thursday, January 7, 2021 3:29 PM
To: cloudy-dev <cloud...@googlegroups.com>
Subject: [cloudy-dev] molecular network bug in cooling sims
 

Robin Williams

Jan 7, 2021, 4:40:38 PM
to cloud...@googlegroups.com
Hi Marios --

I don't have time to study your patch just at the moment, but it seems worth sharing some of the background.

The reason for this check is to try to prevent the kind of issues which affected the molecular network in the days before newmole -- i.e., gross non-conservation due to unbalanced reactions, introduced back when the parsing scheme seemed too complex to bother with and reactions were hacked in by other means.  IIRC, the text parser catches unbalanced reactions upstream of this, although the code is complex enough that there's always the possibility of bugs en route.

It's possible that you could check that the drift was small enough to be believable as rounding error and scale back, but a) you need a good model for what level of rounding error to expect; b) where do you put the error?  In the atoms/iso-nuclear molecules -- which might be a tiny fraction in some regions and hence liable to become dominated by the correction, or even pushed negative -- or spread across the bearing species -- but how, if they have more than one variety of nucleon?
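
For what it's worth, the "scale back" option would amount to something like the sketch below (names assumed, not Cloudy code); it also makes point (b) concrete:

// Sketch only: rescale the network so the H nuclei it carries sum to the
// gas-phase total.  A uniform factor is unambiguous for an H-only network,
// but ill-defined once species carry more than one variety of nucleon;
// dumping the error into the atoms alone instead risks the correction
// dominating them, or even driving them negative.
double scale = totgas_H / ( atoms_H + moles_H );
for( long i = 0; i < nSpecies; ++i )
	density[i] *= scale;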

In practice, this failure mode has often been the result of the code taking too many molecular solution steps, as a result of poor convergence in other parts of the species-balance coding, which it's better to address directly.

Best wishes
  Robin



Gary J. Ferland

Jan 7, 2021, 6:07:30 PM
to cloud...@googlegroups.com
Hi Fran,


On Thu, Jan 7, 2021 at 4:16 PM Francisco Guzman Fulgencio <Francisco.Gu...@ung.edu> wrote:
Hi Marios,

I thought the way to proceed was to request feedback through git? Has this changed?

That would be for an actual commit that has been staged.  Using this forum is OK with me.  I wonder whether GitLab sends a notice when something is staged or there has been a response?  It is helpful to get emails like this.

Also, is it normal that cosmic-ray heating allows the gas to cool below 0.5 K? I did not know that this was possible.

Do we know what temperature the sim was at when it aborted?  It may not have made it down to 3 K.

Maybe it would be helpful to talk about this sometime Friday?  Perhaps after the astro journal club?

thanks,
Gary



--
Gary J. Ferland
Physics, Univ of Kentucky
Lexington KY 40506 USA
Tel: 859 257-8795
https://pa.as.uky.edu/users/gary

Marios Chatzikos

Jan 7, 2021, 8:02:10 PM
to 'Robin Williams' via cloudy-dev

To clarify, when I said that the presence or absence of cosmic rays did not affect the behavior of the run, I was referring to the original sim, not the one forced to equilibrium.


For the latter, with the patch applied, the sim:

- without CR, fails (ill-conditioned matrix at 350 K, because all reactions involving H have virtually zero rates);

- with CR, succeeds (runs for all 600 iterations).


I hope this clears things up a bit.


Thanks,

Marios



Marios Chatzikos

Jan 7, 2021, 8:26:00 PM
to cloud...@googlegroups.com

Hi Gary,

I may not have been clear about this, but let me reiterate: the original sim failed at about 9,500 K, but after applying the patch, and allowing for cosmic rays, it runs to completion (600 iterations), with the final temperature being the aforementioned.  (Actually, 378 of the 600 iterations were at that temperature -- but that's a different issue.)

All in all, I think the patch is OK, but I'd like to talk about it before I commit it (to newdyna).  So, yes, we can talk tomorrow after the journal club.

Thanks,

Marios

Marios Chatzikos

Jan 7, 2021, 8:37:42 PM
to cloud...@googlegroups.com

Hi Robin,

Thanks for providing some context to this.

The patch simply refactors the conditions for employing the time-dependent / advective terms with the level solvers, because they were not being used consistently throughout the code.  Specifically, the if statements in mole_solve.cpp omitted the check on the flag for forcing equilibrium.

Apparently, with the current coding (sans the patch), the molecular network does not employ the conservation constraints in the matrix.  Moving forward, I should check how this affects things.  My feeling is that this part of the code has not been tested enough.
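
For concreteness, conservation constraints typically enter the linear system by replacing one rate equation with the conservation sum; a minimal sketch (names assumed; not the actual mole_solve.cpp coding):

// Sketch only: impose H-nucleon conservation on the rate equations
// A x = b by overwriting one row with the constraint
//     sum_j nH[j] * x[j] = hden,
// where nH[j] counts the H nuclei carried by species j.
for( long j = 0; j < nSpecies; ++j )
	A[iConsRow][j] = nH[j];
b[iConsRow] = hden;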

Thanks,

Marios

Marios Chatzikos

Jan 14, 2021, 5:20:54 PM
to cloud...@googlegroups.com

Hello,

I have run the test suite with the patch, and got only two sims that botched:

auto/time_cool_cd_noaccu.out:  ChkMonitor botch>>Fe24                11.1710A   -29.1600 =   -29.1814  -0.051   0.050   intr    
slow/time_cool_cp.out:  ChkMonitor botch>>O  7                18.6270A    11.8113 =    11.8400   0.064   0.050   intr cumu
slow/time_cool_cp.out:  ChkMonitor botch>>He 2                1640.43A    11.8393 =    11.8700   0.068   0.050   intr cumu

The same botches occur even without the patch, and I've determined that they are due to changes in the time-stepping (I reverted to following changes in the H+ abundance, instead of using a fraction of the cooling time).  Note that the first botch checks the instantaneous line intensity at the last computed temperature, which changed due to the aforementioned revert; that sim is there to exercise the 'set cumulative off' command, i.e., it does not exercise any part of the time-dependent algorithm.  The botches vanish when the previous time-stepping criterion (a fraction of the cooling time) is employed.
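
For context, the two time-stepping criteria at issue amount to roughly the following (a sketch; apart from timestep_next(), all names are assumed):

// Sketch of the two candidate step limiters inside timestep_next():
// (a) restored criterion: cap the fractional change of the H+ density
//     over one step at some tolerance eps
double dt_a = dt_old * eps / fabs( hplus_new/hplus_old - 1. );
// (b) reverted criterion: a fixed fraction of the local cooling time,
//     t_cool ~ 1.5 n k T / Lambda
double dt_b = frac * 1.5 * dense_tot * BOLTZMANN * Te / fabs( ctot );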

If you have any comments, please let me know.

Marios

Robin Williams

Jan 16, 2021, 11:29:12 AM
to cloud...@googlegroups.com
Hi Marios --

OK, having now had time to look through the patch, it looks like it may actually be fixing an internal inconsistency in the code (i.e., genuinely a bug, rather than a problem triggered by gradual drift due to rounding error) -- so the conservation check was actually doing the job it was intended to do (isolating subtle bugs), albeit not quite where I expected.  Keeping the policy together in doDynamicalIteration() does make sense (DRY/SPoT) -- although the timing of when the variable "iteration" ticks upwards means that this could be more subtle than lexical consistency.  It may have been timely to rehearse the history and logic, nevertheless.

I do have a few minor points on the implementation:

1. I'm not sure about the name of the new routine: doNonEquilibriumSolve() might make more sense to someone not embedded in Cloudy's jargon (cf e.g. papers by Orly Gnat, or back as far as e.g. http://articles.adsabs.harvard.edu/pdf/1993A%26A...273..318S).

2. I'd suggest keeping the ipISO coding out of the logic check routine: this is only actually active in the one place, as it is implementing a logically-independent capability (the ability to run iso-sequences in equilibrium in an otherwise non-equilibrium calculation).

3. "iteration > dynamics.n_initial_relax+1": taken literally, this may well be a fencepost error (i.e., the number of relaxation iterations may not be what you'd expect given the parameter setting -- see the sketch below).  Modifying this may require changing the default value of dynamics.n_initial_relax, but it may be worth exploring how many are actually required -- maybe this should actually be triggered by the optical-depth convergence logic.
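
A toy tabulation of the fencepost, assuming iteration counts from 1 and the first n_initial_relax iterations form the relaxation phase; with n_initial_relax = 2:

iteration                     :    1      2      3      4
phase                         :  relax  relax   dyn    dyn
iteration > n_initial_relax   :  false  false  true   true    (fires on the 1st dynamical iteration)
iteration > n_initial_relax+1 :  false  false  false  true    (skips the 1st dynamical iteration)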

Best wishes
  Robin

Marios Chatzikos

Jan 16, 2021, 1:15:07 PM
to cloud...@googlegroups.com

Hi Robin,

Thanks for the response.  Your points make sense.  I have incorporated them in the attached updated patch.

As you may notice, I've dropped the "+1" in the check on the iteration number.  I'm running the test suite to confirm this has no side effects.

Marios

dyn.lgEq.2.patch

Marios Chatzikos

Jan 17, 2021, 2:41:17 PM
to cloud...@googlegroups.com

Hi Robin,

The test suite passed clean (apart from the botches mentioned earlier, which are related to the time-stepping strategy).  I also checked the experimental sims, and got the same behavior as with the mainline.  To be clear: some of them failed, but the failures are not related to the patch.

The check against dynamics.n_initial_relax has a few variants across the code, e.g.,

atom_leveln.cpp:381:                    iteration > dynamics.n_initial_relax )
conv_init_solution.cpp:623:             if( iteration <= dynamics.n_initial_relax )
conv_itercheck.cpp:326:                 if( iteration <= dynamics.n_initial_relax+1 ||
dynamics.cpp:994:       if( iteration == dynamics.n_initial_relax+1)
dynamics.cpp:1054:              else if(iteration > dynamics.n_initial_relax+1 )

These may represent different use cases, and warrant a closer look.  However, this goes beyond the scope of the patch, and should be done separately.
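
If these do turn out to encode the same intent, one later consolidation could be a single predicate -- a hypothetical helper in the DRY spirit discussed above, not existing code:

// Hypothetical helper -- not in the code base.  A single point of truth
// would make the intended fencepost explicit at every call site.
inline bool lgStillRelaxing( long iteration )
{
	return iteration <= dynamics.n_initial_relax;
}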

Let me know if you have any more concerns.  If not, I'll commit the patch.

Thanks,

Marios

Marios Chatzikos

Jan 17, 2021, 3:07:06 PM
to cloud...@googlegroups.com

Attached please find another patch, wherein the function has been turned into a method of the t_dynamics struct.  This is the version I'll commit, unless there are any concerns.

Marios
dyn.lgEq.3.patch

Gary J. Ferland

Jan 17, 2021, 3:08:05 PM
to cloud...@googlegroups.com
Thanks for all this -- can the patch be applied to master?

Robin Williams

Jan 17, 2021, 3:09:55 PM
to cloud...@googlegroups.com
Seems fine.  Yes, looking at the intent of all the variants of the dynamics.n_initial_relax test seems worth doing while you're thinking about it, but it is outside the scope of this particular change.

  Robin


Marios Chatzikos

Jan 17, 2021, 3:09:57 PM
to cloud...@googlegroups.com

I suppose so, but I'd apply it to newdyna instead.  There are a few other things I'm looking at.  After these are addressed we can talk about merging newdyna back onto master.

Gary J. Ferland

Jan 17, 2021, 3:21:49 PM
to cloud...@googlegroups.com
I would like to make the strongest possible case for keeping development branches short-lived and patches as small as possible.

It is likely that one person can fully understand one patch.  Cloudy is a complex code due to the large number of interactions caused by the nested equations that describe the physical system.  The physical problem we are solving poses a large number of coupled processes, and one seemingly benign change can affect seemingly unrelated parts of the code.  We have to be able to focus on the change and think about it.  The human mind, at least mine, cannot fully comprehend a large number of changes across multiple files that occurred over months or years.  I think we can understand a focused patch like this.

The recommended best practice is, I believe, to keep branches away from the mainline for as short a time as possible and focus the changes to localized regions of the code.  I want to argue that we should do this going forward.

If it makes no difference, could we please post this patch to master and leave newdyna for another day?  newdyna has been living away from the mainline for a pretty long time.

thanks,
Gary

Marios Chatzikos

Jan 17, 2021, 3:51:21 PM
to cloud...@googlegroups.com

Hi Gary,

First of all, newdyna is not a long-lived branch: it was created on July 27, 2020, as per the commit message at the time.  It did not have that many commits before we moved over to git.  Here's the full svn history:

$ svn log --stop-on-copy
------------------------------------------------------------------------
r14320 | marios | 2020-10-13 10:08:56 -0400 (Tue, 13 Oct 2020) | 2 lines

Merged from mainline r14170 through 14319.

------------------------------------------------------------------------
r14170 | marios | 2020-07-27 14:23:04 -0400 (Mon, 27 Jul 2020) | 20 lines

Added command 'save dt'.

source/save.h
  - Defined new parameters to control command output.

source/init_defaults_preparse.cpp
  - Initialized some of the new parameters.

source/parse_save.cpp
  - Added parsing of command 'save dt'.

source/dynamics.cpp
  - Defined function that saves the contents of the file.

tsuite/auto/time_cool_cd.in
  - Exercised new command.

docs/latex/hazy1/conout.tex
  - Documented new command.

------------------------------------------------------------------------
r14169 | marios | 2020-07-27 12:13:09 -0400 (Mon, 27 Jul 2020) | 4 lines

source/dynamics.cpp
  - Restored timestep_next() to original coding involving only changes to the
    H density.  Permits cooling below 6,500 K.

------------------------------------------------------------------------
r14168 | marios | 2020-07-27 10:45:42 -0400 (Mon, 27 Jul 2020) | 1 line

New branch
------------------------------------------------------------------------

And here is the git history:

$ git log
commit d02a0c85ffae9075a6596d9a3e787d143df62e02
Author: Marios Chatzikos <mchat...@gmail.com>
Date:   Mon Dec 7 17:47:55 2020 -0500

    Add columns for advective terms to save cool/heat
   
    The output of the save cooling/heating commands treats advective heating
    and cooling terms ambiguously.  It excludes them from the total when
    reporting fractional contributions, but reports them as contributing
    agents.  As a result, advective terms with fractions >>100% are commonly
    encountered in cooling simulations.
   
    Add columns to report these contributions, but exclude them from the
    list of cooling/heating agents.
   
    https://groups.google.com/g/cloudy-dev/c/69NLm1Ru_cs

commit 4441ba548530007b0ec9f986d42cab2b6c33b8ab
Author: Marios Chatzikos <mchat...@gmail.com>
Date:   Mon Dec 7 16:33:57 2020 -0500

    Prevent FPEs in cooling sims with external radiation fields
   
    Including external radiation fields (e.g., the CMB) in cooling sims
    may lead to FPEs.  The cumulative radiation field intensity involves
    multiplication by a factor that may be near the maximum value a float
    can hold.  This is not a problem with diffuse radiation.  However, when
    the CMB is included, the product exceeds that limit and an FPE occurs.
   
    The adopted bugfix hides in a new class the details of how the
    integrated flux is accumulated, while preserving how the flux vectors
    are accessed.  The latter is mainly useful for the instantaneous flux.
   
    As an added feature, the time-weighted mean flux is also computed.
   
    Reported-by: M. Chatzikos
    https://groups.google.com/g/cloudy-dev/c/GSWoDN_hxqc

commit 8027c7db9ad4e076231113675894c70b1918dcf7
Author: Milby, Jonathan S <j...@uky.edu>
Date:   Wed Dec 2 12:40:18 2020 -0500

    Initial commit for newdyna

All in all, four commits.  The branch has not diverged that much from the mainline.

Second, long-lived branches are not advised, but they are not anathema either.  The Pro Git book discusses both types of branches, with the short-lived variant being ideal for bug fixes, etc., and long-lived ones for more fundamental changes.  It is the responsibility of the developer to resolve any conflicts that come up during reintegration into the mainline.  But yes, it is true: the longer the divergence from the mainline, the more conflicts will have to be resolved.

Other than that, I agree that commits should be focused on one particular issue.  It sounds like you would like the refactoring to be done in a separate commit?

I will commit to newdyna, as I intended.  As I said earlier, there are a few other issues I'm looking at; I will post about them on this forum.  Once they're resolved, I will generate a merge request.  Hopefully in a couple of weeks' time?

Thanks,

Marios


PS: On a related topic, it seems that trac.nublado.org now points to GitLab.  I thought we'd agreed to keep the old infrastructure available online for reference.  What has changed?

Gary J. Ferland

Jan 17, 2021, 4:14:31 PM
to cloud...@googlegroups.com
I believe that most project management advice is to keep changes small and deal with a problem only one time when everyone is thinking about it.  The one resource we do not have enough of is knowledgeable human effort.

Considering a set of changes carefully here, then placing them off in a branch to age like a fine wine, means that knowledgeable humans have to reconsider the changes again at a later date -- in this case, six or more months later.  The reintegration soaks up more human effort in addition to the original commit.  I don't see the advantage of this aging process over creating a short-lived branch, fully considering and discussing it over a few days or weeks, testing it fully, then reintegrating and deleting the branch.  This is the recommended style in the books I have read, and it makes sense to me.

We are in a period of transition from our previous style of working to a new style of working and committing.  There is a middle ground between the horrendous direct development on the trunk, which so wounded it in 2019, and long-lived branches.  Please think about it.  newdyna is fine for now, but let's reintegrate as soon as you feel comfortable.

Regarding trac -- I see what you mean and will ask Jon to fix it.  I had asked him to redirect www.nublado.org and nublado.org to the GitLab wiki, and he did; it looks like too much was redirected.  I too was looking at the svn record, but I have the svn repo set up on an Ubuntu laptop here in my basement.  I hadn't checked nublado, but I do see the problem.  I was not able to set up trac on the Ubuntu 20 machine, since it has Python 3.x and trac requires 2.

thanks,
Gary


Marios Chatzikos

Jan 18, 2021, 6:49:09 PM
to cloud...@googlegroups.com

Hi All,

This is to let you know that, after further discussion with Gary, I decided it's best to commit now instead of postponing for two weeks or so.  There were already a few distinct changesets on the branch, and adding more to them did not seem like it would be rewarding.  The branch is now merged, and master will be exercised in tonight's tests.  newdyna has been deleted.

Marios


PS: The merge commit message includes a synopsis (in bullet points) of the commits that had occurred on the branch -- essentially the titles of each commit.  I'm not sure how typical this is, but I thought I'd give more context than a plain title might provide.

PPS: Branches ported from SVN cannot be merged within GitLab -- I think because the latter requires that the branches were spawned from master, and these were not.  Neither can git-rebase be used on the command line.  The only way forward for these legacy branches is to use git-merge from the command line.  FYI.
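
For the record, a sketch of the command-line sequence meant here -- the standard merge flow, with the branch names from this thread:

$ git checkout master
$ git merge --no-ff newdyna        # merge commit carries the bullet-point synopsis
$ git push origin master
$ git branch -d newdyna            # remove the local copy of the branch
$ git push origin --delete newdyna # remove it on the server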
