CMB in cooling simulations


Marios Chatzikos

unread,
Oct 27, 2020, 3:44:24 PM10/27/20
to cloud...@googlegroups.com
Hello,

I want to report an interesting bug I just came across.

This applies to simulation tsuite/auto/time_cool_cd.in.  Adding the CMB
to the simulation leads to the following abort:

 DISASTER - A floating point exception occurred. Bailing out...

PRNG seed: 0x4fa3f91a4d6801dc

Backtrace follows:

[0x00000000004a2400] t_cpu_i::signal_handler(int, siginfo_t*, void*) --
source/cpu.cpp:543 (discriminator 1)
[0x00007f4ab5797400] ?? -- ??:0
[0x000000000052db45] IterEnd() -- source/iter_startend.cpp:1140
(discriminator 2)
[0x000000000048dd05] cloudy() -- source/cloudy.cpp:262
[0x000000000045a07a] cdDrive() -- source/cddrive.cpp:153
[0x000000000044bff5] cdMain(int, char const**) -- source/maincl.cpp:510
[0x000000000043e01a] main -- source/maincl.cpp:123
[0x00007f4ab5783555] __libc_start_main -- ??:0
[0x000000000044b38f] ?? -- ??:?

The relevant code is at source/iter_startend.cpp:1140:

1136                 // save cumulative continua
1137                 for( long n=0; n<rfield.nflux; ++n)
1138                 {
1139
1140                         rfield.flux[1][n] += (realnum) rfield.flux[0][n]*cumulative_factor;
1141                         rfield.ConEmitReflec[1][n] += (realnum) rfield.ConEmitReflec[0][n]*cumulative_factor;
1142                         rfield.ConEmitOut[1][n] += (realnum) rfield.ConEmitOut[0][n]*cumulative_factor;
1143                         rfield.ConRefIncid[1][n] += (realnum) rfield.ConRefIncid[0][n]*cumulative_factor;
1144                         rfield.flux_total_incident[1][n] += (realnum) rfield.flux_total_incident[0][n]*cumulative_factor;
1145                         rfield.reflin[1][n] += (realnum) rfield.reflin[0][n]*cumulative_factor;
1146                         rfield.outlin[1][n] += (realnum) rfield.outlin[0][n]*cumulative_factor;
1147                 }

This occurs because cumulative_factor, 1.783991e+37, is already within a
factor of ~20 of the largest value a 32-bit float can hold (~3.4e+38), and
the intensity of the CMB in the first bin, 5.309262e+02, pushes the product
above that limit.

The obvious fix is to upgrade rfield.flux to double precision, but that
may come with a computational cost.
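
For anyone who wants to reproduce the arithmetic outside of Cloudy, here is a
minimal standalone sketch (not Cloudy code; realnum is assumed to be a 32-bit
float here):

  #include <cstdio>
  #include <cfloat>

  typedef float realnum;   // assumed stand-in for Cloudy's single-precision type

  int main()
  {
      double cumulative_factor = 1.783991e+37;   // value reported above
      double flux0 = 5.309262e+02;               // CMB intensity in the first bin
      // The product is ~9.5e+39, well beyond FLT_MAX (~3.4e+38), so narrowing it
      // back to realnum overflows; with floating-point traps enabled, as the
      // backtrace suggests Cloudy runs with, that overflow raises SIGFPE.
      realnum flux1 = realnum(flux0*cumulative_factor);
      printf("float result %g (FLT_MAX %g), double result %g\n",
             flux1, FLT_MAX, flux0*cumulative_factor);
      return 0;
  }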

Let me know what you think.

Marios

Robin Williams

unread,
Oct 27, 2020, 6:44:39 PM10/27/20
to cloud...@googlegroups.com
I guess the other option is to scale everything down in some manner.

Knuth gives a numerically-stable way of calculating the mean (and variance) in a single pass using a recurrence
  M_k = M_{k-1} + ( x_k - M_{k-1} ) / k
  S_k = S_{k-1} + ( x_k - M_{k-1} ) * ( x_k - M_k )
starting with M_1 = x_1, S_1 = 0 (and ignoring S if you only need the mean).
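
In code, a minimal standalone sketch of that recurrence (Welford's
formulation; generic C++, not tied to any Cloudy data structures) would be:

  // One-pass running mean and variance (Knuth/Welford):
  //   M_k = M_{k-1} + ( x_k - M_{k-1} ) / k
  //   S_k = S_{k-1} + ( x_k - M_{k-1} ) * ( x_k - M_k )
  struct RunningStats
  {
      long   k = 0;     // samples accumulated so far
      double M = 0.;    // running mean, M_k
      double S = 0.;    // sum of squared deviations; variance = S/(k-1) for k > 1

      void add(double x)
      {
          ++k;
          double delta = x - M;      // x_k - M_{k-1}
          M += delta/double(k);
          S += delta*(x - M);        // ( x_k - M_{k-1} ) * ( x_k - M_k )
      }
  };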

  Robin



Marios Chatzikos

unread,
Oct 27, 2020, 6:54:22 PM10/27/20
to cloud...@googlegroups.com

Hi Robin,

Wouldn't it be simpler to store 'cumulative_factor' instead?  That would make having the main flux array be an array of two arrays redundant -- we might as well be computing the cumulative flux, etc., on the fly.

Marios

Marios Chatzikos

unread,
Oct 27, 2020, 7:00:00 PM10/27/20
to cloud...@googlegroups.com

I take that back.  That would require saving the radiation flux for each iteration separately.

-m

Gary J. Ferland

unread,
Oct 27, 2020, 7:02:51 PM10/27/20
to cloud...@googlegroups.com
I would go with the simplest possible solution and just promote the float to a double. Then check for any casts to float where it is created.   It increases the memory footprint but these arrays are not large.  The arith will be faster but these evaluations should be insignificant



--
Gary J. Ferland
Physics, Univ of Kentucky
Lexington KY 40506 USA
Tel: 859 257-8795
https://pa.as.uky.edu/users/gary

Peter van Hoof

unread,
Oct 27, 2020, 10:06:20 PM10/27/20
to cloud...@googlegroups.com
On 28/10/2020 00.02, Gary J. Ferland wrote:
> I would go with the simplest possible solution and just promote the float to a
> double. Then check for any casts to float where it is created.   It increases
> the memory footprint but these arrays are not large.  The arith will be faster
> but these evaluations should be insignificant

I am not sure that is true. This involves all flux arrays, which are heavily
used in integrations over the radiation field. Making them double may pessimize
certain vectorization optimizations. This could have a real impact on speed and
is worth a closer look, especially for sims with grains.



--
Peter van Hoof
Royal Observatory of Belgium
Ringlaan 3
1180 Brussel
Belgium
http://homepage.oma.be/pvh

Gary J. Ferland

unread,
Oct 27, 2020, 10:30:36 PM10/27/20
to cloud...@googlegroups.com
good points - how about doing the test?


Gary J. Ferland

unread,
Oct 27, 2020, 11:22:40 PM10/27/20
to cloud...@googlegroups.com
by "doing the test" I meant to change it to a double and run the test suite.  Do the timings change?

Marios Chatzikos

unread,
Oct 27, 2020, 11:47:53 PM10/27/20
to cloud...@googlegroups.com

Hi,

Attached please find the patch that Peter described.  It promotes to double all rfield arrays involved in the code excerpt I showed in the first e-mail, and accounts for casts, and arithmetic with floats.

The test suite passes clean.  Plots of relative and absolute differences in execution time for all sims against the pristine head of the trunk are attached.  The differences show a slight bias toward longer execution times with the patch, but there are a lot of sims that actually ran faster.  Grains simulations do not seem to suffer systematically from these changes.  A review of the attached data file may convince you of that.

Please let me know if the patch looks OK to commit.

Thanks,

Marios

rfield-double.patch
timings_float_vs_double_frac.pdf
timings_float_vs_double.pdf
comp-float-double.txt

Robin Williams

unread,
Oct 28, 2020, 4:13:22 AM10/28/20
to cloud...@googlegroups.com
Hi Marios --

The timing results are strange.  Some parts of the code which stream through the flux arrays always used to be significant pace-setters, and those will pretty much inevitably be slowed down by a factor of 2 by having to stream through twice as much memory.

The possibilities which jump to mind are:
a) there's cancellation error somewhere, which means improving the accuracy is allowing it to converge faster;
b) enough other stuff is double that reducing the number of type conversions now helps;
c) there's noise in the timings on the system you ran this on, at a similar level to the performance differentials.
Peter had to do a lot of tweaking to get the test suite timings on cloud9 as good as they are now -- I guess if you commit this change, these statistics will build up in due course.  One possibility might be to profile one or two of the cases which get a performance boost and see where the difference actually sits.

Best wishes
  Robin

Gary J. Ferland

unread,
Oct 28, 2020, 7:49:22 AM10/28/20
to cloud...@googlegroups.com
I looked over the patch and it seems fine.  There seemed to be a lot of casts that went away and only a few were added, so the net coding is simpler.  

I took a passive approach and just looked at the changes.  I didn't do a search for missing casts.

thanks,
Gary

Peter van Hoof

unread,
Oct 28, 2020, 10:20:43 AM10/28/20
to cloud...@googlegroups.com
On 28/10/2020 04.22, Gary J. Ferland wrote:
> by "doing the test" I meant to change it to a double and run the test suite.  Do
> the timings change?

The test suite only tells you so much as it largely avoids grains or applies
speed optimizations to the grains. Making the frequency grid finer made the
grains slower, but we missed that in the test suite for the same reasons.

You should also remove short-running sims as they are not good speed benchmarks.


Marios Chatzikos

unread,
Oct 28, 2020, 10:41:28 AM10/28/20
to cloud...@googlegroups.com
Hi Peter,

If you have a grains sim that you think might be able to test this
aspect of the code, could you please share it?

Thanks,

Marios

Marios Chatzikos

unread,
Oct 28, 2020, 11:31:21 PM10/28/20
to cloud...@googlegroups.com

Hi Robin,

The shorter execution times surprised me as well.

I'm not sure what the best way to test your first two conjectures is, but here are a few observations from the two runs.  I've looked only at sims that had differences greater than 10 sec for these.

Of the sims that ran longer with the patch, none took more iterations to converge, and only one or two required one additional zone (out of 300 or so).  These appear consistent with needing longer to run because they involve twice as much data, but not by much: after all, only a few vectors were modified.

Of those that ran faster, the outlier is auto/blr_n13_p18_Z20.in, with a speed-up of 30 sec (out of ~960).  Both sims took the same number of iterations (6), but the patched code required fewer zones to converge: 725 vs 739.  This seems consistent with your suggestion that improved accuracy leads to faster convergence.

I will profile both examples tomorrow, and I'll let you know if I find anything interesting.

In terms of your third point, I ran the tests locally on nephos -- I think they're reliable.  I could rerun them on cloud9 as well, if you think that's advisable.  Or I could use the median of three runs of each sim for the comparison.  But I think the profiling might be more instructive.

Thanks for the thoughtful input,

Marios

Peter van Hoof

unread,
Oct 29, 2020, 6:34:46 AM10/29/20
to cloud...@googlegroups.com
Hi Marios,

BLR sims don't have grains so the integrals over the radiation field would be in
GammaK() if memory serves. They are vectorized by routines with names starting
with "reduce". There may be merit to Robin's remark (point b) that there is a
mix of realnum/double arrays as parameters to these routines and promoting the
flux arrays to double may make them pure double which would be more efficient.
Haven't checked that though.

Attached is a crude sim for NGC 7027 that goes into the molecular region. This
should be a reasonable test for the grain code. It takes around 13 min with
c17.02 on my machine (which was running other stuff as well).

It is important to run tests on an otherwise empty machine to get good timings.
Even minor activity can have a hugely disproportionate effect on the timings.
This is because timing stuff is not a priority for the kernel.


Cheers,

Peter.

n7027.in

Marios Chatzikos

unread,
Oct 30, 2020, 11:44:33 AM10/30/20
to cloud...@googlegroups.com
Hello,

I profiled a few sims to see how the timings break down.  These were chosen among those that had differences of more than 10 sec in the original comparison.  I reran them in pairs of trunk/patch so that they were the only thing I was actively running on nephos at the time -- if system processes were taking place, they would likely affect both runs similarly.  I also ran the grains-oriented sim that Peter sent yesterday.  In any case, here are their (updated) run times (sec) and numbers of zones:

sim                     trunk           patch
--------------------    -----           -----
slow/h2_orion_bar.in    1274.86, 800    1314.36, 801
slow/h2_th85orion.in    689.96, 872     689.12, 872
auto/blr_n13_p18_Z20.in 793.48, 739     774.52, 725
n7027.out               641.99, 813     647.43, 812

My overall impression is that the patch sometimes improves the running time of a sim, sometimes impairs it, and sometimes leaves it unaffected.  Deviations are at the 3% level.

In the Orion bar sim, the bulk of the execution time is spent in linint(), invoked for Stout, Chianti and H2 cross sections, iso_level(), H2_Cooling(), and some LAPACK functions.  About 3% more calls were required with the patch, accounting for the difference in execution time.

Calls to iso_level() dominate the execution time in the BLR sim.  The patched code requires 3% fewer zones, and fewer calls, hence the execution speed-up.

In Peter's sim, almost half the time is taken up by TryDoubleStep(), with the patched version taking an extra two seconds.  The remainder of the difference comes from a large number of sub-second offsets between the runs, including functions which do not involve any of the vectors I modified.  About the same number of calls were made to these functions in both cases.  Calculations of grain continuum emission use vectors private to the grains code, so the only interaction with my changes would be through rfield.SummedCon, already a double, which sums over rfield.flux[0].  That vector is also explicitly used in the calculation of the grain drift velocity.  Other than that, I see no obvious connection, so I don't quite understand how the patch could affect the execution time for grains.  Peter, I'd welcome any comments.

The TH85 sim runs in about the same time, and profiling shows that the execution time of specific functions is comparable between the two runs.  Roughly the same number of calls were made with and without the patch.

Let me know if you have any further concerns.  If not, I would like to commit the patch to the head of the trunk.

Thanks,
Marios

PS: For details, see: http://www.pa.uky.edu/~marios/cloudy/prof.tar.xz

Robin Williams

unread,
Oct 30, 2020, 11:51:54 AM10/30/20
to cloud...@googlegroups.com
Hi Marios --

I forgot to send the below yesterday evening -- there's a talk in a few minutes, I'll try to look at your message properly later.

> It is important to run tests on an otherwise empty machine to get good timings.
> Even minor activity can have a hugely disproportionate effect on the timings.
> This is because timing stuff is not a priority for the kernel.

Also other load on the CPU can lead to throttling or contention for resources such as memory bandwidth.
> I'm not sure what the best way to test your first two conjectures is...

a) reduced cancellation/rounding error -- fewer iterations is certainly evidence for this.  The comparison with the results of the FLT_IS_DBL test suite runs may be informative.  Obviously there's always the question of whether the convergence bounds are over-stringent, or whether the cancellation error can be reduced in some other manner.

b) conversion cost -- you'd probably need to look at this with instruction-level profiling (e.g. perf-tools or OProfile).  Even then the attribution of execution time to instructions is "in the general area" rather than exact, so it might be easier to try to understand the likely impact using standalone code snippets rather than a full-up run of Cloudy.  Using godbolt.org to find the relevant instruction names:
  https://www.felixcloutier.com/x86/cvtss2sd
  https://www.felixcloutier.com/x86/cvtps2pd
and checking https://www.agner.org/optimize/instruction_tables.pdf, this is pretty cheap on modern CPUs: comparable to an addition -- but that's not zero.
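
As an illustration of where those conversions come from (a generic sketch, not actual Cloudy code), a reduction that mixes a float (realnum) array with double data has to widen every float element before the multiply-add; promoting the array to double removes the conversions, at the price of streaming twice as much memory:

  // Hypothetical integration over the radiation field: 'flux' stands in for a
  // realnum (float) array, 'opac' for a double array.  Each iteration needs a
  // float->double widening (cvtss2sd, or cvtps2pd when vectorized) for flux[i].
  double integrate(const float* flux, const double* opac, long n)
  {
      double sum = 0.;
      for( long i=0; i < n; ++i )
          sum += double(flux[i])*opac[i];
      return sum;
  }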
  Robin


Robin Williams

unread,
Nov 1, 2020, 3:36:03 PM11/1/20
to cloud...@googlegroups.com
P.S. it may be YAGNI, but calculations with double precision fluxes may end up as a bottleneck if there's ever a move to more exact radiative transfer, and effectively rules out accelerating this aspect of the code with most commodity GPUs.  But I guess the changes are small enough that they could be wound back and the underlying problem dealt with in another way should this ever transpire.

Marios Chatzikos

unread,
Nov 2, 2020, 11:27:38 AM11/2/20
to cloud...@googlegroups.com

Hi Robin,

As I mentioned, there is a sim that appears to exhibit the behavior you described in your first point.

Concerning the second point, I don't think it would be necessary to test things at that level of detail.  Although I did not mention it in the previous e-mail, I did check how much time was spent on each of the functions the patch touches on, and for all sims, with and without the patch, those numbers were a very small fraction of the total and/or the patch's impact was minimal.  This indicates that the conversion cost should be minimal.

For the record, the compute time (not wallclock time) of the test suite increased by 144 seconds with the patch.

Thanks,

Marios

Peter van Hoof

unread,
Nov 2, 2020, 12:22:49 PM11/2/20
to cloud...@googlegroups.com
Hi Marios,

It indeed looks like the impact is quite small, so go ahead and make the change.


Cheers,

Peter.

Marios Chatzikos

unread,
Nov 2, 2020, 12:32:52 PM11/2/20
to cloud...@googlegroups.com

Hi Robin,

This is not YAGNI, actually -- I should start working on RT soon.

I've looked into double-precision GPUs -- they exist but they are slower relative to 32-bit processing by about a dex.

https://www.gamersnexus.net/hwreviews/2518-nvidia-gtx-1060-review-and-benchmark-vs-rx-480?showall=1

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions

Even in 64-bit, that'd give us about ~200 GFLOPs, comparable to a 6-core i7.  The flux array operations need not be carried out on the GPU, though.

Marios

Robin Williams

unread,
Nov 2, 2020, 1:20:21 PM11/2/20
to cloud...@googlegroups.com
> I've looked into double-precision GPUs -- they exist but they are slower relative to 32-bit processing by about a dex.

Unless you have $200k you'd like to swap for an A100 ;-).  Oddly, this is one area where the integrated graphics on commodity Intel CPUs does well -- mostly they only halve the rate for double precision.

Thinking about this more, IIRC the real pace-setter in this area is this loop in rt_tau_inc.cpp, which is probably still realnum:

  for( long i=0; i<rfield.nfine; ++i )
  {
    realnum tauzone = rfield.fine_opac_zone[i]*(realnum)radius.drad_x_fillfac;
    rfield.fine_opt_depth[i] += tauzone;
  }

I put a simple shallow-water flow solver into OpenCL as an exploration.  I'm sure I haven't figured out all the angles, but I get ~ 10x speed up going from 1 core of an i7 3770 -> GT 1030, where the headline GFlops are ~28 -> ~1000 (that's for a 1024^2 mesh -- the scaling is 5x for a 256^2).

The main issue seems to be getting things lined up in a way that can take advantage of what are in effect even wider SIMD vectors, with very limited register storage.  The theoretical speeds assume that you can use a lot of simultaneous FMA operations with minimal memory access, and have enough parallel work to hide the latencies in the system.  rt_line_one.cpp seemed like it might potentially be a suitable target, but at least for plane parallel, the opacity doesn't need to be resolved in angle.  It may make some things practical which would be dumb on a CPU (e.g. calculating complicated analytic expressions on the fly rather than caching values).

  Robin

Marios Chatzikos

unread,
Nov 3, 2020, 7:55:30 AM11/3/20
to cloud...@googlegroups.com

Hi Robin,

Thinking more about this, it would be a shame to risk reducing our GPU throughput by a factor of 32 for such little gain.

I went back and implemented your original suggestion for rescaling the cumulative continua; see the attached patch.  It produces the same output as the double-precision patch, and the test suite runs clean with it.

This seems to be the safer path forward.

Let me know what you think.

Thanks,

Marios

cumul-cont.patch

Robin Williams

unread,
Nov 3, 2020, 9:20:08 AM11/3/20
to cloud...@googlegroups.com
Hi Marios --

It certainly shows that you're not gaining anything significant by going to double storage.

A couple of minor comments:  There's a bit of repetition here; it's possible that the implementation might be a bit simpler if you wrapped the vectors and cumulative factors together in a class which captures the structure that's repeated for all the flux types.  scale_cumulative could then be a method on that class -- perhaps accumulate_flux might be a more accurate name for it?

Best wishes
  Robin

Gary J. Ferland

unread,
Nov 3, 2020, 1:34:33 PM11/3/20
to cloud...@googlegroups.com
I want to passionately vote for the first patch.  

The second patch brings back old memories, old problems, and all the work we did to get rid of it.  Through the 1980s and early 90s, the code had to run on a VAX 780.  Those machines had an exponential limit of e+/-37.  So a large part of the code worked to save scaled quantities so that we could avoid doubles, which were horribly expensive.  All that scaling made the code opaque - you could not tell what various physical parameters were scaled by.  It was hard to maintain and hard to debug.

When modern processors came along we worked very hard to get rid of ALL the float scaling.  Several person-months of work went into it.  The result is a much simpler codebase.  All quantities really have the expected units.  That is easier to maintain and debug.

The first patch is an example of good coding - once promoted to double many of the casts to float were removed.  This is a good thing since it made the code simpler and smaller.  The performance was not injured.  The patch 1 code is an improvement.

I would hate to go back down the road of scaling variables to fit in floats.  The argument seems to be to make it more GPU friendly.  There are no plans that I know of to move to GPUs.  It was an aspiration, and I did buy an expensive GPU for one of the cloud9s some time ago, but I believe nobody ever used it.  If GPU development were to take place, that project would be converting lots of doubles to floats.  This would involve changes across the entire code base.  Let that project do something like the second patch - they will be doing a lot of it.  I bet it will not happen.

So much effort went into cleaning the code up.  I would hate to see that complexity reintroduced for something we are not now planning to do.  

I want to passionately vote for the first patch.  

thanks,
Gary

Robin Williams

unread,
Nov 3, 2020, 2:52:25 PM11/3/20
to cloud...@googlegroups.com
> All quantities really have the expected units.

rfield.flux[1] isn't necessarily the best example of this, as it doesn't have the same units as rfield.flux[0], and the units change depending on what option the user has chosen.

(For reference, rfield.flux[1][n] += (realnum) rfield.flux[0][n]*cumulative_factor, with the dimensions of cumulative_factor often being time -- this additional factor of up to 4e17 s is what blows the top off the float type: photons cm^-2 cell^-1 over the age of the universe can get big.)

Part of the point of my comment about wrapping things up in a class was that hiding the scaling behind an interface avoids the headache of mixed units escaping.

As you'll no doubt recall, the GPU was my idea.  Long time ago, I'm sorry that I never made use of it.  At the time, it seemed like things might move in that direction faster than they did (or, as with the move to C++, that it might lure a CS student to the project).  There was also some miscommunication -- I'd not bargained that if you buy a workstation from Dell, they'll fit a high-end part rather than the commodity item which might have been more appropriate for exploratory work (which I ended up doing with simpler code, for the reasons you allude to).  Certainly there has been some stuff done in the interim, but none of the GPU frameworks really seem "turnkey" yet -- nVidia backed away from really supporting the more portable option.  Scanning references, I noticed https://arxiv.org/abs/2009.12609, which may be worth a look with regard to RT.

  Robin

Gary J. Ferland

unread,
Nov 3, 2020, 4:13:34 PM11/3/20
to cloud...@googlegroups.com
my recollection of the parallel processing work is that there was not a "long pole in the tent" to go after.  Just lots of small jobs that had to be converged.

GPUs were the next big thing a decade ago, sort of like machine learning today. 

one big advance that might be worth working on is SMP.  The mac I am typing on is a year old and has six cores.  The Dell laptop running Ubuntu a yard away has ten cores.  They do not scale like XEONs - an individual model in the test suite, running ten-way, takes about twice as long as it would by itself.  But all those cores do add potential.  But then we get back to the question of what to go after.  I remember a comment you made long ago that you parallelized a loop so that it was sending off ions in an iso sequence onto different cores, and things got slower.  Memory bandwidth was the problem.

Robin Williams

unread,
Nov 3, 2020, 4:43:15 PM11/3/20
to cloud...@googlegroups.com
I think that one of the reasons ML is big now is that it's the killer app that GPUs have finally found (see TensorFlow).

SMP parallelization is now part of the C++ standard, which may make doing something with that more timely.  The way Cloudy works is tied to the radiation sweep in a way which makes it difficult to domain decompose.  Doing multiple ions in parallel might be an option, but you'd end up with lumpy utilization, with intervening code still serial (something I happened across in another context mentioned the phrase "passing like a pig through a python").  If you can do a grid -- or a test suite -- the existing code will get better utilization.  I can't remember if atomic physics data can be shared between tasks, if not that might reduce the bandwidth requirement a bit.

I bought a copy of Anthony Williams' book "C++ Concurrency in Action", which seems like an excellent introduction to techniques and pitfalls (no relation, no commission; I think you can get enough of a view to get a flavour on the publisher website).

  Robin

Gary J. Ferland

unread,
Nov 3, 2020, 5:48:38 PM11/3/20
to cloud...@googlegroups.com
the idea I mentioned some time ago would be to load each species into its own GPU core.  That would take time, with lots of atomic data per species and about 300 species.  Then have those cores be workers - ask "what are the level populations given this temperature and density?".  The problem is that species evaluation is not a big timesink.  It is not clear what to go after.

the parallel grid runs solve this need to a large extent.

Robin Williams

unread,
Nov 3, 2020, 6:03:47 PM11/3/20
to cloud...@googlegroups.com
Yes, speeding up one-off serial runs would likely be a lot of work and significant disruption, with limited payoff.  If Marios does get to RT, that may be a good way of generating a long pole.

Back to the original question, a variant on Marios' second piece of code might be to store the averaged flux in rfield.flux[1], and the total "duration" as a standalone variable.  The update is similar (if I've got the math right)
  mean_{i+1} = mean_i  + dt * (sample_i - mean_i ) / (t_i + dt)
  t_{i+1} = t_i + dt
You'd only need one extra scaling variable with this version.  It would be better to bring out the dt/(t_i+dt) factor to save on (expensive) divisions.
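
In code, the accumulation might look roughly like the following sketch (hypothetical names, not the actual patch), with the dt/(t_i+dt) factor hoisted out of the loop so there is a single division per call:

  // Fold one timestep's spectrum into the running time-weighted mean:
  //   mean_{i+1} = mean_i + dt*(sample_i - mean_i)/(t_i + dt),  t_{i+1} = t_i + dt
  void accumulate_mean_flux(double* mean, const double* sample, long nflux,
                            double& t_total, double dt)
  {
      double w = dt/(t_total + dt);   // weight of the new sample
      for( long n=0; n < nflux; ++n )
          mean[n] += w*(sample[n] - mean[n]);
      t_total += dt;
  }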

Then both parts of rfield.flux will be in the same units...
  Robin

Marios Chatzikos

unread,
Nov 4, 2020, 9:50:20 PM11/4/20
to cloud...@googlegroups.com

Hi Robin,

Here's another patch, implementing your suggestion.  It defines a new class, called Spectrum, which has the scale_cumulative code as a method (accumulate_flux).  Vectors of the t_rfield structure are now Spectrum objects.  The class overloads the [] operator to maintain existing calls to the data structures.  Some other supporting functionality has been added.
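
For anyone reading along without the patch, a rough idea of the shape of such a class (hypothetical sketch only; the actual member names and details are in the attached patch) would be:

  #include <vector>
  #include <cstddef>

  typedef float realnum;   // stand-in for Cloudy's single-precision typedef

  // Keeps the per-iteration spectrum ([0]) and its cumulative counterpart ([1])
  // together, so the rescaling logic lives in one place.
  class Spectrum
  {
      std::vector<realnum> spec[2];
  public:
      void resize(size_t n) { spec[0].assign(n, 0.f); spec[1].assign(n, 0.f); }
      // keep existing call sites such as flux[0][n] and flux[1][n] working
      std::vector<realnum>& operator[](int i) { return spec[i]; }
      const std::vector<realnum>& operator[](int i) const { return spec[i]; }
      // fold the current spectrum into the cumulative one with the given weight
      void accumulate_flux(double weight)
      {
          for( size_t n=0; n < spec[0].size(); ++n )
              spec[1][n] += realnum(spec[0][n]*weight);
      }
  };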

Let me know what you think.

Marios

Marios Chatzikos

unread,
Nov 4, 2020, 9:51:41 PM11/4/20
to cloud...@googlegroups.com

Clicked send too soon...  Attached please find the patch.

cumul-cont-class.patch

Marios Chatzikos

unread,
Nov 4, 2020, 9:57:18 PM11/4/20
to cloud...@googlegroups.com

Hi Robin,

I'm not sure that would work.  The cumulative factor may also involve the mass, so that'd have to be factored in.  But I don't think we'd gain anything.  We'd just make the code more complex.

Marios

PS: Your code snippet requires a division by n...

Marios Chatzikos

unread,
Nov 5, 2020, 7:41:43 AM11/5/20
to cloud...@googlegroups.com

I forgot to mention that the test suite passes clean with the patch.  Also, for a cooling simulation including the CMB, it produces the same output as "patch #2", sent a few days ago.

Thanks,

Marios

Robin Williams

unread,
Nov 5, 2020, 4:07:22 PM11/5/20
to cloud...@googlegroups.com
> Your code snippet requires a division by n...

No, but I could have clarified that I was calculating the time-weighted mean (which is a useful value), not the arithmetic mean (which isn't, as it depends on an essentially arbitrary choice of timestep).

This can be demonstrated by induction.  The value after n summations is:
  M_n = sum_n( m_i dt_i ) / sum_n( dt_i )
Now, with the recurrence given,
  M_{n+1} = M_n + dt_{n+1} * ( m_{n+1} - M_n ) / ( t_n + dt_{n+1} )
where t_n = sum_n( dt_i ), multiply through by ( t_n + dt_{n+1} ):
  M_{n+1} * ( t_n + dt_{n+1} ) = M_n * ( t_n + dt_{n+1} ) + dt_{n+1} * ( m_{n+1} - M_n ) = M_n * t_n + m_{n+1} * dt_{n+1}
which is the update for the weighted mean.  Of course, you do need the trivial starting value
  M_1 = m_1
  Robin

Marios Chatzikos

unread,
Nov 6, 2020, 1:42:27 PM11/6/20
to cloud...@googlegroups.com

Hi Robin,

I see what you mean.  I was able to reproduce your equation, and implemented it in the attached patch.

As it stands, the time-averaged flux is computed over the entire spectrum.  I could imagine a command that let the user specify a wavelength range over which to compute it.  But that'd go beyond the scope of this patch, and should probably be developed and committed after this one is committed.

Thanks,

Marios

cumul-cont-class2.patch

Robin Williams

unread,
Nov 6, 2020, 3:33:59 PM11/6/20
to cloud...@googlegroups.com
Hi Marios --

Sorry, this seems to be taking a while...  You can do the same job for the flux[1] elements, so they scale with total_time rather than with the flux mean over the spectrum, which would, I think, be simpler than what you have implemented at the moment.

If all the flux[0]'s are within bounds, the flux[1]s can then never escape.

Yes, controlling what is totalled to the range of interest would be a useful longer-term approach -- there are other things such as line fluxes which might also be good to be able to accumulate.

Best wishes
  Robin

Marios Chatzikos

unread,
Nov 6, 2020, 4:31:44 PM11/6/20
to 'Robin Williams' via cloudy-dev
Hi Robin,

The cumulative factor may be dt, but it may also be dt / mass (see iter_startend.cpp:1115-1128).  I could divide by the elapsed time, and that'd have meaning when the factor is dt.  But in the latter case, the division by mass would effectively be stored in flux[1], which would boost its values by a factor of at least 1e24.  That is still within the limits of a float, but for low enough densities and a bright enough continuum, we'd be getting near the limit of what a float can hold.  It seems to me that the simple arithmetic normalization is preferable in this case.

Let me know what you think,
Marios

Marios Chatzikos

unread,
Dec 3, 2020, 5:15:00 PM12/3/20
to cloud...@googlegroups.com

Hello All,

Now that the migration to the new nublado.org is almost complete, I'd like to commit this patch (reattached) to the newdyna branch.  Please let me know if you have any comments, concerns, or objections.

Thanks,

Marios

cumul-cont-class2.patch

mchatzikos

unread,
Dec 7, 2020, 4:55:02 PM12/7/20
to cloudy-dev
Hello,

The patch has been committed at

If you have any comments, please let me know.

Thanks,
Marios

Gary J. Ferland

unread,
Dec 7, 2020, 5:02:13 PM12/7/20
to cloud...@googlegroups.com
not sure how to read the nice diff file GitLab links to - from the log it appears this was committed to newdyna?

Marios Chatzikos

unread,
Dec 7, 2020, 5:05:16 PM12/7/20
to cloud...@googlegroups.com

Yes, it was.  We have agreed not to commit significant changes to master.
