At the risk of conjuring up visages from "Love in the Time of Cholera," we need to talk. It is about decay. Motivated strongly by the demands of the cyclus dev community, I have spent the last couple of sleepless nights writing a decay() function for pyne for your reviewing pleasure. Please see #614 for the code.
This decayer encapsulates a few ideas I have been kicking around for the past couple of years. Namely,
With Cameron's guidance on what would incite rebellion and what would be acceptable, I now have the following statistics:
- Source file (decay.cpp) at 14 MB, down from 54 MB
- Source file at 69k lines, down from 122k lines
- Compile times between 1 and 2 min, down from infinite
- libpyne.so at 23 MB, down from 100+ MB
- decay.cpp only takes ~2 sec to generate
Many of these reductions come from taking spontaneous fission decays out. Others come from some programming tricks (nested switches balanced with unreachable functions). Further reductions are possible if we are willing to get rid of obscure problem species, like Es-247, which has 1015 unique non-SF decay chains.
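To make the "nested switches" remark concrete, here is a minimal sketch of one way such a dispatch could look. It is only a guess at the general shape: the function name, the two nuclides covered, and the approximate half-lives are assumptions for illustration and are not taken from the generated decay.cpp. The unreachable marker uses the GCC/Clang builtin `__builtin_unreachable()`.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only: function name, nuclide coverage, and half-life values
// are assumptions, not excerpts from the generated decay.cpp.

#if defined(__GNUC__)  // GCC and Clang both define __GNUC__
#define DECAY_UNREACHABLE() __builtin_unreachable()
#else
#define DECAY_UNREACHABLE() ((void)0)
#endif

// Decay constant [1/s] for a nuclide given as (Z, A); 0.0 means
// "treated as stable or unknown" in this sketch.
double decay_const_sketch(int z, int a) {
  assert(z > 0 && a >= z);  // sanity check on the inputs
  const double ln2 = std::log(2.0);
  switch (z) {              // outer switch: element
    case 1:
      switch (a) {          // inner switch: mass number
        case 3:  return ln2 / 3.888e8;   // H-3, roughly 12.3 y
        default: return 0.0;
      }
      DECAY_UNREACHABLE();  // every inner path has already returned
    case 6:
      switch (a) {
        case 14: return ln2 / 1.808e11;  // C-14, roughly 5730 y
        default: return 0.0;
      }
      DECAY_UNREACHABLE();
    default:
      return 0.0;
  }
}
```

Splitting the dispatch by element and then by mass number keeps each switch small, and the unreachable marker tells the compiler that no code needs to be emitted for the fall-through path between cases.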
So what do we do? I have not checked this file into the PR. The way I see it, we have a few options: we can check the file in, we can generate the file at build time, or we can make it live in a separate lib*.so.
If we don't want to add this file to the repo and want to generate it during the build, then we run into a bootstrapping problem, since this file requires nuc_data and relevant pieces of pyne. If we do add it to the repo, then our repo size goes up by that amount.
A similar trade-off is also seen with startup times and whether we stick this in libpyne.so or only load it when needed.
Also what API would folks like to see? Right now I just have a method on Material and a raw composition map function.
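As a starting point for the API discussion, the two entry points mentioned above could be pictured roughly as follows. This is a toy sketch, not the interface in the PR: `CompMap`, `decay_sketch`, and `MaterialSketch` are hypothetical names, the nuclide ids are assumed to follow a zzzaaassss layout, amounts are taken to be atom counts, and only the H-3 to He-3 decay is modeled.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Hypothetical names throughout; this is not the interface from the PR.
typedef std::map<int, double> CompMap;  // nuclide id -> number of atoms

const int H3 = 10030000;              // assumed id for H-3
const int HE3 = 20030000;             // assumed id for He-3
const double H3_HALF_LIFE = 3.888e8;  // seconds, approximate

// Raw composition-map entry point: returns a new map after time t [s].
// Only H-3 -> He-3 is modeled; every other nuclide passes through unchanged.
CompMap decay_sketch(const CompMap& comp, double t) {
  assert(t >= 0.0);
  CompMap out;
  const double surviving = std::exp(-std::log(2.0) * t / H3_HALF_LIFE);
  for (CompMap::const_iterator it = comp.begin(); it != comp.end(); ++it) {
    if (it->first == H3) {
      out[H3] += it->second * surviving;           // parent left over
      out[HE3] += it->second * (1.0 - surviving);  // daughter produced
    } else {
      out[it->first] += it->second;
    }
  }
  return out;
}

// Method-style wrapper, mirroring the "method on Material" option.
struct MaterialSketch {
  CompMap comp;
  MaterialSketch decay(double t) const {
    MaterialSketch m;
    m.comp = decay_sketch(comp, t);
    return m;
  }
};
```

With this shape, the method is a thin wrapper and the map function carries all of the logic, so callers that do not want a full material object can still use the decayer directly.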
Compiler Flags
In case you were wondering, yes, I did play with compiler flags. A lot. I apply these only to the decay.cpp file. Here are the results:
| Extra Compile Flags | GCC User Time [s] | GCC System Time [s] | Clang User Time [s] | Clang System Time [s] | Notes |
|---|---|---|---|---|---|
| -O0 -fno-gcse -time -std=c++11 | 82.74 | 1.43 | 64.9636 | 3.3732 | Note that for clang -time was replaced with -ftime-report |
| -O0 -fno-gcse -time | 93.94 | 1.42 | 71.1664 | 3.5783 | |
| -O0 -time -std=c++11 | 80.41 | 1.15 | 67.6784 | 3.5175 | |
| -O0 -time | 83.6 | 1.7 | 65.0206 | 3.5052 | |
| -O1 -fno-gcse -time -std=c++11 | 194.57 | 3.56 | 179.5034 | 3.2921 | |
| -O1 -time -std=c++11 | 193.49 | 5.73 | 182.0293 | 3.3066 | |
| -fno-gcse -time -std=c++11 | 275.35 | 3.42 | 191.71 | 3.0524 | presumably -O2 |
It only gets worse once you go beyond -O0. It is a little counterintuitive that leaving global common subexpression elimination on (that is, not passing -fno-gcse) is faster, but I guess this is because there is ultimately less to assemble.
But Does it Work?!
I think that this shows that the method here really works.
Like any good benchmark or code-to-code comparison, this has brought into relief some issues elsewhere in pyne (and possibly in origen) mostly related to branch ratios and half-life data.
From here, I'll be writing up a theory manual entry which hopefully can also become a short paper. But I wanted to get a sense of what the next steps are for this to be included in pyne proper. So please, discuss!
/scopout