getting 1 sigma uncertainties

8 views
Skip to first unread message

Chris

unread,
Oct 6, 2023, 1:36:49 PM10/6/23
to PEUQSE-users
Hey Ashi!
I have a couple of questions concerning postprocessing in peuqse, if you have a moment.
  1. Currently, we have generated a large number of samples using Ensemble slice sampling, and we appears to have pretty good agreement between the map and mu_ap. for all of our parameters. we are able to generate contour plots for covariance between parameters, but I was wondering what the best way to get just the +/- values for the 1 sigma interval would be (like the values for Ea and logA in table 2 of the chekipeuq paper).
  2. Is forward propagation of uncertainties possible in peuqse? We've generated the map and hpd for each of out model parameters, but is it possible to get the uncertainty for our model outputs as well? I may be misinterpreting the paper results, but table 3 in the chekipeuq paper appears to have the uncertainties for the peak temperature, Wasn't sure if this was simply propagated through the redhead peak equation or if there was some other method used. Our model is a complex reaction system so we were considering using something like UQTK to propagate the errors uncovered by peuqse to the model outputs.


Aditya Ashi Savara

unread,
Oct 6, 2023, 10:58:32 PM10/6/23
to PEUQSE-users
1) For the uncertainties, I've used 1 sigma or 2 sigma (and reported as such) from the stdap that is in the log files (it can also be pulled out from the PE_object).   For example, in this file, https://github.com/AdityaSavara/PEUQSE/blob/master/Examples/Example00/logs_and_csvs/mcmc_log_file.txt   we could use  [167.49588332,  503.60518447]  +/- [ 51.98513273 176.02359081]
2) There are a few ways to get prediction uncertainties.
2a)  One is that if you look in the UserInput file, you can turn on exporting all of the simulation outputs (for the entire run):
parameter_estimation_settings['exportAllSimulatedOutputs'] = False  <-- can be set to True.  Then you can get statistics of the posterior.
2b) Another way is that you could sample the posterior (from the csv file that's generated). For example, if you have 1,000,000 points, you could sample 1,000 of them and run the simulation to get the outputs [without PEUQSE].

We actually have an issue card on plans to do this by default, but it's not implemented: https://github.com/AdityaSavara/PEUQSE/issues/206
We also had a similar intent in the pull request that hasn't been merged:

But that pull request had some issues with it, and Troy had to finish the project he was working on, so we unfortunately didn't get to the point of fixing it (and I have a large number of other things that I need to prioritize over that not-so-simple merge).


--
You received this message because you are subscribed to the Google Groups "PEUQSE-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to PEUQSE-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/PEUQSE-users/2c0d3b82-6cef-4fd9-9675-582a9c1dcdb0n%40googlegroups.com.

Chris

unread,
Oct 7, 2023, 5:01:04 PM10/7/23
to PEUQSE-users
Thanks for the reply! The answer to 1 makes sense, I feel silly for not knowing that's what the stdap was in the log file. For 2, we are saving the logp, parameter values and simulation results from every run. I hope this isn't a silly question but I am still not quite sure what we'd be doing. We have, say, 1,000,000 different data points for methane conversion, which we have gotten via a million combinations of parameters from when we were exploring the hpd region. so our mean would be the mu_ap value. Would the +/- sigma  just be based off of the distribution of methane conversions points from the other 999,999 runs? I guess I should say, the way I am saving points is on my fork of peuqse, and it is not discriminating between post burn-in/pre-burn in samples, it just saves every single logp, parameter set, and simulation output set.

I have tried setting  'exportAllSimulatedOutputs' to true, but I think for the ensemble slice sampling it might not work. Maybe, if you're strapped for time, you could just point me to a specific function/routine in peuqse where it calculates the statistics for the posterior, and I can post my findings back to this board. I'll take a look at Troy's pr as well.

I will hopefully have a few contributions/prs for peuqse by the time I am done with this project, so far I have just been making tweaks locally on my fork. I appreciate all the help.

Aditya Ashi Savara

unread,
Oct 9, 2023, 6:58:30 PM10/9/23
to PEUQSE-users
You may be right that the exportAllSimulated values does not work for ESS -- you could take a look for what it does in the code, if you wish, and even potentially upgrade the code. (I'm not sure how easy that would be).

Regarding the outputs: The outputted samples you see in the log-files are post-burn in. I should have made that more clear in the documentation somewhere.
I'm not sure what this sentence means: " I guess I should say, the way I am saving points is on my fork of peuqse, and it is not discriminating between post burn-in/pre-burn in samples, it just saves every single logp, parameter set, and simulation output set."
The PEUQSE log file is only post-burn in. Even if you were to turn on to save the burn-in, it would save it in a different file.  So if you are saving all of the simulated outputs, I assume you are saving them with some kind of index (like 1, 2, 3, .... 1000). The first ones simulated will be burn in, and then the last ones will be post-burn in. But, if you have any reason to doubt that ordering and have the parameters, the regular PEUQSE csv files are (unless stated otherwise) post-burn in.
I'm not sure if I've missed anything in the questions. Have I?


Aditya Ashi Savara

unread,
Oct 9, 2023, 7:03:22 PM10/9/23
to PEUQSE-users
It looks like exportAllSimulated relies on a variable called post_burn_in_samples_simulatedOutputs which is never actually used anywhere. So it looks like I never got around to implementing that feature.  If you did want to implement it, the way I would implement it today would be to (1) initialize an array of the size expected, (2) fill that array when calculating logP, during the getting of the simulated responses.  But there maybe some nuances due to filtering of outputs etc. I thought I had implemented it, at one point, but it may be that we had implemented it on that fork of Troy's, or maybe I had to get rid of the feature during some upgrade and never fixed it.

Chris

unread,
Oct 9, 2023, 7:10:12 PM10/9/23
to PEUQSE-users
What I meant is I added code to just save the logp, input parameters, and output values, for everytime peuqse calls getlogp. That is all. I am still unsure how to get 1 sigma uncertainties for the simulated output but I will just look through the examples and see if I can figure it out.

Aditya Ashi Savara

unread,
Oct 9, 2023, 10:58:45 PM10/9/23
to PEUQSE-users
Ah, I see. To get 1 sigma uncertainties, just take the standard deviation of your post-burn in output for that observable's simulated value. If you're doing an MCMC run, the sampling is proportional to probability. So the standard deviation of the simulated post-burn in outputs is in fact the standard deviation of the observable (as associated with the posterior).

Chris

unread,
Oct 10, 2023, 12:20:09 PM10/10/23
to PEUQSE-users
I see! I did not realize that they were proportional. Thank you for the responses, especially on a holiday weekend! I will see if I can figure out how to get exportAllSimulated to work for ESS, but for now I have all of the post burn in parameter values so that is enough for me to get the simulated responses I need.

Aditya Ashi Savara

unread,
Oct 11, 2023, 10:09:13 AM10/11/23
to PEUQSE-users
No need to worry about getting that to work for the ess case.  The simplest thing is for us to simply grab responses as they are made in the function that does the simulations and then store them in a pre-made array. That way of doing things won't work for some types of parallezation, but we can put both the parameters and the observables in a single table that gets exported  maybe with a blank column in between. Then it will still work for some of the types of parallelization.

Chris

unread,
Oct 11, 2023, 10:45:56 AM10/11/23
to PEUQSE-users
I am currently saving them to a pickle file, there is one for each mpi runner because your pickle file writer is programmed to accommodate that, it just saves it as "gsa_ouput_<mpitask#>.pkl". They are just combined at the end externally. (permalink: https://github.com/ChrisBNEU/PEUQSE/blob/6da57171434bc3617a7543df7b19eaae570a46e3/PEUQSE/InverseProblem.py#L1655). It is pretty ugly but it works. I think I should make a pr to actually use the existing framework, instead of this janky workaround, but I can make a pr and we can have that discussion on github.
Reply all
Reply to author
Forward
0 new messages