How to setup input file for PEUQUSE

59 views
Skip to first unread message

Chris

unread,
Feb 16, 2023, 2:30:11 PM2/16/23
to PEUQSE-users
Hello,
I am trying to figure out how best to setup my input file. I have two parameters that I want to fit, A and Ea (mech_data in code snippet below). The simulation is compared to experimental data. My inputs (x_data)  are: 
  1. Experimental temp,
  2. pressure,
  3. catalyst area,
  4. mole fractions in,
  5. volume flow rates
My outputs are just my outlet mole fractions (y_data). So the main inputs look like this (see run_peuquse.py script, attached):
```
    UserInput.responses['responses_abscissa'] = x_data
    UserInput.responses['responses_observed'] = y_data
    UserInput.model['InputParameterPriorValues'] = mech_data
```
x_data is a multilevel list of [[Temp1,temp2,...], [pressure1, pressure2, ...], ...]. same with y_data. mech_data is just my [A, Ea] values in a single level list. I was wondering if this was correct. It appears to run with the input file I've attached, I just don't know if it is actually doing anything useful.

run_peuqse.py
ct_simulation.py
expt_data.yaml
expt_unc.yaml
ct_initial_small.yaml

Aditya Savara

unread,
Feb 17, 2023, 4:02:59 AM2/17/23
to PEUQSE-users
An MCMC length of 10 is too small to be useful. You could use a do optimize log P though, or you could use something like 10000 * ( number of parameters).  Since you haven't provided the outputs, it's hard for anyone to know if it's doing anything useful -- but maybe there are no outputs due to the small mcmc.  Maybe you could start with a doSinglePoint and then do a doOptimizeLogP before trying to do an mcmc.

For your nested abcissa, I think you would need to use [[Temp1, pressure1, ...], [Temp2, pressure2,...]  ]. I'm not sure. I'd have to find something in the examples or dig into the code. But, if I am not mistaken this also doesn't affect anything for PEUQSE's solving, it is really just for plotting purposes.  You could also just comment out that line to see what PEUQSE does when not provided it.  I am traveling right now and also my to-do list is very backed up, but i looked at example 7j and it leads me to think I'm correct that you'd have to parallelize your abscissa as I described. Because I think it's matching the abscissa elements to the response elements, thus how you described it is transposed to what it should be.

If you try some of the above and provide zipfles for a particular directory (like one for a doSingleResponse attempt, one for a doOptimizeLogP attempt, one for a longer mcmc attempt like 1000 or 10000) we could see if the outputs seem useful in any way.  You're probably also at a stage (or near the stage) to read or re-read the tips at the end of the file ExamplesAndTutorialAndGettingStarted.rtf  since now you've got (or soon will have) a working example for your problem.

Chris

unread,
Mar 27, 2023, 12:12:36 PM3/27/23
to PEUQSE-users
Hey!
I know it's been a month, I took a little bit of time to work on the issues I had above and read a little more. Thanks for the help! I was able to run that example with a larger MCMC length and got a reasonable answer for the parameters.

Moving beyond the example above, my overall goal using peuqse is to make a model that can get a MAP for all the kinetics in my model. I have this model attached, it is a little complex, but the gist of it the parameters are the A, E0, and alpha for a BEP relationship, with all of the reactions being grouped into families. It also varies the binding energies for species within the model according to linear scaling (i.e. the atomic binding energies for Carbon, Hydrogen, etc. are varied).  There are 110 parameters overall. Currently I am having an issue that I think is making my runs less efficient, I was wondering how to fix it.

When I do a run, the initial set of parameters is fine, since it uses my initial values specified in the input file. Subsequent steps, however, seem to use values that are outside of where I want to be guessing (e.g., it is guessing activation energies that are below 0). I tried to rectify this by setting the "InputParameterPriorValues_upperBounds" and lowerBounds to values that I thought would be acceptable. This didn't work, and I was still seeing some values set to less than my lower bounds. I then realized that in the "UserInput" file, there was a note saying the upper and lower bounds had only been tested with the 'scaling_uncertainties_type' parameter set to 'off'. I tried that, but it gave me the following error:

Exception has occurred: LinAlgError
When `allow_singular is False`, the input matrix must be symmetric positive definite.
  File "/Users/blais.ch/Documents/_01_code/05_Project_repos_Github/PEUQSE/PEUQSE/InverseProblem.py", line 2766, in getLogPrior
    logPrior = multivariate_normal.logpdf(x=discreteParameterVector_scaled,mean=self.UserInput.mu_prior_scaled,cov=self.UserInput.covmat_prior_scaled)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/blais.ch/Documents/_01_code/05_Project_repos_Github/PEUQSE/PEUQSE/InverseProblem.py", line 1648, in getLogP
    log_prior_proposal = self.getLogPrior(proposal_sample)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/blais.ch/Documents/_01_code/05_Project_repos_Github/PEUQSE/PEUQSE/InverseProblem.py", line 2618, in doMetropolisHastings
    log_posteriors_un_normed_vec[0]= self.getLogP(samples[0])
                                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/blais.ch/Documents/_01_code/05_Project_repos_Github/minimal_examples/peuqse_02/run_peuqse.py", line 88, in <module>
    PE_object.doMetropolisHastings()
numpy.linalg.LinAlgError: When `allow_singular is False`, the input matrix must be symmetric positive definite.

I am a little stuck with this one. I added a self contained example (although it does require you to have cantera installed). I also checked my input parameter ranges, uncertainties, and values and they seem reasonable. I think looking at the code, it is ok if it guesses a value that is outside the bounds, because it will set the log p for that guess to infinity. It still runs the simulation though, which eats up time on guesses that are physically meaningless.

Hopefully that all made sense, I am happy to clarify anything. Thanks for all the help!
peuqse_02.zip

Aditya Ashi Savara

unread,
Mar 27, 2023, 10:30:28 PM3/27/23
to PEUQSE-users
I replied to Chris privately but am replying to him here also so that the answer is public:I don't know why there is an error in one of the evaluations in this specific case and won't have time to look to figure it out at this time, but I gave him the following work around. What he can do is modify his wrapper function (the one that is calling the simulation function), to first check his parameter constraints and then return "None" when they are not met (without even calling the simulation function).  When PEUQSE receives a None object (rather than a vector of simulated response values), then PEUQSE treats that parameter set as a zero probability point. Chris indicated that work around should work for him, so he will try it.

--
You received this message because you are subscribed to the Google Groups "PEUQSE-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to PEUQSE-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/PEUQSE-users/2fafe8c1-790d-40b8-bb0f-d726b3369f93n%40googlegroups.com.
Message has been deleted

Aditya Ashi Savara

unread,
Apr 10, 2023, 10:29:55 AM4/10/23
to PEUQSE-users
First, thanks considering making a contribution, and this is a good issue to have a solution for -- whether it's by more documentation /examples or more code.
There are several separate issues involved in this question, so let's consider them...
a) Definitely you could make your code put out its own outputs, let's not rule that out yet, but consider if there is something better.
b)  This is a general problem, that the scipy optimizers don't give you enough information along the way. Accordingly, we do have a partial solution for this. In InverseProblem.py, there is a function called verbose_optimization_wrapper due to this kind of issue, but I think it might only work with some of the optimizer choices, like Nelder-Meade. I don't remember. You may want to do some singlepoint calls to doOptimize logP playing around with it. Note that this requires a True call to doOptimizeLogP:
def doOptimizeNegLogP(self, simulationFunctionAdditionalArgs = (), method = None, optimizationAdditionalArgs = {}, printOptimum = True, verbose=True, maxiter=0):
c) My recollection is that this is different from the multi_start verbose. In the UserINput I see this:
parameter_estimation_settings['multistart_passThroughArgs'] = {} #Typically, one would put here the arguments for doOptimizeNegLogP: {'method':"Nelder-Mead", 'printOptimum':True, 'verbose':False} To find additional details of which arguments can be used with doOptimizeNegLogP, see the function doOptimizeNegLogP in PEUQSE\InverseProblem.py
So I think you need to use the passthrough args.
d) When I looked at verbose_optimization_wrapper, it looks like it has no way of writing the outputs to file. Maybe it would be a good idea to change that somehow.
e) Is your problem even feasible within a 24 hour time limit, with a doOptimize call? Each doOptimize point probably needs 100 sequential iterations. For some optimizers, you can limit that with passthrough args.  Otherwise, you might need to do something else like short ensemble jump sampling runs or maybe a sobol or astroidal sampling, since the astroidal sampling can be perfectly parallelized.
d) There is an MPI parallelization of PEUQSE feature, which can be used with multistart. It looks like the syntax for that is here, https://github.com/AdityaSavara/PEUQSE/blob/master/Examples/Example26/runfile_Example_26d2_BPE_MH_multistart_with_Parallel.py  but I probably should make an example of parallel sampling with sobol or astroidal.  <-- the MPI feature is not always super easy to get working, so I have taken the approach that it only makes sense to try to get working if a person needs it. Or, if someone really plans to work with PEUQSE a lot or for some huge and important problem.




On Mon, Apr 10, 2023 at 9:56 AM Chris <chris...@gmail.com> wrote:
I have one question related to output files. I am currently doing a multistart optimize logP run with my current model, but I am not converging by my hpc's time limit (24 hrs). Should I limit each starting point to a specific number of iterations, or is there a way to generate a checkpoint before the end of the run? Currently I have the following set in my run file, but all of my output folders are empty at the end of the 24 hour period:
```
    UserInput.parameter_estimation_settings['mcmc_threshold_filter_samples'] = True   
    UserInput.parameter_estimation_settings['mcmc_parallel_sampling'] = True
    UserInput.parameter_estimation_settings['mcmc_random_seed'] = 0
    UserInput.parameter_estimation_settings['multistart_searchType'] = 'doOptimizeLogP'
    UserInput.parameter_estimation_settings['multistart_passThroughArgs'] = {'method':'BFGS'} #Here, BFGS is used. However, Nelder-Mead is usually what is recommended.
    UserInput.parameter_estimation_settings['multistart_initialPointsDistributionType'] = 'sobol'

    # things to try to get it to save more:
    UserInput.parameter_estimation_settings['multistart_exportLog'] = True
    UserInput.parameter_estimation_settings['multistart_checkPointFrequency'] = 100
    UserInput.parameter_estimation_settings['verbose'] = True

    PE_object = PEUQSE.parameter_estimation(UserInput)
    PE_object.doMultiStart()
```
I have a workaround where I write my own output files with information on each iteration (parameters, outputs, logp value) but I imagine you probably already coded something that is more efficient than whatever I've implemented. I did take a look in "UserInputs" and it seems like maybe the multistart checkpoint frequency isn't implemented yet, so maybe I could start there and try to code something.

Regards,
Chris Blais

Aditya Ashi Savara

unread,
Apr 10, 2023, 10:54:31 AM4/10/23
to PEUQSE-users
Actually, this is an important point that I should stress: my recollection is that your kinetic model is pretty long to run (meaning, not 1 second).  That means you probably cannot achieve *any* kind of multistart through PEUQSE on a supercomputer unless you use short runs or parallel sampling.

Unless you get the parallel sampling to work, what you probably need to do is use some kind of MCMC (ESS or EJS or MH) with the continue_MCMC [or whatever that feature is called] turned on to true. You can certainly do one or more multistarts with things like random, but the search will be too slow in a high dimensional space unless you use an MCMC in the end. And if you are hitting the wall clock time, then you likely need to turn on the continue sampling of MCMC runs so that it starts off where it left off.  Better do small tests first to make sure the continuing is working.
Reply all
Reply to author
Forward
0 new messages