Question about weighting in Parsac


Jorrit Mesman

May 10, 2021, 3:35:14 AM
to FABM-users
I am using Parsac to calibrate my FABM model that's coupled to GOTM. Multiple variables are calibrated at once (oxygen, chlorophyll, phosphorus, etc.). As these variables have different units (and therefore their orders of magnitude may differ) and sometimes different measurement frequencies, I was wondering how Parsac handles the weighting between these variables. I am using the default settings for the objective function (maximising the log likelihood of the RMSE, if I understand correctly?).

Consider the following examples:
- If one of the variables has (much) more observations (e.g. oxygen high-frequency sampling vs. manual nutrient sampling), will the oxygen fit get more weight in the calibration?
- If one of the variables has a different order of magnitude, will that affect the weighting between variables (e.g. nitrate in ug/m3 (high numbers) and phosphate in g/m3 (low numbers))?
- If one of the variables is fitted exceptionally badly by the model (e.g. my nitrate is 2 orders of magnitude off), will any improvement in this variable (e.g. nitrate is now only 1 order of magnitude off) be weighted more strongly than a comparatively minor improvement in another variable?
- Will a variable that shows strong fluctuations be weighted stronger than a variable that is relatively constant throughout the year (e.g. deep-water oxygen varying between anoxia and full saturation, and total phosphorus)?

I realise that there may not be an "ideal" answer to the question of how weighting should be done, but understanding how weighting is applied could help to improve the calibration. I am not experienced in numerical optimisation, so apologies if these questions are obvious. Would there be a place where this is documented?

Jorn Bruggeman

May 10, 2021, 6:05:44 AM
to fabm-...@googlegroups.com

Hi Jorrit,

 

Good questions! I'm not sure the FABM list is the best place for them, but I appreciate parsac doesn’t have its own list or forum yet… To clarify for everyone: parsac is a tool for calibration and sensitivity analysis (https://github.com/BoldingBruggeman/parsac, https://doi.org/10.5281/zenodo.4276111) that works particularly well with GOTM-FABM.

 

Without going into too much detail, here are the key principles for the weighting of observations:

 

parsac’s optimisation routine maximizes the (log) likelihood. This combines all model-observation differences (after transformation, if that’s activated), each weighted by the reciprocal of its “standard deviation” (formally, squares of the differences divided by their variance). This standard deviation is currently a constant for each observed variable; it cannot yet depend on time and/or depth. It can be prescribed in your xml configuration file by adding an attribute sd="<VALUE>" to the observed variable. If it is not prescribed, parsac will instead estimate it from the model-observation differences. In that case, a variable that the model cannot capture well will automatically get a higher sd.
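
To make this concrete, here is a toy numerical sketch of that weighting (illustrative only, not parsac's actual code; the two variables, their units and the made-up numbers are placeholders):

import numpy as np

def log_likelihood(model, obs, sd=None):
    # Gaussian log-likelihood of one observed variable: squared
    # model-observation differences divided by the variance.
    diff = np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)
    if sd is None:
        # If no sd is prescribed, estimate it from the differences
        # themselves; a badly fitted variable then gets a larger sd.
        sd = np.sqrt(np.mean(diff ** 2))
    n = diff.size
    return -0.5 * np.sum((diff / sd) ** 2) - n * np.log(sd) - 0.5 * n * np.log(2.0 * np.pi)

# Two made-up variables with different units and sampling frequencies.
rng = np.random.default_rng(1)
obs_oxygen = 300.0 + rng.normal(0.0, 20.0, size=365)    # e.g. daily oxygen, mmol/m3
obs_phosphate = 0.5 + rng.normal(0.0, 0.1, size=12)     # e.g. monthly phosphate, g/m3
model_oxygen = obs_oxygen + rng.normal(0.0, 30.0, size=365)
model_phosphate = obs_phosphate + rng.normal(0.0, 0.2, size=12)

# The total is a plain sum over variables, so the series with 365
# observations contributes many more terms than the one with 12.
total = (log_likelihood(model_oxygen, obs_oxygen)
         + log_likelihood(model_phosphate, obs_phosphate))
print(total)

# Rescaling a variable's units (e.g. g/m3 -> mg/m3) scales its estimated
# sd by the same factor, so it only shifts that contribution by a
# constant and does not change which parameter set is optimal.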

 

So to come back to your questions:

 

  • Yes, an increase in the number of observations will result in a greater contribution of that variable to the likelihood. Thus, parsac will place more importance on getting that variable right, potentially at the expense of other variables. This is not always desirable, in particular when using high-frequency observations. The proper solution would likely be to allow for prescribed or estimated autocorrelation, which would mean that observations that are closer in time get a lower weighting because they observe the same thing. Since this is not supported in parsac at the moment, you may want to subsample the high-frequency variables instead, e.g. with a running average (see the sketch after this list).
  • Different orders of magnitude are dealt with automatically if parsac estimates the sd. In that case, a change in units (by applying a scale factor) would not make any difference to the outcome of the optimisation.
  • Strong fluctuations in any variable are often not that well captured by models. In particular, slight time shifts (e.g. lags) between model and data can then have disastrous effects on the “quality of fit” perceived by parsac. The consequence is that their estimated sd will typically be higher than that of a near-constant variable. But as the model-observation differences will be higher too, the variable can still contribute significantly to the likelihood.
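
For the subsampling mentioned in the first point above, a minimal sketch along these lines may help (the hourly toy data and the file name are placeholders; in practice you would smooth and thin your own observation series before handing it to parsac):

import numpy as np
import pandas as pd

# Made-up hourly oxygen series standing in for high-frequency observations.
times = pd.date_range("2020-01-01", "2020-12-31 23:00", freq="h")
oxygen = pd.Series(300.0 + 30.0 * np.sin(np.arange(times.size) / 500.0)
                   + np.random.default_rng(0).normal(0.0, 5.0, times.size),
                   index=times)

# Smooth with a one-day running mean, then keep a single value per day,
# so that closely spaced, highly autocorrelated samples no longer
# dominate the likelihood.
smoothed = oxygen.rolling("1D").mean()
daily = smoothed.resample("1D").first().dropna()

daily.to_csv("oxygen_daily.csv")  # placeholder file name for the thinned observations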

 

In general, if you have concerns about these matters, you may want to experiment with prescribing the standard deviations yourself and observing the effect (in addition to subsampling in time and/or depth).

 

Hope this helps!

 

Jorn


Jorrit Mesman

May 10, 2021, 6:21:27 AM
to FABM-users
Thank you so much Jorn, that clarifies a lot. I already experimented a bit with subsampling observations, and that seemed to work well. It is good to know this, so I can change the calibration setup accordingly. Again, thanks for the help!!

Best,
Jorrit