Monte Carlo CI's and multiple comparisons


Sofia Orellana

Oct 25, 2021, 5:38:50 PM
to lavaan

Hi everyone, 

I am seeking to correct my Monte Carlo-derived CIs for indirect paths in lavaan for multiple comparisons. I am trying to stay away from a Bonferroni-style solution: that would mean lowering my alpha level (and thus raising the level of my CIs), and since I have to correct for over a hundred tests, this becomes too stringent. I am therefore seeking to apply an FDR-analogue solution.

A proposed solution for a similar problem (link to source) is to control the CIs for the false coverage rate (FCR). But this solution requires me to perform parameter selection first, for instance by thresholding the p-values associated with said parameters.

The problem is that, as far as I understand, I cannot take this first step of selecting the parameters for which to compute CIs, because the p-values associated with indirect effects in lavaan are non-informative (the sampling distribution of an indirect parameter is non-normal), which is why I use the Monte Carlo method to get my CIs in the first place.
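For readers unfamiliar with the Monte Carlo method mentioned here, a minimal sketch in Python (lavaan itself is R; the estimates, SEs, and the simplifying assumption that a and b are drawn independently are all invented for illustration — the full method samples from the joint multivariate-normal sampling distribution of the estimates):

```python
# Minimal sketch of a Monte Carlo CI for an indirect effect a*b:
# draw a and b from their estimated sampling distributions, form the
# product, and take empirical quantiles. All numbers are hypothetical.
import random

random.seed(1)

a_hat, se_a = 0.40, 0.05   # hypothetical a-path estimate and SE
b_hat, se_b = 0.30, 0.06   # hypothetical b-path estimate and SE

draws = sorted(
    random.gauss(a_hat, se_a) * random.gauss(b_hat, se_b)
    for _ in range(20000)
)

lo = draws[int(0.025 * len(draws))]   # 2.5th percentile
hi = draws[int(0.975 * len(draws))]   # 97.5th percentile
print(f"95% Monte Carlo CI for a*b: [{lo:.3f}, {hi:.3f}]")
```

The resulting interval is generally asymmetric around a_hat * b_hat, which is the whole reason for preferring it to a symmetric delta-method interval.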

My questions are the following:
  1. If the false coverage rate solution is actually valid in this context
  2. If there is an alternative to using p-values for selecting the parameters for which I will then compute FCR-corrected CIs. I thought of making my selection equal to my full set of parameters, but mathematically this seems to lead back to the same alpha level; see the equation in Algorithm 2.1 of [1].
  3. If there is any other method out there for dealing with this issue.
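For concreteness, the FCR adjustment in [2] boils down to a change of confidence level: after selecting R of the m candidate parameters, each selected parameter gets a marginal CI at level 1 - R*q/m instead of 1 - q. A minimal sketch (the counts below are made up):

```python
# FCR-adjusted confidence level (Benjamini & Yekutieli, 2005): with m
# candidate parameters and R selected ones, each selected parameter
# gets a marginal CI at level 1 - R*q/m rather than 1 - q.
def fcr_ci_level(num_selected, num_params, q=0.05):
    return 1 - num_selected * q / num_params

# Selecting a small subset is far less stringent than the Bonferroni
# level 1 - q/m (about 0.9996 for m = 120):
level_subset = fcr_ci_level(10, 120)   # about 0.9958

# Selecting *all* parameters (R = m) recovers the unadjusted level,
# which is exactly the degeneracy raised in question 2 above:
level_all = fcr_ci_level(120, 120)     # 0.95
print(level_subset, level_all)
```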

References to the False Coverage Rate solution can be found here:
[1] Rosenblatt, J. D., & Benjamini, Y. (2014). Selective correlations; not voodoo. NeuroImage, 103, 401–410. https://doi.org/10.1016/j.neuroimage.2014.08.023
[2] Benjamini, Y., & Yekutieli, D. (2005). False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters. Journal of the American Statistical Association, 100(469), 71–81. https://doi.org/10.1198/016214504000001907

Thank you. 

Sofia

Terrence Jorgensen

Oct 28, 2021, 5:41:12 AM
to lavaan
How large is your sample? For all the fuss people make, the delta method actually works quite well asymptotically. I hope N is quite large given how large your model is; if so, I would recommend passing the relevant subset of parameterEstimates() output to p.adjust() to request the FDR correction.
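In case it helps later readers: what p.adjust(p, method = "fdr") computes in R is the set of Benjamini-Hochberg adjusted p-values. A language-agnostic sketch in Python, with made-up p-values (in the real workflow these would come from the pvalue column of parameterEstimates()):

```python
# Benjamini-Hochberg ("fdr") adjusted p-values: m*p/rank, with a
# cumulative minimum taken from the largest p-value downwards so the
# adjusted values stay monotone in p. Mirrors R's p.adjust(method = "fdr").
def p_adjust_bh(pvals):
    m = len(pvals)
    # Indices ordered from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in zip(range(m, 0, -1), order):
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]   # made-up p-values
print(p_adjust_bh(raw))   # ~ [0.006, 0.024, 0.0615, 0.0615, 0.24, 0.74]
```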

Alternatively, you could use the method described in the CrossValidated link. You don't need to base your decision on p values. If the null hypothesis is H_0 = 0 for all candidate parameters, you could rank them by the absolute value of their (un)standardized estimates.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Sofia Orellana

Oct 30, 2021, 12:47:02 PM
to lavaan
Hi Terrence, 

Thank you for your answer! My sample size is ~20,000.

Just to clarify: 
1) Do you mean that, given that my sample is large enough, I do not have to worry about the sampling distribution of an indirect effect a*b being non-normal? (Since it approximates a normal distribution.)
2) If yes to the above: could I then use the p-values that lavaan gives me for the indirect effects, and just FDR-correct them afterwards?


Concerning your second suggestion:
Thank you for clarifying that I don't have to base my selection on p-values.
If I were to rank my candidate parameters by the absolute value of their unstandardized estimates, it is not clear to me what cut-off I could then use to select a given (high-ranking) subset of them for which to compute CIs with the method in the link. My question is: what should I use as a cut-off here?

Thank you!

Best, Sofia

Pat Malone

Oct 31, 2021, 1:37:54 PM
to lav...@googlegroups.com
Sofia,

Jumping in for Terrence:

On the first section, yes and yes. With N=20,000, in most research contexts, anything that is practically significant will be statistically significant. And with that power, the decrement from using the a*b p-value instead of an asymmetric CI will be so small as to be meaningless. So you can do the easy thing and be fine.

Pat



--
Patrick S. Malone, PhD
Sr Research Statistician, FAR HARBΦR

Sofia Orellana

Nov 2, 2021, 7:14:05 AM
to lavaan
Dear Pat, 

Thank you so much. That is very clear. 
Just one more question out of caution: would this still hold for more complex indirect paths? (e.g. a*b*c)

Best, 
Sofia

Pat Malone

Nov 2, 2021, 8:52:32 AM
to lav...@googlegroups.com
It should, I would think. Delta method SEs are technically incorrect, but the practical effect versus an asymmetric CI is usually pretty small in application. At your sample size, unless you're working with something like an important but extremely lopsided binary variable, the difference isn't going to be noticeable when practical significance is also considered.

Stas Kolenikov

Nov 2, 2021, 12:07:27 PM
to lav...@googlegroups.com
Hold on, hold on -- why would the delta method SEs be incorrect, Pat? The quantities closest to asymptotic normality are the covariance moments themselves. The parameter estimates are "first order removed" in terms of nonlinearity, since you solve something nonlinear to get the quantities of interest, and nonlinearity increases the sample size required before something can be considered approximately normal (skewness goes to zero, kurtosis goes to that of the normal). Combining estimates into indirect effects adds another level of nonlinearity, so these combinations converge to normality even more slowly -- but they still do.

My somewhat greater concern for Sofia would have been that she might be using complex survey data... in which case the degrees of freedom / effective sample size / how far you are into the land of asymptotia will be limited by the number of clusters / primary sampling units.

-- Stas Kolenikov, PhD, PStat (ASA, SSC)  @StatStas
-- Principal Scientist, Abt Associates @AbtDataScience
-- Opinions stated in this email are mine only, and do not reflect the position of my employer
-- http://stas.kolenikov.name
 


Shu Fai Cheung

Nov 3, 2021, 8:19:54 PM
to lavaan
In mediation, which involves the sample product a*b, there is a special case that may rarely happen in real life but is included in many simulation studies: the population values of both a and b are zero. Even if the sampling distributions of both a and b are normal and they are uncorrelated, the sampling distribution of a*b is not normal:

Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. https://doi.org/10.1214/aoms/1177732541

The distribution of the product will not approach normal, no matter how large the sample size is, because the expected values of the two random variables are zero (Craig, 1936, p. 3, the 3rd equation).
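Craig's point is easy to check by simulation. In this zero-mean case the (normalized) product of the two estimates is a product of two zero-mean normals, and a quick sketch shows how far that is from normal:

```python
# Product of two independent zero-mean standard normals: sharply
# peaked and heavy-tailed, with theoretical kurtosis 9 versus 3 for
# a normal distribution. Rescaling by sample size never fixes this.
import random
import statistics

random.seed(1)
z = [random.gauss(0, 1) * random.gauss(0, 1) for _ in range(200000)]

mean = statistics.fmean(z)
var = statistics.fmean((x - mean) ** 2 for x in z)
kurt = statistics.fmean((x - mean) ** 4 for x in z) / var ** 2
print(f"mean = {mean:.3f}, kurtosis = {kurt:.1f}")  # kurtosis well above 3
```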

But to be honest, this case is of little practical significance in real data with large sample sizes. I share it just because it is a theoretically interesting case.

-- Shu Fai

Stas Kolenikov

Nov 4, 2021, 12:37:06 PM
to lav...@googlegroups.com
Shu,

that would make sense -- the second derivatives of something something scores must be zeroes. Testing for that should be done in a special way. But if the asymptotics are screwed up / not normal / not O(n^{-1/2}), then I would not be convinced that bootstrapping takes care of that in any meaningful way.

 

Pat Malone

Nov 14, 2021, 6:01:51 PM
to lav...@googlegroups.com
Stas,

With apologies for the delay: I was admittedly using a sloppy shorthand. There is nothing wrong with the delta method for SEs. I was meaning to address what Shu Fai answered: that in problems of inference about the product of coefficients, a method based on (point estimate / SE) or (point estimate +/- critical value * SE) will not, as far as I know, reflect the sampling distribution of the product outside corner cases. But as Shu Fai and I both said, with such a large sample, the difference in approaches is quite unlikely to affect conclusions in any applied question.

Pat
