Running analyses hundreds of times

Hasuni

May 31, 2025, 9:45:18 AM

to plink2-users

Hi there,

I am working on a project that requires running plink2 association analyses many times (>100). This mostly comes down to engineering decisions and workarounds, so I'm curious whether plink2 already has built-in features that would make this easier.

Some questions:

1. I see that there is a --rerun flag. Does this mean that if I provide the same file to multiple runs, all the logs will be concatenated into that file?

2. Is there a way to load a pfile once and run 10 separate analyses on it, instead of reloading the pfile every time I run an analysis?

3. Are there other features in plink2 that I should consider, given the information I have shared? (Sorry for being vague about the project scope.)

Hasuni

May 31, 2025, 12:51:02 PM

to plink2-users

Sorry, I misunderstood --rerun. This flag just reruns an analysis using the flags from a provided log file.

So maybe an updated question is: is there a way to avoid generating log files for every run? Is there a way to output to the same log file for all runs?

Chris Chang

May 31, 2025, 2:19:19 PM

to Hasuni, plink2-users

Some general tips:

- plink2 loads the psam and pvar files near the beginning of typical invocations. However, it may not need to load more than the pgen's header; see the documentation on --read-freq/--error-on-freq-calc for a way to avoid the sometimes-expensive "Calculating allele frequencies..." step.
- From the --glm documentation:
"If you have multiple quantitative phenotypes with either no missing values, or missing values for the same samples, analyze them all in a single --glm run! PLINK 2.0's linear regression 'only' tends to be a few hundred times as fast as PLINK 1.9 when you analyze one quantitative phenotype at a time. But --glm also has a quantitative-phenotype-group optimization that can multiply the speedup by another factor of ~10."
- It doesn't come up much, but there is a --loop-cats flag for rerunning the same analysis on a bunch of disjoint categories.
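
A minimal Python sketch of the first two tips, under stated assumptions: it only builds plink2 command lines, so nothing is executed. The file names (`data`, `pheno.tsv`, `freqs`) and the helper functions are hypothetical. The flags used are `--freq`, `--read-freq`, `--pheno`, `--glm` and `--out`; `allow-no-covars` is included only because this sketch passes no covariate file. Whether frequencies computed once on the full dataset are appropriate for every later run depends on the analysis.

```python
import shlex


def freq_command(pfile, out_prefix):
    """One-off run that writes allele frequencies to <out_prefix>.afreq."""
    return ["plink2", "--pfile", pfile, "--freq", "--out", out_prefix]


def glm_command(pfile, pheno_file, afreq_file, out_prefix):
    """Single --glm run over every phenotype column in pheno_file.

    --read-freq reuses the precomputed frequencies instead of
    recalculating them on each invocation.
    """
    return [
        "plink2",
        "--pfile", pfile,
        "--read-freq", afreq_file,
        "--pheno", pheno_file,       # all phenotype columns are loaded
        "--glm", "allow-no-covars",  # assumption: no covariate file here
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(freq_command("data", "freqs")))
    print(shlex.join(glm_command("data", "pheno.tsv", "freqs.afreq", "run1")))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once plink2 is on the PATH.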


Hasan Alkhairo

Jun 13, 2025, 1:49:37 PM

to Chris Chang, plink2-users

Thanks Chris!!

Also, I'm running --glm with a pheno file containing hundreds of phenotypes, and running this analysis several times for different groups. I created one giant pheno file that I can pass to all runs, but I'm discovering that some groups have only controls for some phenotypes, which makes plink2 crash and leaves several phenotypes unanalyzed.

I want to avoid making separate phenotype files specific to each group (I have >20 groups), so is there a way to ignore these errors?

Chris Chang

Jun 13, 2025, 1:51:39 PM

to Hasan Alkhairo, plink2-users

Try --glm's 'skip-invalid-pheno' modifier (https://www.cog-genomics.org/plink/2.0/assoc#skip_invalid_pheno).
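
As a rough sketch of how this could fit the per-group setup described above: the Python snippet below builds one plink2 command per group, all sharing the same pheno file, with `skip-invalid-pheno` added to `--glm`. Selecting each group's samples with `--keep` and a per-group sample-ID file is an assumption, not something stated in the thread. The file names and the helper function are hypothetical, and `allow-no-covars` appears only because no covariate file is passed here.

```python
import shlex


def group_glm_commands(pfile, pheno_file, group_keep_files, out_dir="results"):
    """Return {group: argv} with one --glm run per group.

    group_keep_files maps a group name to a sample-ID file for --keep
    (an assumed way of defining groups).
    """
    commands = {}
    for group, keep_file in group_keep_files.items():
        commands[group] = [
            "plink2",
            "--pfile", pfile,
            "--keep", keep_file,
            "--pheno", pheno_file,  # same pheno file for every group
            # skip-invalid-pheno: the modifier suggested in the reply above
            "--glm", "skip-invalid-pheno", "allow-no-covars",
            "--out", f"{out_dir}/{group}",
        ]
    return commands


if __name__ == "__main__":
    # Hypothetical group definitions, for illustration only.
    groups = {"groupA": "groupA.ids", "groupB": "groupB.ids"}
    for argv in group_glm_commands("data", "all_pheno.tsv", groups).values():
        print(shlex.join(argv))
```

Because each run gets its own `--out` prefix, the results and log for every group stay in separate files.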

Hasan Alkhairo

Jun 24, 2025, 7:16:21 PM

to Chris Chang, plink2-users

Worked great! Thank you so much.
