1. Make sure to use PLINK 2.0 instead of PLINK 1.9 for this. The relative speedup often exceeds 100x, and sometimes even 1000x. (There is no need to use GNU parallel with PLINK 2.0 --glm, PLINK 2 will do its own multithreading.)
2. If some of your quantitative phenotypes have no missing values, you should process them in a single PLINK 2 run. There is a multi-phenotype optimization that can provide up to a ~10x speed multiplier.
3. For case/control phenotypes, the 'cc-residualize' modifier (which performs a single regression on the covariates and applies the resulting offsets to the logistic regressions for all variants, instead of re-fitting the covariates in every single logistic regression) speeds up the calculation substantially at a fairly small accuracy cost.