Hello,
I have a quick question regarding the recommended coding of categorical variables in covariate files for PLINK2 analyses, and I would appreciate your guidance.
Specifically, I would like to confirm the preferred coding for:
Binary covariates (e.g., sex, stroke, COPD, anemia, CKD):
Should these be coded as 0/1, with 0 as the reference and 1 as the comparison group?
Multi-level categorical variables (e.g., race/ancestry with four levels: AFR, EUR, HIS, ASN):
Is it recommended to represent this variable in PLINK2 with dummy (one-hot) encoded covariates rather than string labels (AFR/EUR/…), so that it is modeled as categorical? Or is there another preferred approach?
I want to ensure that these covariates are treated correctly as categorical variables and that the reference groups are handled as intended in the regression model.
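To make the question concrete, here is a minimal sketch in plain Python of the dummy coding I have in mind. The function name `dummy_code`, the `ANC_` column prefix, and the choice of EUR as the reference level are only illustrative assumptions on my side, not taken from the PLINK2 documentation.

```python
def dummy_code(values, reference, prefix="ANC_"):
    """Turn one categorical column into k-1 indicator (0/1) columns.

    The reference level gets no column of its own, so rows belonging
    to it are all zeros. Names and prefix here are hypothetical.
    """
    observed = set(values)
    if reference not in observed:
        raise ValueError(f"reference level {reference!r} not found in data")
    # Every non-reference level becomes one indicator column, in sorted order.
    levels = sorted(observed - {reference})
    columns = [prefix + level for level in levels]
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return columns, rows


# Example: four ancestry levels, with EUR assumed as the reference group.
ancestry = ["AFR", "EUR", "HIS", "ASN", "EUR"]
columns, rows = dummy_code(ancestry, reference="EUR")
print("\t".join(columns))
for row in rows:
    print("\t".join(str(x) for x in row))
```

I dropped the reference level rather than keeping all four indicators, since four indicators together with an intercept would be perfectly collinear. Please let me know if this is the right way to prepare the covariate file, or if it is unnecessary.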
Thank you very much for your time and guidance.
Best regards,
Junling