Question on coding categorical covariates for PLINK2

junling REN

Jan 1, 2026, 7:38:06 PM
to plink2...@googlegroups.com

Hello,

I have a quick question regarding the recommended coding of categorical variables in covariate files for PLINK2 analyses, and I would appreciate your guidance.

Specifically, I would like to confirm the preferred coding for:

  1. Binary covariates (e.g., sex, stroke, COPD, anemia, CKD):

    • Should these be coded as 0/1, with 0 as the reference and 1 as the comparison group?

  2. Multi-level categorical variables (e.g., race/ancestry with four levels: AFR, EUR, HIS, ASN):

    • Is it recommended to represent the four-level race/ancestry variable with dummy (one-hot) encoded covariates in PLINK2 rather than strings (AFR/EUR/…), to ensure correct categorical modeling? Or is there another preferred approach?

I want to ensure that these covariates are treated correctly as categorical variables and that the reference groups are handled as intended in the regression model.

Thank you very much for your time and guidance.

Best regards,
Junling

Christopher Chang

Jan 10, 2026, 5:17:40 PM
to plink2-users
- It's fine to encode binary covariates as 0/1.  (However, unless you use the --1 flag, binary *phenotypes* are expected to be encoded as case=2, control=1.)
- PLINK 2 --glm automatically converts n-category covariates to n-1 dummy variables.  It's fine to use the string representation unless another program you're using (such as PLINK 1.9...) can't handle it.
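For the case where another program cannot read string categories, here is a minimal Python sketch of doing the n-1 dummy coding by hand, plus the 0/1 to 1/2 recoding for a binary phenotype. The function names (dummy_encode, recode_binary_phenotype) are hypothetical, and the choice of reference level (defaulting to the alphabetically first category) is an assumption for illustration; it may not match the reference level PLINK 2 picks internally. Missing values are not handled.

```python
def dummy_encode(values, reference=None):
    """Return (column_names, rows) with n-1 indicator columns for n categories."""
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: the alphabetically first level is the reference.
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    # Every level except the reference gets its own 0/1 column.
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


def recode_binary_phenotype(values):
    """Map 0/1 (control/case) to 1/2 (control/case)."""
    mapping = {0: 1, 1: 2}
    return [mapping[v] for v in values]


ancestry = ["AFR", "EUR", "HIS", "ASN", "EUR"]
names, rows = dummy_encode(ancestry, reference="EUR")
print(names)
for label, row in zip(ancestry, rows):
    print(label, row)
```

With EUR as the reference, EUR samples get all zeros and each other group gets a single 1 in its own column. When only PLINK 2 is involved, none of this is needed, since the string column can be used directly.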