Multiple Interaction term

Bassanio

Nov 24, 2021, 4:22:30 AM
to plink2-users
Hi,

I have two groups (GID), A and B, and each group has two subgroups (Condition), Control and Treatment. Is there a way to use both in the interaction term?

 Y = b0 + b1.ADD + b2.COV1 + b3.COV2 + b4.ADDx(COV1+COV2) + e

I was thinking of creating a new column coded 1, 2, 3, 4, with one value for each group+condition combination.


Christopher Chang

Nov 24, 2021, 12:19:04 PM
to plink2-users
Add a (COV1+COV2) column to the covariate file, then use --parameters with --glm/--linear/--logistic's 'interaction' modifier.

Bassanio

Nov 26, 2021, 12:15:18 PM
to plink2-users
Hi,

Thank you for your response. I would just like to confirm my understanding of your suggestion. So I add a new column as a covariate, with values as below:
A+Treatment
A+Control
B+Treatment
B+Control


Thanks in advance
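
The combined column described above could be built along these lines. This is only a minimal sketch: the helper name `add_combined_column`, the sample IDs, and the `+` separator are illustrative assumptions, and only the GID/Condition labels come from the thread.

```python
def add_combined_column(rows, first, second, name="Combined", sep="+"):
    """Return copies of the rows with one extra column joining two labels."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[name] = f"{row[first]}{sep}{row[second]}"
        out.append(new_row)
    return out

# Hypothetical samples; the IDs are made up for illustration.
samples = [
    {"FID": "S1", "IID": "S1", "GID": "A", "Condition": "Treatment"},
    {"FID": "S2", "IID": "S2", "GID": "A", "Condition": "Control"},
    {"FID": "S3", "IID": "S3", "GID": "B", "Condition": "Treatment"},
    {"FID": "S4", "IID": "S4", "GID": "B", "Condition": "Control"},
]

combined = add_combined_column(samples, "GID", "Condition")
for row in combined:
    print(row["FID"], row["IID"], row["Combined"])
```

The result is a single categorical column with four labels, which is why it later has to be expanded into binary columns before it can be used with --parameters.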

Bassanio

Dec 6, 2021, 12:54:55 AM
to plink2-users

I am getting an error when doing as described above:

plink2 --bfile Analysis  --covar Covariate.txt --linear interaction dominant --no-sex --pheno TestPheno.txt --out  Test --parameters 1-6,11 --no-pheno

Covariate file:

FID IID Age Sex Ethnicity Infections_Status Combined
TAM501 TAM501 8.5 1 M V1 M+V1

258 samples (0 females, 0 males, 258 ambiguous; 258 founders) loaded from
Analysis.fam.
1642225 variants loaded from Analysis.bim.
3 quantitative phenotypes loaded.
5 covariates loaded from Covariate.txt.
Calculating allele frequencies... done.
Error: --parameters/--tests cannot currently be used directly with categorical
covariates; expand them into binary covariates with --split-cat-pheno first.

Christopher Chang

Dec 7, 2021, 11:02:08 AM
to plink2-users
Did you read the error message?

Bassanio

May 9, 2022, 6:32:46 AM
to plink2-users
Hi,

I have an issue with one covariate and I don't understand how to fix it. Below are a summary and the commands I executed. I have four categories, and the number of samples in each is given below:

     62 F1
     68 F3
     26 M1
    102 M3


Step 1: Run --split-cat-pheno

plink2 --bfile  Analysis --covar Covariate2.txt --no-sex --pheno TestPheno.txt --out Split --split-cat-pheno Combined --make-bed


Step 2: Re-header the split covariate file and manually change the header

awk '{print $1," ",$0}' Split.cov > New_Split.cov
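
For comparison, the re-headering step might be scripted so that the header does not need a manual fix afterwards. This is a rough sketch under two assumptions that the thread does not confirm: that FID should simply equal IID for every sample, and that the split file starts with a `#IID` header. The function name `add_fid_column` is hypothetical.

```python
def add_fid_column(lines):
    """Prepend an FID column (copied from IID) to whitespace-delimited lines."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "#IID":
            # Header line: rename instead of duplicating the '#IID' token.
            out.append("\t".join(["#FID", "IID"] + fields[1:]))
        else:
            # Data line: reuse the sample ID as the family ID (assumption).
            out.append("\t".join([fields[0]] + fields))
    return out

# Illustrative input shaped like the split covariate file.
example = [
    "#IID\tAge\tSex\tCombined=F1\tCombined=F3",
    "SAM19\t12.1\t1\t1\t1",
]
fixed = add_fid_column(example)
print("\n".join(fixed))
```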


Step 3: Run association

Test 1: Running first 5 Covariates

/scratch/mv83/Software/plink2 --bfile Split --covar New_Split.cov --linear interaction dominant --no-sex --pheno TestPheno.txt --out  Test --parameters 1-6 --no-pheno --threads 2

Output:

#CHROM    POS    ID    REF    ALT    A1    TEST    OBS_CT    BETA    SE    T_STAT    P
1    788538    h3a_37_1_723918_G_A    G    A    A    DOM    257    0.0254854    0.0897423    0.283984    0.776658
1    788538    h3a_37_1_723918_G_A    G    A    A    Age    257    -0.0172797    0.0174057    -0.99276    0.321787
1    788538    h3a_37_1_723918_G_A    G    A    A    Sex    257    0.11168    0.0769768    1.45083    0.148082
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=M1    257    0.0800373    0.140188    0.570927    0.568562
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=M3    257    0.0906461    0.0991741    0.91401    0.361592
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=F3    257    -0.0549291    0.106011    -0.518143    0.604817

Test 2: Running 6 Covariates (Error)


plink2 --bfile Split --covar New_Split.cov --linear interaction dominant --no-sex --pheno TestPheno.txt --out  Test --parameters 1-7 --no-pheno --threads 2


PLINK v2.00a2LM 64-bit Intel (6 Oct 2019)      www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to Test.log.
Options in effect:
  --bfile Split
  --covar New_Split.cov
  --glm interaction dominant
  --no-psam-pheno
  --no-sex
  --out Test
  --parameters 1-7
  --pheno TestPheno.txt
  --threads 2

Start time: Mon May  9 13:55:55 2022
257154 MiB RAM detected; reserving 128577 MiB for main workspace.
Using up to 2 compute threads.


258 samples (0 females, 0 males, 258 ambiguous; 258 founders) loaded from
Split.fam.
1642225 variants loaded from Split.bim.
3 quantitative phenotypes loaded.
6 covariates loaded from New_Split.cov.
Calculating allele frequencies... done.
Warning: Skipping --glm regression on phenotype 'PPP1R9B', and other(s) with
identical missingness patterns, since covariate correlation matrix could not be
inverted (VIF_INFINITE). You may want to remove redundant covariates and try
again.

Test 3: Running 5 covariates with interaction, skipping the problem covariate

plink2 --bfile Split --covar New_Split.cov --linear interaction dominant --no-sex --pheno TestPheno.txt --out  Test --parameters 1-6,10-12 --no-pheno --threads 2

1    788538    h3a_37_1_723918_G_A    G    A    A    DOM    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    Age    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    Sex    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=M1    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=M3    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    Combined=F3    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    DOMxCombined=M1    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    DOMxCombined=M3    257    NA    NA    NA    NA
1    788538    h3a_37_1_723918_G_A    G    A    A    DOMxCombined=F3    257    NA    NA    NA    NA


I have tried the recent build (PLINK v2.00a3LM 64-bit Intel (3 May 2022)) and also tried --max-corr and --vif 999999, but I get the same issue.


Issues:

1) What is the problem with including the F1 category (Test 2), and how can I fix it?

Christopher Chang

May 9, 2022, 10:18:03 AM
to plink2-users
0. Please use and report results from the 2022 build; you may not have noticed, but it contains a crucial additional "ERRCODE" diagnostic column.
1. From the --split-cat-pheno documentation: "(It is often necessary to omit one category to avoid creating linear dependence between the covariates, which breaks --glm.)"
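
The idea of expanding a categorical covariate while omitting one category can be sketched as follows. This is not PLINK's implementation: the function name `split_categorical`, the 0/1 coding, and the rule "drop the last label in sorted order" are assumptions made for illustration, and the labels are hypothetical samples using the category names from this thread.

```python
def split_categorical(values, name="Combined", omit=None):
    """Expand category labels into 0/1 indicator columns, leaving one out.

    The omitted category becomes the baseline that the others are compared to.
    """
    categories = sorted(set(values))
    if omit is None:
        omit = categories[-1]  # assumption: drop the last label in sorted order
    columns = {}
    for cat in categories:
        if cat == omit:
            continue
        columns[f"{name}={cat}"] = [1 if v == cat else 0 for v in values]
    return columns, omit

labels = ["F1", "F3", "M1", "M3", "M3", "F1"]  # hypothetical samples
columns, omitted = split_categorical(labels)
print("omitted baseline:", omitted)
for col_name, col in columns.items():
    print(col_name, col)
```

Samples in the omitted category have 0 in every remaining column, so the kept columns no longer sum to a constant and are not collinear with the intercept.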

Bassanio

May 10, 2022, 3:28:00 AM
to plink2-users

Run on the current version of PLINK:

./plink2 --bfile Split --covar New_Split.cov --linear interaction dominant --no-sex --pheno TestPheno.txt --out  Test --parameters 1-7 --no-pheno --threads 2 --warning-errcode

PLINK v2.00a3LM 64-bit Intel (3 May 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to Test.log.
Options in effect:
  --bfile Split
  --covar New_Split.cov
  --glm interaction dominant
  --no-psam-pheno
  --no-sex
  --out Test
  --parameters 1-7
  --pheno TestPheno.txt
  --threads 2
  --warning-errcode

Start time: Tue May 10 11:02:29 2022
257154 MiB RAM detected; reserving 128577 MiB for main workspace.
Using up to 2 compute threads.
258 samples (0 females, 0 males, 258 ambiguous; 258 founders) loaded from
Split.fam.
1642225 variants loaded from Split.bim.
3 quantitative phenotypes loaded.
6 covariates loaded from New_Split.cov.
Calculating allele frequencies... done.
Error: Cannot proceed with --glm regression on phenotype 'PPP1R9B', and
other(s) with identical missingness patterns, since covariate correlation
matrix could not be inverted (VIF_INFINITE). You may want to remove redundant
covariates and try again.

End time: Tue May 10 11:02:29 2022

2) I am a little confused and do not understand what removing the covariate achieves.

My original goal was to include multiple covariates in the interaction term and, as discussed before, I added a new column called "Combined" which represents the two columns (Condition+Time). Now, by using "omit-last" I will be removing T2, for example. If I do so, I won't know the effect of T2, right?

FID IID AGE SEX CONDITION TIME COMBINED
SAM1 SAM1 10 1 Control 1 C1
SAM3 SAM3 11 2 Control 2 C2
SAM2 SAM2 14 1 Treatment 1 T1
SAM4 SAM4 12 2 Treatment 2 T2

Thanks in advance

Christopher Chang

May 10, 2022, 9:15:24 AM
to plink2-users
There appear to be at least two problems here.

1. It doesn't look like you posted lines from the actual covariate file you used, since the example only has 5 covariates, and the log says 6. From your example, it looks like your covariate file has redundant covariates, but I would be able to be more specific if you posted a real header line.

2. Sex has two values (male/female), so why is it necessary to represent it with only one column?  Because if you had one column with male=1 female=0, and another column with female=1 male=0, then the female column would be equal to exactly (1 - <male column value>).  Whenever one predictor column is exactly equal to a linear combination of the other columns (note that plink's linear regression always includes an "intercept" column of all 1s), it is impossible to have a unique solution; given one solution, you can always get an equally good solution by e.g. decreasing the female-column coefficient by 1, increasing the intercept-column coefficient by 1, and decreasing the male-column coefficient by 1 simultaneously.

Similarly, T2 = (1 - <C1> - <C2> - <T1>), so if you have C1/C2/T1 columns you must not have a T2 column.
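
The dependence described above can be checked with plain arithmetic. In this sketch the design rows and the coefficient values are invented for illustration. It shows that with all four indicator columns plus the intercept, two different coefficient vectors give identical fitted values, and that the same fit can still be written with the T2 column dropped.

```python
# Design-matrix rows: (intercept, C1, C2, T1, T2), one row per category.
rows = [
    (1, 1, 0, 0, 0),
    (1, 0, 1, 0, 0),
    (1, 0, 0, 1, 0),
    (1, 0, 0, 0, 1),
]

def predict(coefs, row):
    """Fitted value for one sample: sum of coefficient * predictor."""
    return sum(b * x for b, x in zip(coefs, row))

# The four indicators always sum to the intercept column (T2 = 1 - C1 - C2 - T1).
dependent = all(sum(row[1:]) == row[0] for row in rows)

beta = (2.0, 0.5, -1.0, 0.25, 3.0)  # arbitrary illustrative coefficients
# Add 1 to the intercept and subtract 1 from every indicator coefficient.
shifted = (beta[0] + 1,) + tuple(b - 1 for b in beta[1:])

fits = [predict(beta, r) for r in rows]
fits_shifted = [predict(shifted, r) for r in rows]

# Same fit without the T2 column: its contribution moves into the intercept
# and the remaining indicator coefficients.
reduced = (beta[0] + beta[4],) + tuple(b - beta[4] for b in beta[1:4])
fits_reduced = [predict(reduced, r[:4]) for r in rows]

print(dependent, fits == fits_shifted, fits == fits_reduced)
```

Because infinitely many coefficient vectors fit equally well when all four columns are present, the regression has no unique solution until one column is removed.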

Yes, this means that the regression report doesn't have a T2 coefficient; just like when regressing on sex you effectively only have a female coefficient, rather than separate male and female coefficients.  You can infer the male effect by negating the female coefficient, and you can infer the T2 effect by negating the sum of the C1, C2, and T1 coefficients.

Bassanio

May 11, 2022, 9:01:40 AM
to plink2-users
Hi, 

Thanks for the reply. Please find below the real header from the initial covariate file and the resulting --split-cat-pheno output.

My Original Covariate File

FID    IID    Age    Sex    Combined
SAM11    SAM11    8.5    1    M1
SAM12    SAM12    5.9    1    M3
SAM13    SAM13    8.5    2    M1
SAM15    SAM15    7.1    1    M3
SAM17    SAM17    10.5    2    M1


Step 1: Run --split-cat-pheno

plink2 --bfile  Analysis --covar Covariate2.txt --no-sex --pheno TestPheno.txt --out Split --split-cat-pheno Combined --make-bed

#IID    Age    Sex    Combined=F1    Combined=F3    Combined=M1    Combined=M3
SAM19    12.1    1    1    1    1    2
SAM117    10.1    1    1    2    1    1
SAM133    11.1    2    1    1    1    2
SAM197    8.5    1    1    1    1    2

Christopher Chang

May 11, 2022, 10:17:10 AM
to plink2-users
Ok.  Please reread my previous response, and the --split-cat-pheno documentation, then.