confusion regarding minor/major/ref/alt alleles in relation to case-control analysis

gab...@research.haifa.ac.il

Jan 31, 2019, 11:05:59 AM
to plink2-users

Hi guys,


I know many people have asked about this subject before, but I didn’t find anything that answers my question.


I’m doing a case-control analysis using PLINK 1.9. During the pre-analysis QC (which included a “--maf 0.001” filter), I learned that PLINK 1.9 swaps the A1/A2 order so that A1 is the minor allele (and that A1 = ALT allele, if I understand correctly).

For about 5% of my variants this is not the case: the ALT allele has the higher frequency.
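A minimal sketch of the PLINK 1.x behaviour described above, assuming a biallelic variant with a known ALT allele frequency: by default A1 becomes the minor allele, unless an A2 allele is forced (which is what --a2-allele does). The function name `assign_a1_a2` is hypothetical, and the tie rule (ALT stays A1 at exactly 0.5) is an assumption, not documented PLINK behaviour.

```python
from typing import Optional, Tuple


def assign_a1_a2(ref: str, alt: str, alt_freq: float,
                 forced_a2: Optional[str] = None) -> Tuple[str, str]:
    """Return (A1, A2) for a biallelic variant.

    Hypothetical model of PLINK 1.x: A1 = minor allele by default;
    a forced A2 (as with --a2-allele) overrides the frequency rule.
    """
    if forced_a2 is not None:
        if forced_a2 not in (ref, alt):
            raise ValueError(f"{forced_a2} is not an allele of this variant")
        a1 = alt if forced_a2 == ref else ref
        return a1, forced_a2
    # Assumption: a tie (alt_freq == 0.5) keeps ALT as A1.
    if alt_freq <= 0.5:
        return alt, ref
    return ref, alt


# A variant where ALT is the major allele (ALT frequency 0.7):
print(assign_a1_a2("C", "T", 0.7))                 # ('C', 'T'): A1 is REF
print(assign_a1_a2("C", "T", 0.7, forced_a2="C"))  # ('T', 'C'): A1 is ALT
```

Under this model, the roughly 5% of variants where ALT is major are exactly the ones where the default run makes A1 = REF.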


I ran the pre-analysis QC (reminder: it included the “--maf 0.001” filter) and the case-control analysis (using --assoc) twice. The first time, I let PLINK work as usual, with the A1/A2 switching. The second time, I forced the REF allele to be A2 (instead of the ALT allele) using the “--a2-allele” flag (I checked; it worked).


Results:

The QC results (number and identity of samples and SNPs) were exactly the same in the two runs.

The case-control results were also exactly the same.


How can that be? Am I wrong in understanding that A1 = ALT and A2 = REF (and that PLINK treats the minor allele as the ALT allele)? Does the A1/A2 switching have any effect on --assoc?
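For a basic allelic test, swapping the labels changes less than one might expect. A minimal sketch, assuming a 2x2 table of allele counts (cases/controls by A1/A2) and an uncorrected Pearson 1-df chi-square; this illustrates the kind of allelic test --assoc reports for case/control data and is not a reimplementation of PLINK's code. Swapping A1 and A2 swaps the table columns, so the chi-square statistic and P are unchanged, while the odds ratio becomes its reciprocal. The function name and allele counts are made up for illustration.

```python
import math


def allelic_test(case_a1: int, case_a2: int, ctrl_a1: int, ctrl_a2: int):
    """Pearson 1-df chi-square, P value and odds ratio for a 2x2 allele table."""
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    total = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    row = [sum(r) for r in table]
    col = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chisq += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square: P = erfc(sqrt(chisq / 2)).
    p = math.erfc(math.sqrt(chisq / 2))
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)
    return chisq, p, odds_ratio


# Hypothetical counts: 120/880 A1/A2 alleles in cases, 90/910 in controls.
orig = allelic_test(120, 880, 90, 910)
swapped = allelic_test(880, 120, 910, 90)  # same data, A1 and A2 swapped
print(orig)
print(swapped)
```

So a label swap alone should leave CHISQ and P identical and invert OR; identical OR values in both runs suggest the swap did not actually take effect in the --assoc run.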


Thanks, Gabriel


Christopher Chang

Jan 31, 2019, 12:28:05 PM
to plink2-users
PLINK 1.x tries to set A1 = minor on every single run.  If you use --a2-allele on the QC run, but then leave it out during the --assoc run, the A1 alleles will be forced to minor in the --assoc run.

gab...@research.haifa.ac.il

Jan 31, 2019, 12:39:59 PM
to plink2-users

Hi Christopher, I realized (after several tests) that I need to include --a2-allele in every command. My --assoc command was: "./plink --bfile Final_bfile_31Jan2019 --a2-allele REF_List_31Jan2019.csv 2 1 --assoc --ci 0.95 --out Final_bfile_31Jan2019_as".
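A sketch of how a REF list like the one above might be built, assuming the REF alleles come from the original VCF and reading "2 1" in the command as "A2 allele in column 2, variant ID in column 1". The function name `ref_list_from_vcf_lines` and the sample VCF lines are hypothetical; the sketch writes tab-separated columns.

```python
def ref_list_from_vcf_lines(lines):
    """Build 'variant_ID<TAB>REF' lines from VCF text lines.

    Column 1 = variant ID (VCF column 3), column 2 = REF allele
    (VCF column 4), matching an '--a2-allele <file> 2 1' invocation.
    """
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information/header lines and blanks
        fields = line.rstrip("\n").split("\t")
        variant_id, ref = fields[2], fields[3]
        out.append(f"{variant_id}\t{ref}")
    return "\n".join(out) + "\n"


# Hypothetical VCF excerpt (only the first five columns shown).
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t1\trs1\tT\tC\n",
    "1\t3\trs3\tC\tT\n",
]
print(ref_list_from_vcf_lines(vcf), end="")
```

Since PLINK 1.x re-applies A1 = minor on each run, the same file has to be passed to every command, not only the QC step.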

John Jackson

Dec 3, 2019, 2:39:21 PM
to plink2-users
Hi Christopher, 

This is probably obvious, but could you please confirm the following point about A1/ALT/REF alleles? 

In short, this is the PLINK 2 command I ran:

plink2 \
--covar covarfile.txt \
--covar-name sex,age,PC1,PC2 \
--glm hide-covar cols=chrom,pos,alt1,ref,a1freq,beta,se,p \
--mach-r2-filter 0.3 1.0 \
--maf 0.01 \
--memory 64000 \
--out outfile \
--pfile genotypefileset \
--pheno phenofile.txt \
--pheno-name phenotype

And this is a small chunk of the output: 

#CHROM  POS  ID   REF  ALT1  A1  A1_FREQ    BETA         SE         P
1       1    rs1  T    C     T   0.106321   0.0125737    0.0219298  0.56641
1       2    rs2  T    C     T   0.0177909  -0.00393126  0.0592781  0.947125
1       3    rs3  C    T     T   0.230796   -0.0195085   0.0212053  0.357598
1       4    rs4  C    T     T   0.201837   -0.0193924   0.0221503  0.381323


Namely, sometimes A1 = ALT1 and sometimes A1 = REF. According to PLINK's documentation, A1 is the "counted allele" (a.k.a. effect allele).

My question: is it correct to assume that, in all cases,
* When A1=REF, then A2=ALT1
* When A1=ALT1, then A2=REF
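The rule in the two bullets can be sketched as follows, assuming biallelic variants only (REF plus a single ALT1), so A2 is whichever of the two is not A1. The function name `infer_a2` is hypothetical; the rows are copied from the output above.

```python
def infer_a2(ref: str, alt1: str, a1: str) -> str:
    """Return the non-counted allele (A2) of a biallelic variant.

    Assumes exactly two alleles (REF and ALT1); not valid for
    multiallelic variants, where ALT1 is only the first ALT allele.
    """
    if a1 == ref:
        return alt1
    if a1 == alt1:
        return ref
    raise ValueError(f"A1={a1} matches neither REF={ref} nor ALT1={alt1}")


# Two rows from the --glm output above (whitespace-separated).
glm_text = """#CHROM POS ID REF ALT1 A1 A1_FREQ BETA SE P
1 1 rs1 T C T 0.106321 0.0125737 0.0219298 0.56641
1 3 rs3 C T T 0.230796 -0.0195085 0.0212053 0.357598
"""

lines = glm_text.strip().splitlines()
header = lines[0].lstrip("#").split()
for line in lines[1:]:
    rec = dict(zip(header, line.split()))
    a2 = infer_a2(rec["REF"], rec["ALT1"], rec["A1"])
    print(rec["ID"], "A1 =", rec["A1"], "A2 =", a2)
```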

If that is not always the case, could you please suggest which columns (from --glm hide-covar cols=) would give an unambiguous A1 (effect allele) vs. A2 (non-effect allele) naming?

Thanks in advance, 

Christopher Chang

Dec 3, 2019, 2:50:04 PM
to plink2-users
That is correct.  You can add "ax" to the column-set list to add an explicit A2 column.  (It's called "AX" to generalize to multiallelic variants.)
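Once A1 and AX are both available, results can be re-expressed relative to one fixed allele (here ALT1) when harmonizing with other studies. A minimal sketch, assuming a biallelic variant and an additive model with an intercept (linear BETA or logistic log-odds): because ALT1 dosage = 2 - REF dosage, switching the counted allele negates BETA, complements the allele frequency, and leaves SE and P unchanged. The helper name `to_alt_orientation` is hypothetical.

```python
def to_alt_orientation(ref, alt1, a1, ax, a1_freq, beta):
    """Re-express (A1_FREQ, BETA) in terms of ALT1 dosage.

    Hypothetical helper; assumes a biallelic variant, so {A1, AX}
    must equal {REF, ALT1}. SE and P do not change under the flip.
    """
    if {a1, ax} != {ref, alt1}:
        raise ValueError("A1/AX do not match REF/ALT1; variant may be multiallelic")
    if a1 == alt1:
        return a1_freq, beta   # already counted on ALT1
    return 1.0 - a1_freq, -beta  # A1 was REF: flip to ALT1


# rs1 from the output above: A1 = REF = T, so AX = C (= ALT1).
freq, beta = to_alt_orientation("T", "C", "T", "C", 0.106321, 0.0125737)
print(freq, beta)
```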