Confusion regarding minor/major/REF/ALT alleles in case-control analysis

gab...@research.haifa.ac.il

Jan 31, 2019, 11:05:59 AM

to plink2-users

Hi guys,

I know many people have asked about this subject before, but I haven't found anything that answers my question.

I’m doing a case-control analysis using PLINK 1.9. During the pre-analysis QC (which included a “--maf 0.001” filter), I learned that PLINK 1.9 reorders A1/A2 so that A1 is the minor allele (and, if I understand correctly, that A1 = ALT).

For about 5% of my variants this does not hold: the ALT allele is the more frequent one.

I ran the pre-analysis QC (again including the “--maf 0.001” filter) and the case-control analysis (--assoc) twice. The first time, I let PLINK work as usual, with the A1/A2 switching. The second time, I forced A2 to be the reference allele (not ALT) using the “--a2-allele” flag (I checked that it worked).

Results:

The QC results (number and identity of samples and SNPs) were exactly the same in both runs.

The case-control results were also exactly the same.

How can that be? Am I wrong to think that A1 = ALT and A2 = REF (and that PLINK treats the minor allele as the ALT allele)? Does the A1/A2 switching have any effect on --assoc?

Thanks, Gabriel

Christopher Chang

Jan 31, 2019, 12:28:05 PM

PLINK 1.x tries to set A1 = minor on every single run. If you use --a2-allele on the QC run, but then leave it out during the --assoc run, the A1 alleles will be forced to minor in the --assoc run.
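
To make the per-run behavior concrete, here is a minimal Python sketch. It is a hypothetical model, not PLINK's code: the function `label_alleles` and its `forced_a2` parameter are illustrative stand-ins for PLINK 1.x's default minor-allele rule and for --a2-allele, and only biallelic variants are handled. Because nothing carries over between calls, leaving out the forced allele on a later call puts A1 back on the minor allele, just as a later PLINK 1.x run without the flag would.

```python
from collections import Counter


def label_alleles(allele_calls, forced_a2=None):
    """Return (A1, A2) for one biallelic variant.

    Hypothetical model of PLINK 1.x labeling, not PLINK's source code:
    - forced_a2=None: A1 is the minor (less frequent) allele.
    - forced_a2 given: that allele is pinned as A2 for this call only,
      mimicking --a2-allele on a single run.
    """
    counts = Counter(allele_calls)
    if len(counts) != 2:
        raise ValueError("this sketch only handles biallelic variants")
    # most_common() sorts by count; exact ties keep first-seen order here,
    # which is an arbitrary choice and not PLINK's tie rule.
    (major, _), (minor, _) = counts.most_common()
    if forced_a2 is None:
        return minor, major
    if forced_a2 not in counts:
        raise ValueError(f"forced A2 {forced_a2!r} was not observed")
    a1 = major if forced_a2 == minor else minor
    return a1, forced_a2


# T is REF, C is ALT, and here ALT is the more frequent allele.
calls = ["T"] * 10 + ["C"] * 90

print(label_alleles(calls))                 # default run: A1 is the minor allele
print(label_alleles(calls, forced_a2="T"))  # run with REF forced as A2
print(label_alleles(calls))                 # next run without the flag: minor again
```

With 90 C calls and 10 T calls, the default call returns `("T", "C")`, so A1 is REF; forcing T as A2 returns `("C", "T")`, and the third call, made without the forced allele, returns `("T", "C")` again.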

gab...@research.haifa.ac.il

Jan 31, 2019, 12:39:59 PM

Hi Christopher, I realized (after several tests) that I need to include --a2-allele in every command. My --assoc command was:

./plink --bfile Final_bfile_31Jan2019 --a2-allele REF_List_31Jan2019.csv 2 1 --assoc --ci 0.95 --out Final_bfile_31Jan2019_as
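
Once --a2-allele is also passed to the --assoc run, the labels do change, but the basic allelic test is symmetric in the two alleles: swapping A1 and A2 leaves the 1-df chi-square statistic and its p-value unchanged and replaces the odds ratio with its reciprocal (so the confidence bounds also become reciprocals, in swapped order). The same symmetry is why a --maf filter keeps the same variants either way, since the minor allele frequency does not depend on which allele is labeled A1. The sketch below uses made-up allele counts and a textbook Pearson chi-square without continuity correction; `allelic_test` is a hypothetical helper meant to show the symmetry, not PLINK's implementation.

```python
import math


def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Pearson chi-square (1 df, no continuity correction) and A1 odds ratio.

    Hypothetical helper: it applies the textbook 2x2 allele-count test to
    show the symmetry, and does not reproduce PLINK's exact numerics.
    """
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of A1 among case alleles divided by odds of A1 among control alleles.
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)
    return chi2, p_value, odds_ratio


# Made-up allele counts for one variant, first with A1 = allele X ...
chi2_x, p_x, or_x = allelic_test(30, 170, 20, 180)
# ... then with the labels swapped so that A1 = allele Y.
chi2_y, p_y, or_y = allelic_test(170, 30, 180, 20)

print(f"chi2 {chi2_x:.4f} vs {chi2_y:.4f}")
print(f"p    {p_x:.4g} vs {p_y:.4g}")
print(f"OR   {or_x:.4f} vs {or_y:.4f} (product {or_x * or_y:.4f})")
```

In other words, identical p-values between two --assoc runs do not by themselves show whether the allele labels were swapped; the odds ratio column (and the A1 column itself) is where a swap would be visible.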

John Jackson

Dec 3, 2019, 2:39:21 PM

Hi Christopher,

This is probably obvious, but could you please confirm the following point about A1/ALT/REF alleles?

In short, this is the PLINK-2 command I ran:

plink2 \
  --covar covarfile.txt \
  --covar-name sex,age,PC1,PC2 \
  --glm hide-covar cols=chrom,pos,alt1,ref,a1freq,beta,se,p \
  --mach-r2-filter 0.3 1.0 \
  --maf 0.01 \
  --memory 64000 \
  --out outfile \
  --pfile genotypefileset \
  --pheno phenofile.txt \
  --pheno-name phenotype

And this is a small chunk of the output:

#CHROM  POS  ID   REF  ALT1  A1  A1_FREQ    BETA         SE         P
1       1    rs1  T    C     T   0.106321   0.0125737    0.0219298  0.56641
1       2    rs2  T    C     T   0.0177909  -0.00393126  0.0592781  0.947125
1       3    rs3  C    T     T   0.230796   -0.0195085   0.0212053  0.357598
1       4    rs4  C    T     T   0.201837   -0.0193924   0.0221503  0.381323

In other words, sometimes A1 = ALT1 and sometimes A1 = REF. According to PLINK's documentation, A1 is the "counted allele" (a.k.a. the effect allele).

My question: is it correct to assume that, in all cases,

* When A1=REF, then A2=ALT1

* When A1=ALT1, then A2=REF

If that is not always the case, could you suggest which columns (from --glm hide-covar cols=) would give an unambiguous A1 (effect allele) vs. A2 (non-effect allele) labeling?

Thanks in advance,

Christopher Chang

Dec 3, 2019, 2:50:04 PM

That is correct. You can add "ax" to the column-set list to add an explicit A2 column. (It's called "AX" to generalize to multiallelic variants.)
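
For output files that were already produced without the AX column, the rule confirmed above can be applied after the fact to biallelic variants. A minimal Python sketch, assuming whitespace-delimited --glm output with REF, ALT1 and A1 columns; the helper names `a2_allele` and `add_a2_column` are hypothetical.

```python
def a2_allele(ref, alt1, a1):
    """Return the non-counted allele (A2) for a biallelic --glm row."""
    if a1 == ref:
        return alt1
    if a1 == alt1:
        return ref
    raise ValueError(f"A1 {a1!r} matches neither REF {ref!r} nor ALT1 {alt1!r}")


def add_a2_column(glm_text):
    """Parse whitespace-delimited --glm output and attach an 'A2' field to each row."""
    lines = glm_text.strip().splitlines()
    header = lines[0].lstrip("#").split()  # '#CHROM' becomes 'CHROM'
    records = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        record["A2"] = a2_allele(record["REF"], record["ALT1"], record["A1"])
        records.append(record)
    return records


# The four example rows from the output above.
example = """\
#CHROM POS ID REF ALT1 A1 A1_FREQ BETA SE P
1 1 rs1 T C T 0.106321 0.0125737 0.0219298 0.56641
1 2 rs2 T C T 0.0177909 -0.00393126 0.0592781 0.947125
1 3 rs3 C T T 0.230796 -0.0195085 0.0212053 0.357598
1 4 rs4 C T T 0.201837 -0.0193924 0.0221503 0.381323
"""

for rec in add_a2_column(example):
    print(rec["ID"], "A1 =", rec["A1"], "A2 =", rec["A2"])
```

This only holds for biallelic variants: at a multiallelic site ALT1 is just the first ALT allele, so A2 cannot be read off REF/ALT1, which is the case the AX column is meant to cover.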
