How are multi-allelic loci filtered in plink and plink2?

Henry Lu

Oct 5, 2022, 11:05:05 PM
to plink2-users
Hi

I have a VCF file as input and want to convert it to plink format (.bed, .bim, .fam) using the `--max-alleles 2` flag. For example, suppose that at position 10001, A has a frequency of 70%, T has 20%, and G has 10%. I have some questions about how `--max-alleles 2` handles this situation computationally.

1. Does plink remove position 10001 because it is multi-allelic? Or does it treat the 10% allele as missing and keep the 70% and 20% alleles?

2. Is the minor allele frequency (MAF) in plink2 defined as the frequency of the second most common allele at that position? (I ask because I want to apply the --maf filter.)

3. How are indels handled? Would an indel be coded as an "allele" in the .bim file?

I am a relatively new user, so I am not sure whether the community still uses .bed, .bim, and .fam files. Should I switch to psam files or other formats to keep up with current practice?

Many thanks
Henry

Christopher Chang

Oct 7, 2022, 11:54:24 AM
to plink2-users
1. With "plink2 --max-alleles 2", the variant you describe is removed.

2. plink2 --maf's default behavior is to add up all the frequencies except the largest one.  However, this is configurable; see the second paragraph of the --maf documentation.

3. Indels are handled in the same way they are in VCF files.

4. .bed+.bim+.fam will probably always be better-supported by other software, because it's a much simpler format.  It's fine to keep using it until you run into one of its major limitations, which include:
- poor tracking of REF vs. ALT alleles
- no support for phase or dosage information
- no direct support for multiallelic variants: you have to "split" them first, and this will distort some analytical results if you aren't careful (safer to just filter these variants out)
- inefficiency (relative to .pgen), especially when you have many samples and many rare variants.
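
To illustrate what "filtering these variants out" amounts to for the example in the question, here is a rough Python sketch. It is not plink2's implementation. It only mimics the idea behind `--max-alleles 2` on plain-text VCF lines, assuming the allele count is REF plus the comma-separated entries in the ALT column. The function name `keep_max_alleles` and the toy records are made up for illustration.

```python
def keep_max_alleles(vcf_lines, max_alleles=2):
    """Yield header lines and records listing at most max_alleles alleles.

    Hypothetical helper: counts REF plus each comma-separated ALT entry.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]  # VCF column 5 is ALT
        n_alt = 0 if alt == "." else len(alt.split(","))
        if 1 + n_alt <= max_alleles:
            yield line


# Toy input: position 10001 lists three alleles (A, T, G); 10050 lists two.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10001\t.\tA\tT,G\t.\t.\t.",
    "1\t10050\t.\tC\tT\t.\t.\t.",
]
kept = list(keep_max_alleles(records))
for line in kept:
    print(line)
```

In this sketch the whole record at position 10001 is dropped. None of its alleles is recoded as missing, which matches answer 1 above.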

Because .bed+.bim+.fam is better-supported by other software, you should expect to use --make-bed once in a while even if your workflow is based on the .pgen format.
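
For the frequencies in the original question (A 70%, T 20%, G 10%), the difference between the default --maf behavior described in answer 2 and the "second most common allele" reading can be worked out by hand. The Python sketch below is only arithmetic on that example. The function names are hypothetical and it does not call plink2.

```python
def nonmajor_freq(freqs):
    """Sum of all allele frequencies except the largest one
    (the default --maf behavior as described in answer 2)."""
    return sum(freqs) - max(freqs)


def second_most_common_freq(freqs):
    """Frequency of the second most common allele
    (the definition proposed in question 2)."""
    return sorted(freqs, reverse=True)[1]


# Allele frequencies at position 10001 from the question: A, T, G.
freqs = [0.70, 0.20, 0.10]

default_value = nonmajor_freq(freqs)            # T + G, about 0.30
alternative_value = second_most_common_freq(freqs)  # T only, 0.20

print(f"non-major sum:      {default_value:.2f}")
print(f"second most common: {alternative_value:.2f}")
```

For a biallelic variant the two definitions coincide. They only differ when three or more alleles are present, as in this example.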