How to perform burden analysis with PLINK?

Caterina

Oct 20, 2023, 11:42:09 AM
to plink2-users
I saw this for plink1.7, but I'm not sure whether it exists in 1.9 or 2.0.

Basically, I want to obtain the number of mutations (SNP = alt allele) within a gene for each person.
How could I do that?

The output could be either 
indID number_of_mut
ind1 1
ind2 2
ind3 0
ind4 1

Or the sum of them
individuals with 1 mutation in geneA = 2
individuals with 2 mutations in geneA = 1
individuals with 0 mutations in geneA = 1

Thank you!

--freqx provides this at the variant level, but I would like it at the "gene" level.

Master thesis

Oct 20, 2023, 11:50:49 AM
to plink2-users
I forgot to mention that I also need the phasing info, and whether it is homozygous or heterozygous. 
Thank you!

Christopher Chang

Oct 21, 2023, 4:53:16 PM
to plink2-users
1. You need to know which allele to count for each SNP.  If you do, you can provide this information to plink 2.0 --score, in a file that looks something like

snp1  A  1
snp2  C  1
snp3  T  1
snp4  A  1
...

Then "--score <name of the file described above> no-mean-imputation cols=+scoresums" should provide the totals you want in the last column of the output file.
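As a rough illustration of step 1, the sketch below builds the three-column score file, assembles the command line quoted above, and tallies the per-sample totals from the output. The function names, the `--pfile` input format, and the file prefixes are assumptions for illustration, not part of the original answer. The tally relies on the total being in the last column, as stated above, and assumes the header line begins with `#`.

```python
from collections import Counter


def score_file_text(variants):
    """Return one 'variant_id<TAB>allele<TAB>1' line per variant (weight 1 = plain count)."""
    return "".join(f"{vid}\t{allele}\t1\n" for vid, allele in variants)


def build_score_command(pfile_prefix, score_path, out_prefix):
    """Assemble the plink2 call; pass the list to subprocess.run to execute it."""
    return [
        "plink2",
        "--pfile", pfile_prefix,  # assumed input format; adjust to your data
        "--score", score_path, "no-mean-imputation", "cols=+scoresums",
        "--out", out_prefix,
    ]


def tally_counts(sscore_text):
    """Count how many samples have each total, reading the last column of each row."""
    tally = Counter()
    for line in sscore_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        tally[int(round(float(line.split()[-1])))] += 1
    return tally
```

Writing `score_file_text([("snp1", "A"), ("snp2", "C")])` to a file such as `geneA.score` (a hypothetical name) gives the input for `--score`. Applying `tally_counts` to the contents of the resulting output file gives the "individuals with N mutations" summary asked for in the question.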

2. If you have phased data and want a separate count for each side, there is currently no plink command to compute that directly.  (One workaround is to create a file with twice as many samples, where each new sample contains homozygous genotypes for one of the original sample-sides.  I'm not aware of a preexisting program that does this, but it's pretty straightforward to write a script that performs this transformation to a VCF file.)
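The workaround in step 2 could be sketched roughly as follows. This is a minimal illustration, not an existing tool. It assumes a tab-delimited VCF with phased diploid `GT` entries such as `0|1`. The `.h1`/`.h2` suffixes and the function name are hypothetical. Unphased or haploid calls are set to missing, and every FORMAT field other than `GT` is dropped.

```python
def split_haplotypes(vcf_lines):
    """Yield VCF lines where each sample becomes two homozygous pseudo-samples."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # One new column per haplotype: <sample>.h1 and <sample>.h2
            new_ids = [f"{s}{suffix}" for s in fields[9:] for suffix in (".h1", ".h2")]
            yield "\t".join(fields[:9] + new_ids)
            continue
        gt_index = fields[8].split(":").index("GT")
        out = fields[:8] + ["GT"]
        for entry in fields[9:]:
            alleles = entry.split(":")[gt_index].split("|")
            if len(alleles) != 2:
                alleles = [".", "."]  # unphased or haploid call: treat as missing
            out += [f"{a}|{a}" for a in alleles]
        yield "\t".join(out)
```

Because each pseudo-sample is homozygous, a --score count on the transformed file is twice the number of counted alleles on that haplotype, so halve the totals. Comparing the `.h1` and `.h2` totals for one person then shows whether the counted alleles sit on one side or both.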

Caterina

Oct 22, 2023, 7:07:22 AM
to plink2-users
Thank you! Is it possible to do this for two genes in combination?
indID mut_in_gene_A mut_in_gene_B
ind1 1 2
ind2 2 3
ind3 0 1
ind4 1 0

Master thesis

Oct 22, 2023, 8:57:27 AM
to plink2-users
Or should I just run --score twice, once for each gene?

Caterina

Oct 23, 2023, 1:11:09 PM
to plink2-users
Could you please explain to me the difference between dosagesum and scoresums?
Wouldn't they be the same in my case?

Christopher Chang

Oct 23, 2023, 1:16:17 PM
to plink2-users
1. --score-list provides one way to compute scores for multiple genes at once.
2. With --score on a single gene, yes, dosagesum and scoresums would have the same values because every multiplier is 1.  (If any third-column value was not 1, they would no longer be guaranteed to have the same value.)  But with --score-list on multiple genes, there is no dosagesum column.
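For comparison, the alternative raised earlier in the thread (one --score run per gene) needs a small merge step to produce the two-gene table. The sketch below is one possible way to do that, not a plink feature. It assumes each output file has a header containing an `IID` column and the total in its last column. The function names and the `NA` placeholder for samples missing from a file are illustrative choices.

```python
def read_totals(sscore_text):
    """Map sample ID -> total, using the IID column and the last column of each row."""
    rows = [line.split() for line in sscore_text.splitlines() if line.strip()]
    header = [name.lstrip("#") for name in rows[0]]
    iid = header.index("IID")
    return {row[iid]: int(round(float(row[-1]))) for row in rows[1:]}


def merge_gene_tables(totals_by_gene):
    """Combine per-gene totals into one 'indID mut_in_<gene> ...' table."""
    genes = list(totals_by_gene)
    samples = sorted({s for totals in totals_by_gene.values() for s in totals})
    lines = ["indID " + " ".join(f"mut_in_{g}" for g in genes)]
    for s in samples:
        # 'NA' marks a sample that is absent from that gene's output
        counts = [str(totals_by_gene[g].get(s, "NA")) for g in genes]
        lines.append(s + " " + " ".join(counts))
    return "\n".join(lines)
```

Calling `merge_gene_tables({"gene_A": read_totals(text_a), "gene_B": read_totals(text_b)})`, with `text_a` and `text_b` holding the contents of the two output files, returns the table as a single string in the layout requested above.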
