Hi Chloe,
If used, the effective gene size for an individual should reflect the number of genomic sites at which coverage was sufficient for a de novo mutation to be detected (and, if base context is used, that number multiplied by 3 to account for all 3 possible non-reference point mutations at each site).
For example, in our study we counted the number of sites in each gene at which the mother, father, and child each had a coverage of at least 10x (i.e., 10 reads), since that is what we required when we used PlinkSeq to call de novo mutations (thus no de novo would be detected at a site where any of the 3 individuals has fewer than 10 reads).
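As a rough illustration of the counting described above, here is a minimal Python sketch. It assumes you have already extracted per-base read depths for one gene in each trio member, as three equal-length lists in the same site order. The function name, its arguments, and the example depths are hypothetical and are not part of PlinkSeq or dnenrich.

```python
from typing import Sequence

# Assumed threshold, mirroring --minChildDP 10 and --minParDP 10.
MIN_DEPTH = 10


def effective_gene_size(
    child_depth: Sequence[int],
    father_depth: Sequence[int],
    mother_depth: Sequence[int],
    min_depth: int = MIN_DEPTH,
    use_base_context: bool = False,
) -> int:
    """Count sites in one gene where all three trio members reach min_depth.

    Each argument holds per-base read depths for the same gene, in the
    same site order (hypothetical input format).
    """
    if not (len(child_depth) == len(father_depth) == len(mother_depth)):
        raise ValueError("depth lists must cover the same sites")

    # A site counts only if child, father, and mother all have enough reads.
    covered_sites = sum(
        1
        for c, f, m in zip(child_depth, father_depth, mother_depth)
        if min(c, f, m) >= min_depth
    )

    # With base context, each covered site allows 3 non-reference point mutations.
    return covered_sites * 3 if use_base_context else covered_sites


# Made-up depths for a 4-site gene: sites 2 and 3 fail (child 9x, father 8x).
child = [12, 9, 30, 15]
father = [10, 20, 8, 11]
mother = [25, 14, 40, 10]

print(effective_gene_size(child, father, mother))                         # 2
print(effective_gene_size(child, father, mother, use_base_context=True))  # 6
```

Running this per gene and per trio would give one effective size for each gene and trio combination.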
See here for more info on how to run PlinkSeq’s de novo mutation detection command:
Note, though, that the parameter names have changed slightly since that post. You can always find the latest parameters and explanations by running the following command (I’ve also added the output here for convenience):
pseq help denovo
denovo : filter for de-novo mutations and transmitted variants (SNPs and indels)
---------------------------------------------------------
--allowDoubleAltDeNovos { flag }
    Include de novos that consist of two alternate alleles in child
--minChildDP { int }
    Minimum child depth
--minChildPL { float }
    Minimum child PL (genotype likelihood) for non-called genotype
--minHet_AB_alt { float }
    Minimum AB-ALT (% of reads with ALT allele) for heterozygous individual
--minHet_AB_ref { float }
    Minimum AB-REF (% of reads with REF allele) for heterozygous individual
--minHomAlt_AB_alt { float }
    Minimum AB-ALT (% of reads with ALT allele) for homozygous alternate individual
--minHomRef_AB_ref { float }
    Minimum AB-REF (% of reads with REF allele) for homozygous reference individual
--minMQ { float }
    Minimum MQ (read mapping quality)
--minParDP { int }
    Minimum parental depth
--minParPL { float }
    Minimum parental PL (genotype likelihood) for non-called genotype
--printTransmission { flag }
    Parent variants will be printed with transmission status
Specifically, in our work, we used:
--minChildDP 10
--minParDP 10
Best,
Menachem
Hi Dr. Fromer,
You mentioned in earlier correspondence that it is possible to include individual-specific coverage information in the gene matrix file. The documentation says:
The per-trio gene sizes should be used if one has calculated the effective gene sizes after requiring that there be sufficient sequencing coverage in all 3 members of the trio at the corresponding bases in the respective genes.
How are the per-trio effective gene sizes calculated? Is there a way to adapt it for multiplex families, and/or incorporate per-individual effective gene size rather than per-trio effective gene size? And how do those get incorporated into the gene size matrix? Any more information on this would be much appreciated - I've read the documentation but have been unable to figure out exactly how we can incorporate this information into our runs of dnenrich.
Best,
Chloe O'Connell