Subject: aborted runs and problems with multiSNP utility
From: catherine <cli...@gmail.com>
To: BIMBAM HELP <bimba...@googlegroups.com>
The plot.multiSNP code is not well designed to work with
so many SNPs. I hope we can make something more robust available
soon, but I'm afraid we don't have anything for now. (You are the
first to report a problem! If others report problems, this will become
a higher priority.)
Best wishes,
Matthew
Hi Matthew and Grant,
Thanks so much for the advice. The BFs for L=2 and L=3 are essentially
the same. But I was hoping to see whether I could distinguish between
models in which there are a small number of causal variants and models in
which there are many causal variants (most of small effect). It sounds
like piMASS is the way to go for this, and following your suggestion,
I read your paper and have been playing around with the program. I
have a few questions regarding this, if you get the chance.
(1) What are the defaults for the hyperparameters -h and -p and do you
recommend trying alternative settings for -hmax/min -pmax/min? If so,
and given that I am working with a densely genotyped candidate region
(800 SNPs over 180 kb) rather than a set of genome-wide markers, which
settings would you recommend? [some additional background: this
"candidate region" explains roughly 80-90% of phenotypic variation
among laboratory strains; I am now utilizing a variable natural
population to identify causal variants within this region].
(2) I'm still getting to know the program and the output, but do you
have any recommendations for particularly effective ways to summarize
the results? So far I have been looking at Manhattan plots for single-
SNP BFs and PIPs (from "snp" and "mcmc" files, respectively). To
summarize evidence for the total number of SNPs, I plotted BF as a function of
the number of SNPs (from the sampled states in the "path" file) and saw a peak
around 25 SNPs [log(BF)=46]. In contrast, the sum of PIPs for all SNPs
("E" in your paper, I think) in this region was ~76. Are there better
ways to summarize the evidence for the number of SNPs? Also, which data did you use
to generate Fig 6B (posterior distributions of PVE)?
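
One way to put the two summaries in (2) on the same footing, offered as a minimal sketch rather than anything piMASS itself provides: the sum of PIPs is the posterior mean of the number of SNPs in the model, while the model sizes of the sampled states give an empirical posterior distribution for that number. The function names and the plain-list inputs below are assumptions for illustration. Reading the "mcmc" and "path" files is deliberately left out, because their column layout is not described in this thread.

```python
from collections import Counter


def expected_snp_count(pips):
    """Posterior mean number of SNPs in the model.

    By linearity of expectation this is the sum of the per-SNP
    posterior inclusion probabilities (PIPs).
    """
    return sum(pips)


def model_size_posterior(sampled_sizes):
    """Empirical posterior over model size.

    `sampled_sizes` is assumed to hold one integer per retained MCMC
    sample: the number of SNPs included in that sampled state.
    """
    counts = Counter(sampled_sizes)
    total = len(sampled_sizes)
    return {size: counts[size] / total for size in sorted(counts)}


def posterior_mode(posterior):
    """Model size with the highest estimated posterior probability."""
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    # Made-up toy values, only to show the calls; not real results.
    toy_pips = [0.5, 0.25, 0.25]
    toy_sizes = [1, 2, 2, 3]
    print(expected_snp_count(toy_pips))
    posterior = model_size_posterior(toy_sizes)
    print(posterior)
    print(posterior_mode(posterior))
```

Note that a peak in BF plotted against the number of SNPs and the sum of PIPs are different quantities, so they need not agree; a histogram of the sampled model sizes shows the spread that neither single number conveys.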
I would definitely say that for the candidate region you need to
consider changing the defaults, since the defaults were set up for
genome-wide analysis.
*In general* for a candidate gene, you would expect the total h to be
much smaller than one (e.g. in human complex diseases, most genes will
have genetic variants explaining no more than 5% of the variance at
the very most, and in fact many will probably explain less than 1%).
And most genes probably have only a few causal variants (so pmax = 200
would seem inappropriate).
For an arbitrary candidate gene in a human complex disease I would
have said try hmax = 0.05 and pmax = 10 to begin with. But it sounds
from your description that you might be in a somewhat non-standard
situation? Without knowing more it is difficult to make suggestions.
[If you want to discuss such details further off-line, via direct
email, rather than bimbam_help then feel free]
Matthew