Re: pyslim VCF output nonconforming when reading treeseq

45 views
Skip to first unread message

Peter Ralph

unread,
Jun 14, 2021, 1:03:33 PM6/14/21
to slim-discuss
Follow-up here (I replied off-list, whoops): The problem is that
- the ALT allele you're seeing is the SLiM mutation ID of the SLiM
mutation. (more generally, it might be a comma-separated list of IDs,
if the mutations are stacking)
- the REF allele is always "" (the empty string), because anyone with
the ancestral state doesn't have any (SLiM) mutations.

To have nucleotides in those columns we need to edit the tree sequence
to swap out nucleotides for those SLiM IDs. This will be in the next
pyslim release, but in the meantime, here's code you can adapt for
your purpose:
https://github.com/tskit-dev/pyslim/issues/168
and in case anyone wants to have a preview of how it'll work in the
next pyslim (which I happened to be just working on), here it is:
https://github.com/tskit-dev/pyslim/pull/174

thanks,
Peter

On Sun, Jun 13, 2021 at 6:20 PM gasto...@gmail.com
<gasto...@gmail.com> wrote:
>
>
> Hi all,
>
> I am simulating a classic sweep by first simulating a single adaptive variant and overlaying mutations afterwards with pyslim. I output a VCF file, the neutral overlaid mutations look normal, but the "adaptive" mutation does not conform to VCF. The ALT allele is always missing, and the REF allele is some random number like 52 or 28. Here is an example of only the adaptive mutation.
>
> <pre style='color:#000000;background:#ffffff;'>##fileformat<span style='color:#808030; '>=</span>VCFv4<span style='color:#008c00; '>.2</span>
> ##source<span style='color:#808030; '>=</span>tskit <span style='color:#008c00; '>0.3</span><span style='color:#008c00; '>.6</span>
> ##FILTER<span style='color:#808030; '>=</span><span style='color:#808030; '>&lt;</span>ID<span style='color:#808030; '>=</span>PASS<span style='color:#008c00; '>,</span>Description<span style='color:#808030; '>=</span><span style='color:#800000; '>"</span><span style='color:#0000e6; '>All filters passed</span><span style='color:#800000; '>"</span><span style='color:#808030; '>></span>
> ##contig<span style='color:#808030; '>=</span><span style='color:#808030; '>&lt;</span>ID<span style='color:#808030; '>=</span><span style='color:#008c00; '>1,</span>length<span style='color:#808030; '>=</span><span style='color:#008c00; '>9999991</span><span style='color:#808030; '>></span>
> ##FORMAT<span style='color:#808030; '>=</span><span style='color:#808030; '>&lt;</span>ID<span style='color:#808030; '>=</span>GT<span style='color:#008c00; '>,</span>Number<span style='color:#808030; '>=</span><span style='color:#008c00; '>1,</span>Type<span style='color:#808030; '>=</span>String<span style='color:#008c00; '>,</span>Description<span style='color:#808030; '>=</span><span style='color:#800000; '>"</span><span style='color:#0000e6; '>Genotype</span><span style='color:#800000; '>"</span><span style='color:#808030; '>></span>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT p5<span style='color:#808030; '>:</span>i4857 p4<span style='color:#808030; '>:</span>i3461 p3<span style='color:#808030; '>:</span>i2823 p2<span style='color:#808030; '>:</span>i1176 p1<span style='color:#808030; '>:</span>i968
> <span style='color:#008c00; '>1</span> <span style='color:#008c00; '>4999990</span> <span style='color:#808030; '>.</span> <span style='color:#008c00; '>51</span> <span style='color:#808030; '>.</span> PASS <span style='color:#808030; '>.</span> GT <span style='color:#008c00; '>1</span><span style='color:#808030; '>|</span><span style='color:#008c00; '>1</span> <span style='color:#008c00; '>0</span><span style='color:#808030; '>|</span><span style='color:#008c00; '>0</span> <span style='color:#008c00; '>1</span><span style='color:#808030; '>|</span><span style='color:#008c00; '>1</span> <span style='color:#008c00; '>0</span><span style='color:#808030; '>|</span><span style='color:#008c00; '>0</span> <span style='color:#008c00; '>0</span><span style='color:#808030; '>|</span><span style='color:#008c00; '>0</span>
>
> </pre>
>
>
>
> This is the code I am using for overlaying mutations:
>
>
> <pre style='color:#000000;background:#ffffff;'><span style='color:#800000; font-weight:bold; '>import</span> argparse
> <span style='color:#800000; font-weight:bold; '>import</span> msprime<span style='color:#808030; '>,</span> pyslim<span style='color:#808030; '>,</span> gzip
> <span style='color:#800000; font-weight:bold; '>import</span> numpy <span style='color:#800000; font-weight:bold; '>as</span> np
> parser <span style='color:#808030; '>=</span> argparse<span style='color:#808030; '>.</span>ArgumentParser<span style='color:#808030; '>(</span> description <span style='color:#808030; '>=</span> <span style='color:#0000e6; '>'Overlay neutral mutations into treeseq file and output VCF of last generation.'</span><span style='color:#808030; '>)</span>
> parser<span style='color:#808030; '>.</span>add_argument<span style='color:#808030; '>(</span><span style='color:#0000e6; '>"treeseq"</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>type</span><span style='color:#808030; '>=</span><span style='color:#400000; '>str</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>help</span><span style='color:#808030; '>=</span><span style='color:#0000e6; '>"trees file to overlay and write"</span><span style='color:#808030; '>)</span>
> parser<span style='color:#808030; '>.</span>add_argument<span style='color:#808030; '>(</span><span style='color:#0000e6; '>"output"</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>type</span><span style='color:#808030; '>=</span><span style='color:#400000; '>str</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>help</span><span style='color:#808030; '>=</span><span style='color:#0000e6; '>"output gzipped VCF file"</span><span style='color:#808030; '>)</span>
> parser<span style='color:#808030; '>.</span>add_argument<span style='color:#808030; '>(</span><span style='color:#0000e6; '>"sample_size"</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>type</span><span style='color:#808030; '>=</span><span style='color:#400000; '>int</span><span style='color:#808030; '>,</span> <span style='color:#400000; '>help</span><span style='color:#808030; '>=</span><span style='color:#0000e6; '>"VCF sample size per population"</span><span style='color:#808030; '>)</span>
> args <span style='color:#808030; '>=</span> parser<span style='color:#808030; '>.</span>parse_args<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span>
>
>
> ts <span style='color:#808030; '>=</span> pyslim<span style='color:#808030; '>.</span>load<span style='color:#808030; '>(</span>args<span style='color:#808030; '>.</span>treeseq<span style='color:#808030; '>)</span>
> ts <span style='color:#808030; '>=</span> msprime<span style='color:#808030; '>.</span>mutate<span style='color:#808030; '>(</span>ts<span style='color:#808030; '>,</span> rate<span style='color:#808030; '>=</span><span style='color:#008c00; '>1</span><span style='color:#ffffff; background:#dd0000; font-weight:bold; font-style:italic; '>e</span><span style='color:#44aadd; '>-</span><span style='color:#008c00; '>7</span><span style='color:#808030; '>,</span> random_seed<span style='color:#808030; '>=</span><span style='color:#008c00; '>1</span><span style='color:#808030; '>,</span> keep<span style='color:#808030; '>=</span><span style='color:#074726; '>True</span><span style='color:#808030; '>)</span>
> ts<span style='color:#808030; '>.</span>dump<span style='color:#808030; '>(</span><span style='color:#0000e6; '>"overlaid.trees"</span><span style='color:#808030; '>)</span>
> ts <span style='color:#808030; '>=</span> pyslim<span style='color:#808030; '>.</span>load<span style='color:#808030; '>(</span><span style='color:#0000e6; '>"overlaid.trees"</span><span style='color:#808030; '>)</span>
> pops <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>unique<span style='color:#808030; '>(</span>ts<span style='color:#808030; '>.</span>individual_populations<span style='color:#808030; '>)</span>
> individuals <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>array<span style='color:#808030; '>(</span><span style='color:#008c00; '>0</span><span style='color:#808030; '>)</span>
> populations <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>array<span style='color:#808030; '>(</span><span style='color:#008c00; '>0</span><span style='color:#808030; '>)</span>
> <span style='color:#800000; font-weight:bold; '>for</span> i <span style='color:#800000; font-weight:bold; '>in</span> pops<span style='color:#808030; '>:</span>
> inds_sample <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>random<span style='color:#808030; '>.</span>choice<span style='color:#808030; '>(</span>ts<span style='color:#808030; '>.</span>individuals_alive_at<span style='color:#808030; '>(</span><span style='color:#008c00; '>0</span><span style='color:#808030; '>,</span>population<span style='color:#808030; '>=</span>i<span style='color:#808030; '>)</span><span style='color:#808030; '>,</span>args<span style='color:#808030; '>.</span>sample_size<span style='color:#808030; '>,</span>replace<span style='color:#808030; '>=</span><span style='color:#074726; '>False</span><span style='color:#808030; '>)</span>
> pop_ids <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>full<span style='color:#808030; '>(</span><span style='color:#400000; '>len</span><span style='color:#808030; '>(</span>inds_sample<span style='color:#808030; '>)</span><span style='color:#808030; '>,</span>i<span style='color:#808030; '>)</span>
> <span style='color:#696969; '># append</span>
> populations <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>insert<span style='color:#808030; '>(</span>populations<span style='color:#808030; '>,</span><span style='color:#008c00; '>1</span><span style='color:#808030; '>,</span>pop_ids<span style='color:#808030; '>)</span>
> individuals <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>insert<span style='color:#808030; '>(</span>individuals<span style='color:#808030; '>,</span><span style='color:#008c00; '>1</span><span style='color:#808030; '>,</span>inds_sample<span style='color:#808030; '>)</span>
>
> individuals <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>delete<span style='color:#808030; '>(</span>individuals<span style='color:#808030; '>,</span><span style='color:#008c00; '>0</span><span style='color:#808030; '>)</span>
> populations <span style='color:#808030; '>=</span> np<span style='color:#808030; '>.</span>delete<span style='color:#808030; '>(</span>populations<span style='color:#808030; '>,</span><span style='color:#008c00; '>0</span><span style='color:#808030; '>)</span>
>
> ind_names <span style='color:#808030; '>=</span> <span style='color:#808030; '>[</span><span style='color:#808030; '>]</span>
>
> <span style='color:#800000; font-weight:bold; '>for</span> ind<span style='color:#808030; '>,</span>pop <span style='color:#800000; font-weight:bold; '>in</span> <span style='color:#400000; '>zip</span><span style='color:#808030; '>(</span>individuals<span style='color:#808030; '>,</span>populations<span style='color:#808030; '>)</span><span style='color:#808030; '>:</span>
> name <span style='color:#808030; '>=</span> <span style='color:#0000e6; '>"p"</span> <span style='color:#44aadd; '>+</span> <span style='color:#400000; '>str</span><span style='color:#808030; '>(</span>pop<span style='color:#808030; '>)</span> <span style='color:#44aadd; '>+</span> <span style='color:#0000e6; '>":"</span> <span style='color:#44aadd; '>+</span> <span style='color:#0000e6; '>"i"</span> <span style='color:#44aadd; '>+</span> <span style='color:#400000; '>str</span><span style='color:#808030; '>(</span>ind<span style='color:#808030; '>)</span>
> ind_names<span style='color:#808030; '>.</span>extend<span style='color:#808030; '>(</span><span style='color:#808030; '>[</span>name<span style='color:#808030; '>]</span><span style='color:#808030; '>)</span>
>
>
> <span style='color:#800000; font-weight:bold; '>with</span> gzip<span style='color:#808030; '>.</span><span style='color:#400000; '>open</span><span style='color:#808030; '>(</span>args<span style='color:#808030; '>.</span>output<span style='color:#808030; '>,</span> <span style='color:#0000e6; '>"wt"</span><span style='color:#808030; '>)</span> <span style='color:#800000; font-weight:bold; '>as</span> f<span style='color:#808030; '>:</span>
> ts<span style='color:#808030; '>.</span>write_vcf<span style='color:#808030; '>(</span>f<span style='color:#808030; '>,</span>individuals<span style='color:#808030; '>=</span>individuals<span style='color:#808030; '>,</span>individual_names<span style='color:#808030; '>=</span>ind_names<span style='color:#808030; '>)</span>
> </pre>
>
> Any idea of what I might be missing?
>
> Thanks,
>
> gaston
>
>
>
> --
> SLiM forward genetic simulation: http://messerlab.org/slim/
> ---
> You received this message because you are subscribed to the Google Groups "slim-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/4ce59a75-02f8-45c9-9f77-9d9725028051n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages