Re: [stacks] Re: populations won't read vcf file, file definitely converted to unix format, seems to be problem with underscore in locus id


Nicolas Rochette

Apr 17, 2017, 11:51:19 AM
to stacks...@googlegroups.com
Hi Ella,

The --whitelist option is not intended to work with -V input (VCF input).

Also, you should not export the data to VCF only to re-import it into the Stacks pipeline later. Why don't you just run populations on the output of the Stacks core pipeline using --min_maf?

Best,
Nicolas

Ella wrote on 04/15/2017 07:52 PM:
ps the top of my whitelist looks like

13_65
51_9
126_17
257_35
582_31

etc......

On Saturday, April 15, 2017 at 8:50:35 PM UTC-4, Ella wrote:
Hello,

I am trying to run populations (stacks v1.45) using a whitelist, after doing denovo alignment, but am getting an error that populations is unable to parse the whitelist. I think the issue is that the locus ids in my whitelist have an underscore. I'm using the ID column of a vcf file (I filtered my original vcf file by MAF using VCFtools, outside of stacks, and took the ID column from the recoded vcf to make the whitelist). The relevant top section of my vcf file looks like this, with a couple of the locus ids coloured in red.

#CHROM POS ID REF ALT QUAL
un 1027 13_65 C T 0
un 4011 51_9 A C 0
un 10019 126_17 A T 0
un 20517 257_35 G A 0
un 46513 582_31 G C 0

When I have done this before, I have had reference-aligned data, and my IDs were just the locus (i.e., no underscore). Is there a reason that my locus IDs look like they do here? Is it because I did denovo alignment? And can I just remove the numbers after the underscore and use the numbers before the underscore? The locus IDs in the SNP files, etc... don't have any underscores.

The actual error that I get is: Unable to parse whitelist, '/gs/project/njm-111-aa/stacks_workflowWal/01-info_files/r12bWhitelist.txt' at line 1

My whitelist file is definitely converted to unix format:
file r12bWhitelist.txt
r12bWhitelist.txt: ASCII text


--
Stacks website: http://catchenlab.life.illinois.edu/stacks/

Nicolas Rochette

Apr 17, 2017, 5:00:28 PM
to stacks...@googlegroups.com
Hi Ella,

What I meant is that the -V and -W options are not made to work
together. I would recommend just using --min_maf. In what sense are
your results using this method 'weird'?

Best,
Nicolas

Ella wrote on 04/17/2017 03:19 PM:
> Hi Nicolas,
>
> Thank you for the quick reply. I think I might have been a little
> unclear in my post: I'm not running with a vcf file as the whitelist.
> I've just taken the ID column from the VCF file (so, locus ID), to
> make the whitelist. The first few lines of my whitelist look like
>
> 13_65
> 51_9
> 126_17
> 257_35
> 582_31
> etc...
>
> I ran populations using the built in -maf, but got strange results.
> Since I have had problems with this filter in older versions of
> stacks, and have had success filtering for -maf using vcf tools (and
> then creating a whitelist using locus ID from the recoded vcf file,
> which I have then used to re-run populations), I want to compare the
> results that I get using both the built in and external approaches.
>
> Do these locus ids look normal (i.e., is there always an underscore in
> them now), or is this due to having denovo-aligned data? My vcf files
> generated in older versions of stacks, using reference-aligned data,
> do not have underscores, and thus match exactly with locus ids that
> are in the other stacks output files (i.e., matches, alleles, etc...).
>
> With thanks,
> Ella

Nicolas Rochette

Apr 17, 2017, 5:50:40 PM
to stacks...@googlegroups.com
I am still confused by what you are trying to do.

If you are just trying to derive a whitelist from the VCF to then re-run
populations on the cstacks/sstacks output, this time using a
whitelist, this is possible. But VCF and MAF filters are usually
SNP-centered (and my guess is that this is what you want), not
locus-centered. Thus a whitelist of loci can't logically work, and you
want to work with a whitelist of SNPs. See the manual:
http://catchenlab.life.illinois.edu/stacks/manual/#wl

The IDs that you see in the VCF are simply '<locus ID>_<SNP column>', so
if you replace the underscores with tabs you will obtain a SNPs whitelist.
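
A minimal sketch of that underscore-to-tab conversion in Python, assuming every ID field has the form '<locus ID>_<SNP column>' with both parts numeric, as in the VCF excerpt quoted earlier in the thread. The function name `vcf_ids_to_snp_whitelist` is hypothetical and is not part of Stacks; the example input lines are copied from that excerpt.

```python
def vcf_ids_to_snp_whitelist(vcf_lines):
    """Turn VCF ID fields such as '13_65' into 'locus<TAB>column' lines.

    Hypothetical helper, not part of Stacks. Assumes the ID is the third
    column and always looks like '<locus ID>_<SNP column>'.
    """
    whitelist = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines ('##' meta lines and '#CHROM').
        if not line or line.startswith("#"):
            continue
        # VCF is tab-delimited; splitting on any whitespace also tolerates
        # excerpts that were pasted with spaces.
        fields = line.split()
        vcf_id = fields[2]
        locus, sep, column = vcf_id.partition("_")
        if not (sep and locus.isdigit() and column.isdigit()):
            raise ValueError(f"unexpected ID format: {vcf_id!r}")
        whitelist.append(f"{locus}\t{column}")
    return whitelist


example = [
    "#CHROM POS ID REF ALT QUAL",
    "un 1027 13_65 C T 0",
    "un 4011 51_9 A C 0",
]
print("\n".join(vcf_ids_to_snp_whitelist(example)))
```

Each output line holds the locus ID, a tab, and the SNP column, which is the two-column layout described in the whitelist section of the manual linked above. Raising an error on an unexpected ID is a deliberate choice here, so that a malformed line is caught before populations tries to parse the file.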

Best,
Nicolas

Ella

Apr 17, 2017, 7:47:29 PM
to Stacks, roch...@illinois.edu
Hi Nicolas,

Thanks for your last answer. This is what I was looking for. 

I deleted my messages in the hope of being able to post our dialog again with a title that wouldn't be so misleading. But it turns out that I can't delete the entire thread or edit the subject of my first question. Please feel free to do this, or to delete the rest of the dialog. I can re-post with a more helpful title.

With thanks, again,
Ella