Help understanding denovo_map.pl outputs


Giulia Trauzzi

Apr 6, 2021, 7:31:27 PM
to stacks...@googlegroups.com
Hello,

I am running denovo_map.pl on a subset of my samples, following Rochette & Catchen, to optimize the main Stacks parameters.
I pooled all the samples into one population and am now looking at the outputs.

As I am learning genomics on my own, I have a lot of questions:

1. Is the number of SNPs (final, filtered by populations) written in the populations.log file as "variant sites"?

2. Is the number of final loci in the same populations.log file as "loci kept"?

3. When building the line chart during parameter optimization, to show how the curve flattens at a certain value of M, what is the value plotted on the y axis? What is the "No. of loci shared by 80% of samples", and where do I get this value?
My line chart looks very weird: whatever value of "loci" I plot against M, the curve decreases instead of flattening as shown in Rochette & Catchen (2017).

About point 3: could this be because in my populations.log.distribs file I get the following?

BEGIN samples_per_loc_prefilters
# Distribution of valid samples matched to a catalog locus prior to filtering.
n_samples       n_loci
0       74152
END samples_per_loc_prefilters



and

BEGIN samples_per_loc_postfilters
# Distribution of valid samples matched to a catalog locus after filtering.
n_samples       n_loci
0       22506
END samples_per_loc_postfilters

Also, why is this? I have a lot of missing samples.
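
For readers who want to pull one of these BEGIN/END blocks out of a populations.log.distribs file, here is a minimal sketch. The function name extract_section is made up for this example, and it assumes only the BEGIN/END layout shown above. Stacks 2 installations should also include a stacks-dist-extract helper that serves the same purpose.

```shell
#!/usr/bin/env bash
# Print the body of one BEGIN/END section of a Stacks *.distribs file.
# The BEGIN/END marker lines and "#" comment lines are skipped.
# Usage: extract_section FILE SECTION_NAME
# (extract_section is a hypothetical helper, not part of Stacks.)
extract_section() {
    awk -v name="$2" '
        $1 == "END"   && $2 == name { inside = 0 }
        inside && !/^#/             { print }
        $1 == "BEGIN" && $2 == name { inside = 1 }
    ' "$1"
}
```

It could be called as: extract_section populations.log.distribs samples_per_loc_postfilters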

Thanks to whoever is able to help me.

Cheers,

Giulia

B G

May 25, 2021, 11:37:14 AM
to Stacks
Hey! 

I am very new to running stacks, and am trying to optimize my parameters/understand the outputs. 

Did you ever get any answer to this? I am trying to use the r80 method from Paris et al. 2017, but I am not sure I am even doing that right. 

Thanks, 

Bergen

Giulia Trauzzi

May 25, 2021, 2:36:14 PM
to stacks...@googlegroups.com
Hi Bergen, 

I never got a straight answer, but I managed to work it out. Unfortunately, I wasn't able to set the parameters following their protocol, as my reads are too short and I cannot run the script with M = 7, M = 8, or M = 9. But I feel like I have a better understanding now... what would you like to know?

Best, 

Giulia


B G

May 25, 2021, 4:10:05 PM
to stacks...@googlegroups.com
Hey! 

Thanks for the quick response. I am really struggling to understand the different outputs and where to find the values that allow you to optimize your parameters for r80.

This is the general code that I have been running for the different parameter sets, but I get a variety of output files and I have no idea where to find the number of assembled loci, the number of polymorphic loci found in 80% of the population, or the number of SNPs.

denovo_map.pl -T 8 -M 2 -n 1 -o /home/bdiz/project/06_parameter/03_M2_n1_m3/03.2_stacks_out --popmap /home/bergeng/project/03_stacks_out/popmap/text_popmap.txt --samples /home/bdiz/project/02_processradtags/sample --min-samples-per-pop 0.80 --paired


I really appreciate any help you can provide. I am completely new at this, and have no idea what I am looking at in the output files, so even if you know any good papers or sources I would be so thankful!

Kind Regards, 

Bergen


Giulia Trauzzi

May 25, 2021, 8:51:06 PM
to stacks...@googlegroups.com
Hi Bergen,

I will tell you what I was able to collect by asking everyone how to interpret and use the output from Stacks. However, I have not read this information in any published paper (it is more word-of-mouth information).

1. First of all, I guess you are following the protocol of Catchen and Paris and trying to recreate the line chart and the bar plot that they show in the paper. When you perform the 9 runs with the different values of M (1-9), you should usually supply a population map that places all the individuals in the same hypothetical population (although this is not true in reality), so all of them are listed as "pop1".

2. You will find the number of loci assembled, loci kept, and SNPs in the populations.log file. In this file the SNPs are not called "snps"; they are reported as "variant sites".

3. If you are trying to make the line chart from that paper, where the curve flattens at increasing values of M, the values on the y axis are the "variant sites", i.e. the SNPs (this was very confusing for me because they use different terms for the SNPs). These variant sites ARE the ones shared by 80% of the individuals, because you instructed the "populations" module with the argument -r 0.8.

4. When you need to create the bar plot showing the SNP distribution, you will find this information in the populations output file called populations.log.distribs (at the end of the file).
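
As a rough illustration of point 2, the sketch below pulls the number of kept loci and the number of variant sites out of a populations.log file. It assumes the log contains a summary line of the form "Kept N loci, composed of ... sites; ... filtered, K variant sites remained."; that wording is an assumption and may differ between Stacks versions, so check your own log first. The function name is hypothetical. Earlier in the thread the y axis of the chart is quoted as "No. of loci shared by 80% of samples", which reads as a count of loci rather than SNPs, so the sketch prints both numbers and leaves the choice to the reader.

```shell
#!/usr/bin/env bash
# Print "loci_kept<TAB>variant_sites" taken from a populations.log file.
# ASSUMPTION: the log has a summary line shaped like
#   Kept N loci, composed of ... sites; ... filtered, K variant sites remained.
# (summarize_populations_log is a hypothetical helper, not part of Stacks.)
summarize_populations_log() {
    awk '
        /^Kept [0-9]+ loci/ {
            loci = $2
            # The variant-site count is the word just before "variant sites".
            for (i = 2; i < NF; i++)
                if ($i == "variant" && $(i + 1) == "sites") snps = $(i - 1)
            printf "%s\t%s\n", loci, snps
            exit
        }
    ' "$1"
}
```

With one output directory per run (the directory names are up to you), it could be called once per directory, for example: summarize_populations_log M3/populations.log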

This is what I came up with. Please remember that I only got this information by emailing people around the world to find out what the terms meant!
I really hope this helps clarify some doubts, but please get in touch if you have more specific questions.

Best,

Giulia

B G

May 26, 2021, 10:47:45 AM
to stacks...@googlegroups.com
Thank you so much for all of this! 

I really appreciate it! 

Kind regards, 

Bergen 

B G

Jun 3, 2021, 3:49:47 PM
to stacks...@googlegroups.com
Hey Giulia,

Sorry to bother you again. I was wondering how you formatted your popmap in order to treat all the samples as a single population for parameter optimization? I am dealing with 7 populations, but when I created a new popmap with them all being in the same population the code won't run. 

Also, I totally understand if you are not comfortable sharing, but if you don't mind, I would love to see a sample of the code you ran for your parameter optimization. 

Kind Regards, 

Bergen

Giulia Trauzzi

Jun 3, 2021, 4:28:32 PM
to stacks...@googlegroups.com
Hi Bergen,

As I could not follow the protocol published by Rochette & Catchen, I did it in a different way. However, what error message do you get when your code does not run?

The popmap is a tab-separated .txt file: each line is the sample name<tab>pop.

I will send you a screenshot of mine once I get to uni :)


Cheers

G

B G

Jun 3, 2021, 4:39:42 PM
to stacks...@googlegroups.com
Thanks for the quick response! 

I am just trying to get denovo_map.pl to run with a popmap that has all individuals under the same population code. 

So my popmap is a txt file that follows the format

sample_id pop_one
sample_id pop_one
sample_id pop_one
sample_id pop_one
etc. 

But I keep getting the error code "could not parse populations" 

Thanks again for being willing to help. Which university do you work out of?

Regards, 

Bergen

Giulia Trauzzi

Jun 3, 2021, 5:50:50 PM
to stacks...@googlegroups.com
Hi Bergen,

I am a PhD student at Victoria University of Wellington, in New Zealand.

I have attached a screenshot of the popmap, and this is the line of code I use to run denovo_map.pl:

denovo_map.pl --samples ../180samples/ --popmap ../subset_map.txt -o ./m6p9_out/ -M 3 -n 4 -m 6 -X "populations: -r 0.8 -p 9" -T 20  ##This is an example with some test parameters :)

I think I used to get your error message too when I did not add the extension (.txt) to the population map file.

Also, from what I understood, there is no real fixed rule for setting these parameters. I have tried to find a way myself that would fit my short reads. Hopefully it is a good way to approach this...
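
To repeat a command of this shape over the M values 1 to 9 mentioned earlier in the thread, something like the following sketch could be used. It only prints the commands so they can be reviewed before anything runs. The function name, the M1..M9 output directory layout, and setting -n equal to -M are all assumptions of this sketch (the example above uses -M 3 -n 4); the flags themselves are the ones already used in this thread.

```shell
#!/usr/bin/env bash
# Print (do not run) one denovo_map.pl command per value of M from 1 to 9.
# ASSUMPTIONS: n is set equal to M, and each run writes to OUT_ROOT/M<value>.
# Paths are printed unquoted, so they must not contain spaces.
# Usage: print_m_sweep SAMPLES_DIR POPMAP OUT_ROOT [THREADS]
# (print_m_sweep is a hypothetical helper, not part of Stacks.)
print_m_sweep() {
    local samples=$1 popmap=$2 out_root=$3 threads=${4:-8}
    local M
    for M in 1 2 3 4 5 6 7 8 9; do
        # Create the output directory first, then run the pipeline into it.
        printf 'mkdir -p %s/M%s && ' "$out_root" "$M"
        printf 'denovo_map.pl --samples %s --popmap %s -o %s/M%s -M %s -n %s -T %s -X "populations: -r 0.8"\n' \
            "$samples" "$popmap" "$out_root" "$M" "$M" "$M" "$threads"
    done
}
```

For example, print_m_sweep ../180samples ../subset_map.txt ./sweep 20 prints nine commands; once they look right, the output can be saved to a script or piped to sh. For paired-end data, --paired would be added as in the command earlier in the thread.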

If you have any more questions, please do not hesitate to ask. I do not know much (as I am only starting) but I am happy to help :)

Where do you work?

Cheers,

Giulia



Example_pop_map_tab.PNG

Giulia Trauzzi

Jun 3, 2021, 5:52:00 PM
to stacks...@googlegroups.com
Forgot to say... I did SE sequencing! So the popmap file changes if you have PE reads! :)

G

B G

Jun 7, 2021, 10:38:42 AM
to stacks...@googlegroups.com
Thanks again for all your help. I am working with paired-end reads, and unfortunately every time I try to run denovo_map.pl the populations step fails at the last command (regardless of what the last command is), so I am hoping to find some sample code to see where I am going wrong.

I am actually currently just an undergraduate student at the University of Calgary (Canada).

Julian Catchen

Jun 8, 2021, 5:36:41 PM
to stacks...@googlegroups.com, Giulia Trauzzi, bergen...@gmail.com
Hi All,

The population map does not change with paired-end reads and it does not
need to have an extension. It simply needs to be a tab-separated file.
See the manual for examples:

http://catchenlab.life.illinois.edu/stacks/manual/#popmap
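
Since the separator is what matters, here is a minimal sketch for writing a one-population map with real tab characters and for spotting lines in an existing file that are not tab-separated (for example, lines typed with spaces). Both function names are hypothetical, and the check only looks for at least two non-empty tab-separated fields; it does not reproduce whatever validation Stacks performs itself.

```shell
#!/usr/bin/env bash
# Write a population map that places every sample in the same population.
# Usage: make_single_pop_map POP_NAME SAMPLE... > popmap_file
# (Hypothetical helper, not part of Stacks.)
make_single_pop_map() {
    local pop=$1 sample
    shift
    for sample in "$@"; do
        printf '%s\t%s\n' "$sample" "$pop"   # \t writes a literal tab
    done
}

# Report lines lacking two non-empty tab-separated fields (e.g. space-separated).
# Prints nothing when every line passes.
check_popmap() {
    awk -F '\t' 'NF < 2 || $1 == "" || $2 == "" { printf "line %d: %s\n", NR, $0 }' "$1"
}
```

For example, make_single_pop_map pop1 sample_01 sample_02 > popmap_file writes two lines (the sample names here are placeholders), and check_popmap popmap_file should then print nothing.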


Giulia Trauzzi

Jun 8, 2021, 5:45:25 PM
to Julian Catchen, stacks...@googlegroups.com, bergen...@gmail.com
Thanks Julian,

I haven't worked on PE reads, and I thought that the file would change :)

G