number of polymorphic loci from denovo_map.pl output


daubert...@gmail.com

Oct 5, 2018, 11:17:59 AM
to Stacks
Hi all,
I am trying to assemble RAD loci de novo using the denovo_map.pl pipeline. So far, I have run the pipeline several times with different parameters (m = 3, and M = n varied from 1 to 9), and I am now trying to find a suitable parameter configuration for my dataset.
I believe I have obtained the total number of loci in the catalog from the denovo_map.log file. In publications, I have seen people compare this number with the number of polymorphic loci. I hope this is not too basic a question, but I am confused about how to extract this number from the output of denovo_map.pl.
Could someone please help me?
Thank you!
Mareike

CaffeSospeso

Oct 5, 2018, 11:47:09 AM
to Stacks
Hi Mareike,

I'm also working on the same type of plots, although I'm running the pipeline by hand. Anyway, my understanding is that after denovo_map.pl, you need to run "populations".

Then you can extract this information from the populations.log.distribs output. You can find some clues on how to extract it from that file in the scripts provided with the Nature Protocols paper by Rochette and Catchen (2017).

Keep in mind that these scripts need to be modified, because they were built for the previous version of Stacks.

If something is unclear, you can ask me to clarify.

Bests,

Gabriele

Nicolas Rochette

Oct 5, 2018, 3:31:08 PM
to Stacks

Hi Mareike, Gabriele,

Try this:

stacks-dist-extract populations.log.distribs snps_per_loc_postfilters

Best,

Nicolas

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/


daubert...@gmail.com

Oct 6, 2018, 11:54:40 AM
to Stacks


Hello Gabriele, Hello Nicolas,
First of all, thanks for answering so quickly. I have run populations on my denovo_map output, and I get the batch_1.populations.log file mentioned in Rochette & Catchen. However, I am still a bit confused about what the values reported there mean and which one I need. What I see is a lot of lists (distribution of valid loci matched to catalog locus, distribution of confounded loci at catalog locus, distribution of missing loci at catalog loci). Each list appears twice; I guess once for the values before filtering and once for after?
Then there is a block telling me the number of loci that have been discarded because there were more than two alleles.
Could you give me a hint which part of the file to look at?
Nicolas, I am not really sure what you want me to do. Since I am not very experienced with programming or handling NGS data, could you maybe elaborate?
Best,
Mareike

Nicolas Rochette

Oct 6, 2018, 12:02:41 PM
to Stacks

Hi Mareike,

If your log file is called batch_1.populations.log, you are probably using Stacks v1.48 or earlier, and I would recommend upgrading to the most recent version (v2.2).

Best,

Nicolas

CaffeSospeso

Oct 6, 2018, 12:27:45 PM
to Stacks
Hi Mareike,

As Nicolas said in the previous answer, you are interested in the lines between "BEGIN snps_per_loc_postfilters" and "END snps_per_loc_postfilters".
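If the stacks-dist-extract helper is not available, the extraction step can be sketched in plain Python. This is a minimal sketch, assuming only what is described above: the section is delimited by "BEGIN <name>" and "END <name>" lines in populations.log.distribs. The function name extract_section is hypothetical and is not part of Stacks.

```python
from typing import List


def extract_section(text: str, name: str) -> List[str]:
    """Return the lines strictly between 'BEGIN <name>' and 'END <name>'.

    Hypothetical helper; assumes each marker sits on its own line,
    as in the populations.log.distribs layout described in this thread.
    """
    begin, end = f"BEGIN {name}", f"END {name}"
    section: List[str] = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not inside:
            # Skip everything until the opening marker of the wanted section.
            inside = stripped == begin
            continue
        if stripped == end:
            break  # closing marker reached; stop collecting
        section.append(line)
    return section
```

For example, extract_section(open("populations.log.distribs").read(), "snps_per_loc_postfilters") should return the same lines that the stacks-dist-extract command prints, including any header or comment lines inside the section.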

From this distribution you can obtain: i) the total number of assembled loci, by summing all values in the second column; ii) the number of polymorphic loci, by summing all values in the second column except the first one (which tells you how many loci are monomorphic, i.e. have zero SNPs); iii) the total number of SNPs, by multiplying the first column by the second column in each row and summing the products.
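The three calculations above can be sketched in Python. This is a minimal sketch, assuming the extracted section consists of rows with two whitespace-separated integers (number of SNPs per locus, number of loci with that many SNPs) and that any header or "#" comment lines should simply be skipped. The function name summarize_snp_distribution is hypothetical.

```python
from typing import Dict, Iterable


def summarize_snp_distribution(lines: Iterable[str]) -> Dict[str, int]:
    """Summarize a SNPs-per-locus distribution.

    Assumes each data row is '<n_snps> <n_loci>' (whitespace-separated
    non-negative integers); lines that do not match are skipped.
    """
    total_loci = 0
    polymorphic_loci = 0
    total_snps = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip headers such as 'n_snps n_loci' and '#' comments
        n_snps, n_loci = int(fields[0]), int(fields[1])
        total_loci += n_loci            # i) every row counts toward the total
        if n_snps > 0:
            polymorphic_loci += n_loci  # ii) only rows with at least one SNP
        total_snps += n_snps * n_loci   # iii) SNPs per locus times loci, summed
    return {
        "total_loci": total_loci,
        "polymorphic_loci": polymorphic_loci,
        "monomorphic_loci": total_loci - polymorphic_loci,
        "total_snps": total_snps,
    }
```

Testing n_snps > 0 rather than dropping the first row gives the same result when the 0-SNP row comes first, but it does not depend on row order. One possible way to use it is to save the section with stacks-dist-extract populations.log.distribs snps_per_loc_postfilters > snps.tsv and then pass the opened file to summarize_snp_distribution.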

I applied this to the populations.log.distribs output that I obtained with Stacks v2.2, as Nicolas suggested.

I still have some difficulty understanding how to extract this information for each individual separately, but I will try to figure it out by looking at the count_fixed_catalog_snps.py script, although it was written for the previous version of Stacks. Unless, of course, Nicolas or Julien has a solution.

Bests,

Gabriele

daubert...@gmail.com

Oct 6, 2018, 1:13:01 PM
to Stacks
Hi Gabriele, Hi Nicolas,
Again, thanks for the quick answer. I am using STACKS 1.44, since this was the version installed on our computer cluster when I started working. We now also have version 2.2 (because I asked for count_fixed_catalog_snps.py, which I have run successfully).
The way I understand it, you both recommend running STACKS 2.2 and extracting the data from the populations.log.distribs file.
In case I want to use count_fixed_catalog_snps.py (which, if I am correct, does not work with newer versions of STACKS): could someone still please tell me how to get the number of polymorphic loci from the "old" STACKS output?
Best,
Mareike

CaffeSospeso

Oct 6, 2018, 1:23:59 PM
to Stacks
Hi Mareike,

I have never worked with the previous versions of Stacks. I managed to find the batch_1.populations.log file of a colleague who used Stacks v1.48. So, assuming that the format of the log file is the same in v1.44 and v1.48, what you should extract is the distribution that follows "# Distribution of the number of SNPs per locus."
Then do what I suggested in my previous message.

Bests,

Gabriele


daubert...@gmail.com

Oct 9, 2018, 9:17:29 AM
to Stacks
Hi Gabriele, hi Nicolas,
STACKS v1.44 does not produce this distribution, but I have now rerun my analysis with version 2.2. Thank you both so much for your help!