--model options in gstacks v2.0

Natalie B

May 17, 2018, 3:09:31 AM

to Stacks

Hi,

I'm quite new to Stacks. I'm using v2.0 and trying to understand the new --model option in gstacks.

--model — model to use to call variants and genotypes; one of marukilow (default), marukihigh, or snp

I think 'marukilow' would be used for datasets with low coverage and 'marukihigh' for high coverage, but I'm not sure what the 'snp' option would be used for.

Also, how can I determine the level of coverage in my dataset? I have 531 samples. I know the ustacks log reports the coverage for each sample, but is there a way to obtain the coverage for the whole dataset?

Thank you for your help

Natalie

Julian Catchen

May 17, 2018, 11:12:12 AM

to stacks...@googlegroups.com, Natalie B

Hi Natalie,

For most datasets, we recommend the default 'marukilow' model. It
takes a Bayesian approach (incorporating information about the allele
frequencies of the population at each site) and works quite well. The
'marukihigh' model can call more than two alleles per site, but does not
use a Bayesian approach. The 'snp' model is the model by Hohenlohe and
collaborators; it was the default model in Stacks v1. To fully understand
the new models, I recommend Maruki and Lynch's paper:

Maruki, T., & Lynch, M. (2017). Genotype Calling from Population-Genomic
Sequencing Data. G3: Genes|Genomes|Genetics, 7(5), 1393–1404.

To determine the level of coverage in your dataset, just average the
per-individual coverage reported by ustacks (alternatively, denovo_map.pl
will create a table of all coverages for the dataset).
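As a minimal sketch of that averaging, assume you have already copied each sample's mean coverage out of the ustacks logs into a tab-separated file with one `sample<TAB>coverage` row per individual. That file layout and the `dataset_mean_coverage` helper are hypothetical, not something Stacks produces; the dataset-level figure is then just the unweighted mean:

```python
import csv
from statistics import mean


def dataset_mean_coverage(lines):
    """Return the unweighted mean of per-sample coverages.

    `lines` is any iterable of tab-separated "sample<TAB>coverage" rows,
    e.g. an open file handle. This two-column layout is a hypothetical
    file you compile yourself from the per-sample ustacks logs.
    Blank rows and rows whose first field starts with '#' are skipped.
    """
    coverages = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        coverages.append(float(row[1]))
    return mean(coverages)
```

For example, `with open("coverages.tsv") as fh: print(dataset_mean_coverage(fh))`, where `coverages.tsv` is a placeholder name. This gives every sample equal weight, matching the per-individual average suggested above; weighting each sample by its number of reads or loci would give a different number.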

Best,

julian

Natalie B

May 17, 2018, 9:05:25 PM

to Stacks

Hi Julian,

OK, great. Thank you for responding so quickly; much appreciated.

Thanks
Natalie

Noemie Valenza-troubat

Jun 8, 2018, 12:06:43 AM

to Stacks

Hi Julian,

I just ran the pipeline aligning my data against a reference genome, and I cannot find the per-locus, per-individual coverage anywhere in the populations output. Where can I find this information? I know the -m parameter no longer exists, but I could not find a similar parameter to specify in gstacks.

Thanks for your help!

Noemie

Julian Catchen

Jun 13, 2018, 6:24:31 PM

to stacks...@googlegroups.com, no.va...@gmail.com

Hi Noemie,

The per-sample mean locus coverage is reported by gstacks in the
*.distribs file.
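If you want those per-sample values in a script, here is a minimal sketch. It assumes the *.distribs file is plain text divided into named sections, each opened by a `BEGIN <name>` line and closed by an `END <name>` line, with a tab-separated header row inside whose first column is the sample name. That layout, the section name, and the coverage column name you pass in are assumptions to check against your own file, and both helper functions are hypothetical:

```python
def extract_section(lines, name):
    """Return the lines strictly between 'BEGIN <name>' and 'END <name>'.

    Assumes (hypothetically) that sections are delimited this way.
    """
    section = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not inside:
            if line == f"BEGIN {name}":
                inside = True  # start collecting from the next line
        elif line == f"END {name}":
            return section
        else:
            section.append(line)
    raise ValueError(f"section {name!r} not found or not terminated")


def per_sample_mean_coverage(lines, section_name, cov_column):
    """Map sample name -> coverage value from one tab-separated section.

    Assumes the first non-comment row is a header and the first column
    holds the sample name; `section_name` and `cov_column` are
    placeholders you must read off your own *.distribs file.
    """
    rows = [r for r in extract_section(lines, section_name)
            if r.strip() and not r.startswith("#")]
    header = rows[0].split("\t")
    col = header.index(cov_column)  # raises ValueError if the column is absent
    result = {}
    for row in rows[1:]:
        fields = row.split("\t")
        result[fields[0]] = float(fields[col])
    return result
```

Because the function accepts any iterable of lines, you can pass an open file handle directly, e.g. `with open(path) as fh: covs = per_sample_mean_coverage(fh, SECTION, COLUMN)`, with `path`, `SECTION` and `COLUMN` filled in from your own output.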

Best,

julian
