gstacks produce low mean coverage and high pcr_dupl

Bin Lu

unread,

Aug 6, 2018, 9:49:14 AM8/6/18

to Stacks

Hi Julian and all:

I ran stacks with reference genome.

gstacks -I $src/aligned/ -M $src/popmaps/popmap --rm-pcr-duplicates -O $src/stacks/ -t 8

Now the gstacks is done, but I found the value of effective per-sample coverage is very strange, and more than 90% reads are PCR duplicates.

######

Read 2706661207 BAM records:

kept 1517667519 primary alignments (60.3%), of which 748676054 reverse reads

skipped 433246269 primary alignments with insufficient mapping qualities (17.2%)

skipped 359035605 excessively soft-clipped primary alignments (14.3%)

skipped 208965987 unmapped reads (8.3%)

skipped some suboptimal (secondary/supplementary) alignment records

Per-sample stats (details in 'gstacks.log.distribs'):

read 8845298.1 records/sample (866241-30918303)

kept 20.2%-65.6% of these

Built 1491630 loci comprising 768991465 forward reads and 439981014 matching paired-end reads; mean insert length was 428.1 (sd: 80.7).

Removed 346943658 unpaired (forward) reads (45.1%); kept 422047807 read pairs in 1085633 loci.

Removed 397435243 read pairs whose insert length had already been seen in the same sample as putative PCR duplicates (94.2%); kept 24612564 read pairs.

Genotyped 1085633 loci:

effective per-sample coverage: mean=-nanx, stdev=-nanx, min=1.0x, max=1.6x

mean number of sites per locus: 317.6

a consistent phasing was found for 7738959 of out 7740994 (100.0%) diploid loci needing phasing

gstacks is done.

#####

sample n_loci n_used_fw_reads mean_cov mean_cov_ns n_unpaired_reads n_pcr_dupl_pairs pcr_dupl_rate

AD1 73865 79832 1.081 1.116 1282589 1279843 0.941

AD10 63032 67965 1.078 1.108 540636 751440 0.917

AD2 66453 74872 1.127 1.173 1189848 1286884 0.945

AD3 64606 69845 1.081 1.113 950642 1095325 0.940

Why? How can I fix this problem?

Thanks

Best

Bin Lu

gstacks.log

gstacks.log.distribs

Bin Lu

unread,

Aug 7, 2018, 8:25:18 AM8/7/18

to Stacks

Hi Julian and all:

There are some information I forget to say here. My data type is ddRad. So should I not use the --rm-pcr-duplicates which removed almost all the reads after gstacks? In addition, I found the ref_map.pl not use the --rm-pcr-duplicates.

Thanks

Best

Bin

在 2018年8月6日星期一 UTC+8下午9:49:14，Bin Lu写道：

Julian Catchen

unread,

Aug 7, 2018, 5:07:45 PM8/7/18

to stacks...@googlegroups.com, Bin Lu

Hi Bin,

You cannot use the PCR duplicates filter on double digest data. The two
enzyme cut sites spoil the algorithm for detecting duplicates (identical
alignment of pairs of reads). The only option for handling PCR
duplicates in ddRAD is to use a random oligo in your P1 adaptor, like as
is done in the 3RAD protocol.

julian

Bin Lu wrote on 8/7/18 8:25 AM:

> --
> Stacks website: http://catchenlab.life.illinois.edu/stacks/
> ---
> You received this message because you are subscribed to the Google
> Groups "Stacks" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to stacks-users...@googlegroups.com
> <mailto:stacks-users...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/stacks-users/df18e936-406b-407a-a9df-e125ea7a1278%40googlegroups.com
> <https://groups.google.com/d/msgid/stacks-users/df18e936-406b-407a-a9df-e125ea7a1278%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Julian Catchen, Ph.D.
Assistant Professor
Department of Animal Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

Bin Lu

unread,

Aug 7, 2018, 8:37:20 PM8/7/18

to Stacks

Hi Julian,

Thanks for your reply. So I just need to remove --rm-pcr-duplicates in my command line, right?

And can I directly use 'ref_map.pl -T 8 --popmap ./popmaps/popmap -o ./stacks/ --samples ./aligned' for ddrad? Because there seem to no '--rm-pcr-duplicates' option.

For other rad with one enzyme cut site, we alway need '--rm-pcr-duplicates' option, right?

Best

Bin

在 2018年8月8日星期三 UTC+8上午5:07:45，Julian Catchen写道：

> <mailto:stacks-users+unsub...@googlegroups.com>.

Julian Catchen

unread,

Aug 8, 2018, 11:55:40 AM8/8/18

to stacks...@googlegroups.com, Bin Lu

Yes, you are correct. It is optional to look at PCR duplicates in single
enzyme protocols, but always a good idea. -julian

Bin Lu wrote on 8/7/18 8:37 PM:

> <http://ref_map.pl> not use the

Reply all

Reply to author

Forward

gstacks produce low mean coverage and high pcr_dupl_rate

Bin Lu

Bin Lu

Julian Catchen

Bin Lu

Julian Catchen