gstacks produce low mean coverage and high pcr_dupl_rate

567 views
Skip to first unread message

Bin Lu

unread,
Aug 6, 2018, 9:49:14 AM8/6/18
to Stacks
Hi Julian and all:

I ran stacks with reference genome. 
gstacks -I $src/aligned/ -M $src/popmaps/popmap --rm-pcr-duplicates -O $src/stacks/ -t 8

Now the gstacks is done, but I found the value of effective per-sample coverage is very strange, and more than 90% reads are PCR duplicates.

######
Read 2706661207 BAM records:
  kept 1517667519 primary alignments (60.3%), of which 748676054 reverse reads
  skipped 433246269 primary alignments with insufficient mapping qualities (17.2%)
  skipped 359035605 excessively soft-clipped primary alignments (14.3%)
  skipped 208965987 unmapped reads (8.3%)
  skipped some suboptimal (secondary/supplementary) alignment records

  Per-sample stats (details in 'gstacks.log.distribs'):
    read 8845298.1 records/sample (866241-30918303)
    kept 20.2%-65.6% of these

Built 1491630 loci comprising 768991465 forward reads and 439981014 matching paired-end reads; mean insert length was 428.1 (sd: 80.7).
Removed 346943658 unpaired (forward) reads (45.1%); kept 422047807 read pairs in 1085633 loci.
Removed 397435243 read pairs whose insert length had already been seen in the same sample as putative PCR duplicates (94.2%); kept 24612564 read pairs.

Genotyped 1085633 loci:
  effective per-sample coverage: mean=-nanx, stdev=-nanx, min=1.0x, max=1.6x
  mean number of sites per locus: 317.6
  a consistent phasing was found for 7738959 of out 7740994 (100.0%) diploid loci needing phasing

gstacks is done.
#####
sample n_loci n_used_fw_reads mean_cov mean_cov_ns n_unpaired_reads n_pcr_dupl_pairs pcr_dupl_rate
AD1 73865 79832 1.081 1.116 1282589 1279843 0.941
AD10 63032 67965 1.078 1.108 540636 751440 0.917
AD2 66453 74872 1.127 1.173 1189848 1286884 0.945
AD3 64606 69845 1.081 1.113 950642 1095325 0.940


Why? How can I fix this problem?

Thanks

Best

Bin Lu




gstacks.log
gstacks.log.distribs

Bin Lu

unread,
Aug 7, 2018, 8:25:18 AM8/7/18
to Stacks
Hi Julian and all:

There are some information I forget to say here. My data type is ddRad. So should I not use the --rm-pcr-duplicates which removed almost all the reads after gstacks? In addition, I found the ref_map.pl not use the --rm-pcr-duplicates.

Thanks
Best
Bin

在 2018年8月6日星期一 UTC+8下午9:49:14,Bin Lu写道:

Julian Catchen

unread,
Aug 7, 2018, 5:07:45 PM8/7/18
to stacks...@googlegroups.com, Bin Lu
Hi Bin,

You cannot use the PCR duplicates filter on double digest data. The two
enzyme cut sites spoil the algorithm for detecting duplicates (identical
alignment of pairs of reads). The only option for handling PCR
duplicates in ddRAD is to use a random oligo in your P1 adaptor, like as
is done in the 3RAD protocol.

julian

Bin Lu wrote on 8/7/18 8:25 AM:
> --
> Stacks website: http://catchenlab.life.illinois.edu/stacks/
> ---
> You received this message because you are subscribed to the Google
> Groups "Stacks" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to stacks-users...@googlegroups.com
> <mailto:stacks-users...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/stacks-users/df18e936-406b-407a-a9df-e125ea7a1278%40googlegroups.com
> <https://groups.google.com/d/msgid/stacks-users/df18e936-406b-407a-a9df-e125ea7a1278%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.


--
Julian Catchen, Ph.D.
Assistant Professor
Department of Animal Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

Bin Lu

unread,
Aug 7, 2018, 8:37:20 PM8/7/18
to Stacks
Hi Julian,

Thanks for your reply. So I just need to remove --rm-pcr-duplicates in my command line, right?

And can I directly use 'ref_map.pl -T 8 --popmap ./popmaps/popmap -o ./stacks/ --samples ./aligned' for ddrad? Because there seem to no '--rm-pcr-duplicates' option.

For other rad with one enzyme cut site, we alway need '--rm-pcr-duplicates' option, right?

Best
Bin

在 2018年8月8日星期三 UTC+8上午5:07:45,Julian Catchen写道:

Julian Catchen

unread,
Aug 8, 2018, 11:55:40 AM8/8/18
to stacks...@googlegroups.com, Bin Lu
Yes, you are correct. It is optional to look at PCR duplicates in single
enzyme protocols, but always a good idea. -julian

Bin Lu wrote on 8/7/18 8:37 PM:
> <http://ref_map.pl> not use the
Reply all
Reply to author
Forward
0 new messages