Batch effect when combining data from NovaSeq/HiSeq runs


Oksana Vernygora

Mar 14, 2022, 5:52:07 PM
to Stacks
Hi, 

I am running Stacks on combined data from three ddRAD runs (one HiSeq run and two independent NovaSeq runs). When the samples are processed together, there is a clear batch effect: the novaseq1 samples have a substantial proportion of missing genotypes. I attached an IGV screenshot of the final filtered VCF file showing those distinct blocks of samples.
30inds_comb_stacks_igv_snapshot.png

However, when I run Stacks on each group of samples separately (novaseq1, novaseq2, hiseq) with exactly the same parameters, this doesn't happen, i.e., the novaseq1 samples genotype just fine (track_1 in the image below):
novaseq1_sepRun_igv_snapshot.png 

I have attached ref_map and gstacks.distributions files for these runs.

These are low-coverage samples, and I know that can cause issues. But I don't know what could be causing this batch effect when the samples are combined.
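
One way to put numbers on the batch effect described above, rather than reading it off the IGV tracks, is to compute the fraction of missing genotypes per sample and average it per sequencing batch. The following is a minimal Python sketch, not part of Stacks. It assumes an uncompressed text VCF in which GT is the first FORMAT field and missing calls contain a `.` (for example `./.`). The function names, the file name `filtered.vcf`, the sample names, and the sample-to-batch mapping are all hypothetical placeholders.

```python
from collections import defaultdict


def missing_rate_per_sample(vcf_lines):
    """Return {sample: fraction of sites whose GT is missing} from VCF text lines."""
    samples, missing, n_sites = [], [], 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            missing = [0] * len(samples)
            continue
        n_sites += 1
        for i, call in enumerate(fields[9:]):
            gt = call.split(":")[0]  # assumption: GT is the first FORMAT field
            if "." in gt:  # "./.", ".|." or "." all count as missing
                missing[i] += 1
    if n_sites == 0:
        return {}
    return {s: m / n_sites for s, m in zip(samples, missing)}


def mean_missing_by_batch(rates, batch_of):
    """Average per-sample missing rates within each batch label."""
    grouped = defaultdict(list)
    for sample, rate in rates.items():
        grouped[batch_of.get(sample, "unknown")].append(rate)
    return {batch: sum(v) / len(v) for batch, v in grouped.items()}


# Hypothetical usage (file name and sample names are placeholders):
# with open("filtered.vcf") as fh:
#     rates = missing_rate_per_sample(fh)
# batch_of = {"sampleA": "hiseq", "sampleB": "novaseq1", "sampleC": "novaseq2"}
# print(mean_missing_by_batch(rates, batch_of))
```

Running this on both the combined VCF and the per-batch VCFs would show whether the novaseq1 samples lose genotypes only in the combined run, which is the pattern the screenshots suggest.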

Any insight on this issue would be greatly appreciated!

Thank you,

-Oksana

ref_map_combined_30inds.log
vcf_novaseq1_10inds_0.5miss_mmDP3_0.01maf.log
ref_map_novaseq1_10inds.log
gstacks_combined_30inds.log.distribs
vcf_Alud_30inds_0.5miss_mmDP3_0.01maf.log

Victor Fitzgerald

Nov 13, 2022, 2:14:54 PM
to Stacks
Hi Oksana,
I'm having a similar issue. Did you ever find a solution to this?
Thank you,
Victor

Oksana Vernygora

Nov 13, 2022, 6:15:27 PM
to Stacks
Hi Victor, 
Unfortunately, I never figured out what was causing this issue or how to fix it in Stacks. After trying multiple options, I decided to test alternative pipelines. Processing the same data with dDocent produced uniform genotype calls without any issues. I also tried TASSEL and got similarly uniform results without any batch effects.

I am not a bioinformatician, so I can't speak to the exact differences between these pipelines, but I hope this helps =)

-Oksana
