stacks with double digest data


mikel...@gmail.com

May 18, 2014, 9:56:52 AM
to stacks...@googlegroups.com
Hi all,
I am using Stacks with a batch of fastq files obtained by double digest.
Before running Stacks, I merged my forward and reverse reads and used the merged data with denovo_map.pl.
I then obtained the VCF file from Stacks, and the data do not seem to be correct.

Am I using my data correctly? Can I use Stacks with my data?

Thanks,

Alicia Mastretta

May 19, 2014, 4:54:10 AM
to stacks...@googlegroups.com
Have you demultiplexed it? What error are you getting?

mikel...@gmail.com

May 19, 2014, 10:36:51 PM
to stacks...@googlegroups.com
Hi Alicia,

Yes, the data are demultiplexed. For each sample, I have two files:
sampleX_1.fq and sampleX_2.fq
I then merged these files.

The problem is that a plot of pairwise Euclidean distances shows several groups displaced vertically from one another. If I use these fq files with other programs such as GATK or freebayes, I do not see these displacements; I obtain just a single group.

Before doing this, I had tried supplying the fq files without merging them, and I obtained similar problems.

Thank you,

On Monday, May 19, 2014 at 17:54:10 UTC+9, Alicia Mastretta wrote:

Julian Catchen

May 19, 2014, 11:55:17 PM
to stacks...@googlegroups.com, mikel...@gmail.com
Hi,

You really need to be specific about your analysis. What sort of data is this? What do you mean by "merge": are you physically concatenating the two files together, or merging the single-end/paired-end sequences together? What does it mean to plot "pairwise euclidean distance", and what are "vertical displacement groupings"? GATK and freebayes are both SNP callers; how did you run these programs, and did you use some other clustering software? What parameters did you choose with denovo_map.pl?

mikel...@gmail.com

May 20, 2014, 1:47:20 AM
to stacks...@googlegroups.com, mikel...@gmail.com, jcat...@uoregon.edu
Hi Julian,
Sorry for the confusion. By 'merge' I mean simply concatenating the two files together, not merging the single-end/paired-end sequences.
My data were obtained by ddRADseq and then demultiplexed into fq files. My denovo_map.pl parameters are:
denovo_map.pl -T 1 -b 1 -i 1 -D "xxx" -S -o out/ -s sampleA.fq -s sampleB.fq -X "populations:--vcf"
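
For reference, the concatenation step described above could be sketched as follows. This is only a minimal illustration, assuming plain (uncompressed) fq files; the function name `concatenate_reads` is hypothetical and is not part of Stacks.

```python
import shutil
from pathlib import Path


def concatenate_reads(forward: Path, reverse: Path, merged: Path) -> None:
    """Write the forward-read file followed by the reverse-read file.

    This is a plain byte-for-byte concatenation. Read pairs are NOT
    overlapped or joined into longer sequences.
    """
    with merged.open("wb") as out:
        for part in (forward, reverse):
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
```

It would be called once per sample, e.g. `concatenate_reads(Path("sampleX_1.fq"), Path("sampleX_2.fq"), Path("sampleX.fq"))`.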

I compare the VCF file output by Stacks with a VCF file from GATK (or freebayes) by plotting their pairwise distances, and I obtain a plot with several groups separated by a vertical displacement. If I compare the VCF files from GATK and freebayes in the same way, I obtain a single group instead. Could something be wrong with the Stacks process?

I have also tried changing the order of the input files given to denovo_map.pl; the output VCF files are highly correlated, but not identical, in terms of pairwise distances.
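
To make the comparison concrete, here is a rough sketch of how per-sample genotypes might be pulled out of a VCF file before computing pairwise distances. It assumes diploid, biallelic records whose FORMAT column begins with GT; the function name `read_vcf_genotypes` is hypothetical, and this is not necessarily how the distances in this thread were computed.

```python
from typing import Dict, List, Optional


def read_vcf_genotypes(lines: List[str]) -> Dict[str, List[Optional[int]]]:
    """Return, per sample, the alternate-allele count (0, 1 or 2) at each site.

    Missing calls such as "./." are stored as None.
    """
    samples: List[str] = []
    calls: Dict[str, List[Optional[int]]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            calls = {name: [] for name in samples}
            continue
        for name, entry in zip(samples, fields[9:]):
            # GT is the first colon-separated subfield; treat phased as unphased
            alleles = entry.split(":")[0].replace("|", "/").split("/")
            if "." in alleles:
                calls[name].append(None)
            else:
                calls[name].append(sum(1 for a in alleles if a != "0"))
    return calls
```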

Thank you,

On Tuesday, May 20, 2014 at 12:55:17 UTC+9, Julian Catchen wrote:

mikel...@gmail.com

May 20, 2014, 7:22:14 AM
to stacks...@googlegroups.com, mikel...@gmail.com, jcat...@uoregon.edu
Hi Julian,
I have attached the pairwise distance plots.
Thanks,
1.jpg

mikel...@gmail.com

May 22, 2014, 1:37:43 AM
to stacks...@googlegroups.com, mikel...@gmail.com, jcat...@uoregon.edu
Hi Julian,
The statistic is the pairwise Euclidean distance between the genotypes of individuals, computed over all loci that have data for both individuals, where 1 means all genotypes are the same and 0 means all are different.
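
One possible reading of this statistic, written out as a small sketch. The exact normalization is not given above, so the scaling here is an assumption: genotypes are coded as alternate-allele counts (0, 1, 2), the root-mean-square difference over shared loci is divided by its maximum value of 2, and the result is subtracted from 1 so that identical individuals score 1. The name `genotype_similarity` is hypothetical.

```python
import math
from typing import Optional, Sequence


def genotype_similarity(a: Sequence[Optional[int]],
                        b: Sequence[Optional[int]]) -> Optional[float]:
    """Score two individuals from 0 (all different) to 1 (all the same).

    Only loci genotyped in both individuals are used. Returns None when
    the two individuals share no genotyped loci.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    # Root-mean-square genotype difference, i.e. the Euclidean distance
    # divided by the square root of the number of shared loci.
    rms = math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))
    # The largest possible per-locus difference is 2 (0 versus 2).
    return 1.0 - rms / 2.0
```

Dividing by the number of shared loci keeps pairs with different amounts of missing data on the same scale, which matters when the set of loci differs from pair to pair.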

Regards,

On Tuesday, May 20, 2014 at 20:22:14 UTC+9, mikel...@gmail.com wrote:

Julian Catchen

May 22, 2014, 2:14:05 AM
to stacks...@googlegroups.com, mikel...@gmail.com
Hi,

I would guess that you need to allow for more mismatches in your denovo_map.pl run. I would increase -M and -n above the default values you are using.
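
As a rough sketch of what such a rerun could look like, the snippet below rebuilds the command line quoted earlier in the thread with -M and -n added. The values passed for -M and -n are illustrative placeholders only, not recommendations, and the helper name `build_denovo_map_command` is hypothetical.

```python
from typing import List


def build_denovo_map_command(samples: List[str], out_dir: str,
                             big_m: int, small_n: int) -> List[str]:
    """Assemble a denovo_map.pl argument list with explicit -M and -n.

    All other options are copied from the command posted in this thread.
    """
    cmd = ["denovo_map.pl", "-T", "1", "-b", "1", "-i", "1",
           "-D", "xxx", "-S",
           "-M", str(big_m),    # mismatches allowed between stacks within an individual
           "-n", str(small_n),  # mismatches allowed between loci when building the catalog
           "-o", out_dir]
    for sample in samples:
        cmd += ["-s", sample]
    cmd += ["-X", "populations:--vcf"]
    return cmd
```

For example, `build_denovo_map_command(["sampleA.fq", "sampleB.fq"], "out/", 4, 4)` returns a list that could be handed to `subprocess.run`; trying a few values and comparing the resulting VCF files would show how sensitive the pairwise distances are to these settings.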

Did you clean/demultiplex your data using process_radtags or with another method?

julian