Hi there,
I've got stacks up and running and have run through all of the tutorials, but now I'm trying to process some real data and I'm feeling a bit out of my depth! I'm running into some issues in the denovo_map.pl step and I think it may be a fundamental issue with the structure of my data files. Sorry for the long post, but I figured it was best to start from the beginning:
I've got 48 individuals from 4 locations that were double digested using XhoI and MseI, and run in one lane with 24 barcoded individuals and two indexes. I'm interested in looking at population structure between these locations, so I'm planning to use the populations program in Stacks. I received the data from the sequencing facility as .fastq.gz files (one file per individual) after they demultiplexed the samples for us. We did paired-end reads, and the paired ends are provided in a single interleaved file, ordered as Read 1 of Fragment 1, Read 2 of Fragment 1, Read 1 of Fragment 2, Read 2 of Fragment 2, etc.
The first step was cleaning the data, although I did not need to demultiplex it. I used the following command:
process_radtags -p ./raw/ -o ./samples/ -i 'gzfastq' -c -q -r -E 'phred33' --disable_rad_check
I got a message back saying 'No barcodes specified, files will not be demultiplexed', which is what I wanted, and the script continued to run. I chose to --disable_rad_check mainly because XhoI is not included as an enzyme in stacks and I didn't think it was necessary to add it manually just for a RAD site check if I wasn't going to use it to demultiplex.
**Question 1: Is it a bad policy to disable the RAD site check? Should I add XhoI manually and run process_radtags with XhoI and MseI specified as a matter of good practice?
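A quick way to sanity-check the cut sites without touching Stacks would be to tally the first few bases of every read and see whether a small number of prefixes dominate. The following is only a rough Python sketch: the function name `tally_prefixes` and the default prefix length of 6 are my own choices for illustration, and it assumes plain four-line FASTQ records.

```python
from collections import Counter
from itertools import islice


def tally_prefixes(lines, k=6):
    """Count the first k bases of every read.

    `lines` is any iterable of FASTQ lines (an open file handle works).
    Assumes exactly four lines per record, so the sequence is the
    2nd line of each record.
    """
    counts = Counter()
    for seq in islice(lines, 1, None, 4):
        counts[seq.strip()[:k]] += 1
    return counts
```

Opening one of the cleaned files and printing `tally_prefixes(handle).most_common(5)` would show the most frequent leading bases. Since Read 1 and Read 2 share a file here, I would expect two dominant prefixes rather than one, but that is a guess until it is run on the real data.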
process_radtags seemed to work well, and I now have all of my files as .fastq with the following structure:
@2_1101_1868_1956_1
TTCGAGGATACCTTCAAAAGGTGATGCCTCAAAAAGGTGACATCCATTACTAAGGAAACCACCACCCCACCCCCCATCACCCAGGACACGCCC
+
EHHHHHJJJJJJJJJJJJJJJHHIJJIJJJJJJJJJJFFHIJJJJJJJJJIJIJIJJJJ;EHHHHHFFACDDDDDDDDDDDDDBDDDDDDBDD
@2_1101_1868_1956_1
TAATCAAGTAATACAGAAAGAGAAGGAAAAGTACTGAGAAAATGTTCAGGGTTCATTGCCCATTCAGAAATATGATGGCGGAGGGAAAGAAGCTGTTCCT
+
CCCFFFFFHFHHHJJJJJJIIJJJJJJJIJJHIJJJJJJJJJIJIIJIIJGGHIIJJJJJJIJJJJJJJJJJJJJJJJJHFDDDDBDDDDDDDDDDDDDC
Where I believe the first @2 is Read 1 and the second @2 (slightly longer) is the paired-end Read 2. So far so good (maybe?).
Next I was planning to build mini-contigs from the paired-end reads by following the paired-end tutorial on the Stacks website. However, after reading through this post I'm beginning to think that building the mini-contigs is only necessary for single-digest paired-end reads, and not for my ddRAD paired-end reads, as the paired ends all have a restriction cut site and therefore should stack on their own. I saw that the tutorial used a file for Read 1 and a separate file for Read 2 to build the mini-contigs, so I'm not sure that it will work with both reads in a single file…
**Question 2: Do I need to separate my paired-end reads into different files? Or, as above, do I not need to build mini-contigs at all?
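If the reads do turn out to need separate files, splitting an interleaved file is mechanical. Below is a minimal Python sketch, not anything from the Stacks distribution. The function name `split_interleaved` is hypothetical, and it assumes four lines per record, strict Read 1 / Read 2 alternation, and that both mates carry the same header (as in the records shown above).

```python
from itertools import islice


def split_interleaved(src, out1, out2):
    """Stream interleaved FASTQ lines from `src` into two outputs.

    `src` is an iterable of lines; `out1` and `out2` are writable
    text handles for Read 1 and Read 2. Returns the number of pairs.
    """
    src = iter(src)
    pairs = 0
    while True:
        block = list(islice(src, 8))  # one pair = 2 records x 4 lines
        if not block:
            break
        if len(block) != 8:
            raise ValueError("input does not end on a complete read pair")
        # Assumption: both mates share an identical header token.
        if block[0].split()[0] != block[4].split()[0]:
            raise ValueError(f"mate headers differ in pair {pairs + 1}")
        out1.writelines(block[:4])
        out2.writelines(block[4:])
        pairs += 1
    return pairs
```

Because it reads eight lines at a time it never holds the whole file in memory, which matters on a laptop. A trailing blank line in the input would trip the "complete read pair" check, so that case would need handling first.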
Either way, the next step would be denovo_map.pl. I tried running the following command first:
denovo_map.pl -m 3 -M 3 -T 15 -B manta_radtags -b 1 -t -D "Manta Paired-end RAD-Tags" -o ./stacks -O ./popmap -s ./samples/BB001.fastq -s ./samples/BB003.fastq -s ./samples/BB004.fastq -s ./samples/BB006.fastq -s ./samples/BB008.fastq -s ./samples/BB009.fastq -s ./samples/BB010.fastq -s ./samples/BB011.fastq -s ./samples/BB012.fastq -s ./samples/BB013.fastq -s ./samples/BB014.fastq -s ./samples/BB015.fastq -s ./samples/IN001.fastq -s ./samples/IN002.fastq -s ./samples/IN007.fastq -s ./samples/IN010.fastq -s ./samples/IN011.fastq -s ./samples/IN012.fastq -s ./samples/IN013.fastq -s ./samples/IN014.fastq -s ./samples/INA01.fastq -s ./samples/INA02.fastq -s ./samples/INA03.fastq -s ./samples/INA04.fastq -s ./samples/INA05.fastq -s ./samples/INA1M.fastq -s ./samples/RV003.fastq -s ./samples/RV004.fastq -s ./samples/RV005.fastq -s ./samples/RV006.fastq -s ./samples/RV007.fastq -s ./samples/RV008.fastq -s ./samples/RV009.fastq -s ./samples/RV010.fastq -s ./samples/RV011.fastq -s ./samples/RV012.fastq -s ./samples/SL075.fastq -s ./samples/SL080.fastq -s ./samples/SL081.fastq -s ./samples/SL083.fastq -s ./samples/SL192.fastq -s ./samples/SL218.fastq -s ./samples/SL276.fastq -s ./samples/SL277.fastq -s ./samples/SL326.fastq -s ./samples/SL392.fastq -s ./samples/SL393.fastq -s ./samples/SL395.fastq
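Typing 48 `-s` arguments by hand is error-prone, so the same command could be assembled from the contents of the samples directory. This is a rough Python sketch; `build_denovo_cmd` is a hypothetical helper, and every flag and value in it is copied from the command above rather than taken from the Stacks documentation.

```python
from pathlib import Path


def build_denovo_cmd(sample_dir, out_dir, popmap=None):
    """Assemble the denovo_map.pl argument list, one -s per FASTQ file."""
    cmd = ["denovo_map.pl", "-m", "3", "-M", "3", "-T", "15",
           "-B", "manta_radtags", "-b", "1", "-t",
           "-D", "Manta Paired-end RAD-Tags", "-o", str(out_dir)]
    if popmap is not None:
        cmd += ["-O", str(popmap)]
    # Sorted so the sample order is reproducible between runs.
    for fq in sorted(Path(sample_dir).glob("*.fastq")):
        cmd += ["-s", str(fq)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the `-D` description.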
It ran overnight, and at the end of the run my terminal lit up with errors such as (HY000) at line 1: File './stacks/BB014.matches.tsv' not found (Errcode: 2 - No such file or directory). A ton of files were missing, including the .matches.tsv and .fst.tsv for each individual, the batch_1.markers.tsv, and the batch_1.sumstats.tsv. Furthermore, the .alleles.tsv/.snps.tsv/.tags.tsv files were only present in ./stacks for about a quarter of the individuals. After reading some threads on the forum here, I think that it may have been a memory issue. I'm running this from a personal laptop with 8GB of DDR3 memory, which is probably not sufficient.
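To see exactly which individuals are incomplete, the output directory can be compared against the sample list. Here is a minimal Python sketch, assuming one output file per sample for each of the four suffixes mentioned above. The helper name `missing_outputs` is made up for illustration.

```python
from pathlib import Path

# Per-sample files named in this post; an assumption about what a
# complete run leaves behind, not an official list.
SAMPLE_SUFFIXES = (".tags.tsv", ".snps.tsv", ".alleles.tsv", ".matches.tsv")


def missing_outputs(sample_dir, stacks_dir):
    """Map each sample name to the expected output files that are absent."""
    report = {}
    for fq in sorted(Path(sample_dir).glob("*.fastq")):
        sample = fq.stem
        missing = [sample + suffix for suffix in SAMPLE_SUFFIXES
                   if not (Path(stacks_dir) / (sample + suffix)).exists()]
        if missing:
            report[sample] = missing
    return report
```

Calling `missing_outputs("./samples", "./stacks")` returns an empty dictionary when every sample has all four files. It only checks that the files exist, so a file truncated by a crash would still pass.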
**Question 3: Are these errors likely a memory issue? I also read on a few threads that similar issues could arise from having reads that are not of equal length, and I'm wondering if perhaps my paired ends are not all the same length as the Read 1s and are stuffing things up because they're in the same file. However, they look like they're the same length in the cleaned .fastq files…
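Rather than eyeballing the files, the read lengths can be counted directly, separately for each mate. The following is a short Python sketch, assuming four-line records in strict Read 1 / Read 2 order. The name `read_length_counts` is hypothetical.

```python
from collections import Counter
from itertools import islice


def read_length_counts(lines):
    """Return (read1_lengths, read2_lengths) for an interleaved FASTQ.

    Each result is a Counter mapping sequence length to number of reads.
    """
    read1, read2 = Counter(), Counter()
    # Sequence lines are the 2nd line of every 4-line record;
    # even-numbered records are Read 1, odd-numbered are Read 2.
    for record, seq in enumerate(islice(lines, 1, None, 4)):
        target = read1 if record % 2 == 0 else read2
        target[len(seq.strip())] += 1
    return read1, read2
```

If either Counter has more than one key, or the two Counters have different keys, the reads in that file are not all the same length.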
Assuming it was memory, I next tried running these in smaller batches, and skipping the population map function in denovo_map.pl with plans to come back to the stacked data with the populations function, hoping that would require less memory. I ran the following command first:
denovo_map.pl -m 3 -M 3 -T 15 -B manta_radtags -b 1 -t -D "Manta Paired-end RAD-Tags" -o ./stacks -s ./samples/BB001.fastq -s ./samples/BB003.fastq -s ./samples/BB004.fastq -s ./samples/BB006.fastq -s ./samples/BB008.fastq -s ./samples/BB009.fastq -s ./samples/BB010.fastq -s ./samples/BB011.fastq -s ./samples/BB012.fastq -s ./samples/BB013.fastq -s ./samples/BB014.fastq -s ./samples/BB015.fastq
Things seemed to go pretty well:
Found 12 sample file(s).
Identifying unique stacks; file 1 of 12 [BB001]
/usr/local/bin/ustacks -t fastq -f ./samples/BB001.fastq -o ./stacks -i 1 -m 3 -M 3 -p 15 -d -r 2>&1
…
Identifying unique stacks; file 12 of 12 [BB015]
/usr/local/bin/ustacks -t fastq -f ./samples/BB015.fastq -o ./stacks -i 12 -m 3 -M 3 -p 15 -d -r 2>&1
Loading ustacks output to manta_radtags...done.
…until the next line:
Generating catalog...
/usr/local/bin/cstacks -b 1 -o ./stacks -s ./stacks/BB001 -s ./stacks/BB003 -s ./stacks/BB004 -s ./stacks/BB006 -s ./stacks/BB008 -s ./stacks/BB009 -s ./stacks/BB010 -s ./stacks/BB011 -s ./stacks/BB012 -s ./stacks/BB013 -s ./stacks/BB014 -s ./stacks/BB015 -p 15 2>&1
Importing catalog to MySQL database...ERROR 2 (HY000) at line 1: File './stacks/batch_1.catalog.tags.tsv' not found (Errcode: 2 - No such file or directory)
ERROR 2 (HY000) at line 1: File './stacks/batch_1.catalog.snps.tsv' not found (Errcode: 2 - No such file or directory)
ERROR 2 (HY000) at line 1: File './stacks/batch_1.catalog.alleles.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 1 of 12 [BB001]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB001 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB001.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 2 of 12 [BB003]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB003 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB003.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 3 of 12 [BB004]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB004 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB004.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 4 of 12 [BB006]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB006 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB006.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 5 of 12 [BB008]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB008 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB008.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 6 of 12 [BB009]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB009 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB009.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 7 of 12 [BB010]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB010 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB010.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 8 of 12 [BB011]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB011 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB011.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 9 of 12 [BB012]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB012 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB012.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 10 of 12 [BB013]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB013 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB013.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 11 of 12 [BB014]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB014 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB014.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 12 of 12 [BB015]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/BB015 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/BB015.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Calculating population-level summary statistics
/usr/local/bin/populations -b 1 -P ./stacks -s -t 15 2>&1
ERROR 2 (HY000) at line 1: File './stacks/batch_1.markers.tsv' not found (Errcode: 2 - No such file or directory)
ERROR 2 (HY000) at line 1: File './stacks/batch_1.sumstats.tsv' not found (Errcode: 2 - No such file or directory)
Indexing the database...
/usr/local/bin/index_radtags.pl -D manta_radtags -t -c 2>&1
Looking at my localhost/stacks manta_radtags database, the catalog had no samples in it, so there was obviously an issue loading the samples to the database, as the error messages suggest. Furthermore, only the .alleles/.snps/.tags files are present in ./stacks for this first run. I restarted my computer and ran the next 14 samples:
denovo_map.pl -m 3 -M 3 -T 15 -B manta_radtags -b 1 -t -D "Manta Paired-end RAD-Tags" -o ./stacks -s ./samples/IN001.fastq -s ./samples/IN002.fastq -s ./samples/IN007.fastq -s ./samples/IN010.fastq -s ./samples/IN011.fastq -s ./samples/IN012.fastq -s ./samples/IN013.fastq -s ./samples/IN014.fastq -s ./samples/INA01.fastq -s ./samples/INA02.fastq -s ./samples/INA03.fastq -s ./samples/INA04.fastq -s ./samples/INA05.fastq -s ./samples/INA1M.fastq
This seemed to go a bit better, although I think that I should have changed the batch number to -b 2 instead of -b 1 based on the error messages:
Found 14 sample file(s).
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 'PRIMARY'
Identifying unique stacks; file 1 of 14 [IN001]
/usr/local/bin/ustacks -t fastq -f ./samples/IN001.fastq -o ./stacks -i 13 -m 3 -M 3 -p 15 -d -r 2>&1
…
Identifying unique stacks; file 14 of 14 [INA1M]
/usr/local/bin/ustacks -t fastq -f ./samples/INA1M.fastq -o ./stacks -i 26 -m 3 -M 3 -p 15 -d -r 2>&1
Loading ustacks output to manta_radtags...done.
Generating catalog...
/usr/local/bin/cstacks -b 1 -o ./stacks -s ./stacks/IN001 -s ./stacks/IN002 -s ./stacks/IN007 -s ./stacks/IN010 -s ./stacks/IN011 -s ./stacks/IN012 -s ./stacks/IN013 -s ./stacks/IN014 -s ./stacks/INA01 -s ./stacks/INA02 -s ./stacks/INA03 -s ./stacks/INA04 -s ./stacks/INA05 -s ./stacks/INA1M -p 15 2>&1
Importing catalog to MySQL database...done.
Matching samples to catalog; file 1 of 14 [IN001]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN001 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 2 of 14 [IN002]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN002 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 3 of 14 [IN007]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN007 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 4 of 14 [IN010]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN010 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 5 of 14 [IN011]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN011 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 6 of 14 [IN012]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN012 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 7 of 14 [IN013]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN013 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/IN013.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 8 of 14 [IN014]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/IN014 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 9 of 14 [INA01]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA01 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 10 of 14 [INA02]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA02 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 11 of 14 [INA03]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA03 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...ERROR 2 (HY000) at line 1: File './stacks/INA03.matches.tsv' not found (Errcode: 2 - No such file or directory)
done.
Matching samples to catalog; file 12 of 14 [INA04]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA04 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 13 of 14 [INA05]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA05 -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Matching samples to catalog; file 14 of 14 [INA1M]
/usr/local/bin/sstacks -b 1 -c ./stacks/batch_1 -s ./stacks/INA1M -o ./stacks -p 15 2>&1
Loading sstacks output to manta_radtags...done.
Calculating population-level summary statistics
/usr/local/bin/populations -b 1 -P ./stacks -s -t 15 2>&1
ERROR 2 (HY000) at line 1: File './stacks/batch_1.markers.tsv' not found (Errcode: 2 - No such file or directory)
ERROR 2 (HY000) at line 1: File './stacks/batch_1.sumstats.tsv' not found (Errcode: 2 - No such file or directory)
Indexing the database...
/usr/local/bin/index_radtags.pl -D manta_radtags -t -c 2>&1
This one seemed to go a lot better, and for this round of samples I had .alleles/.snps/.tags AND .matches.tsv, except for IN013 as shown in the error message. I also have batch_1.catalog.alleles.tsv, batch_1.catalog.snps.tsv, batch_1.catalog.tags.tsv, and batch_1.populations.log, which I did not have after the first batch run. I can also now see fragments in localhost/stacks manta_radtags. However, it clearly did not go perfectly as it's still missing batch_1.markers.tsv and batch_1.sumstats.tsv.
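With logs this long it is easy to miss a failed sample, so the "not found" errors can be pulled out automatically. Below is a small Python sketch, assuming the terminal output has been saved as text and that the errors look exactly like the ones pasted above. The function name `missing_files_from_log` is my own.

```python
import re

# Matches the MySQL import errors shown in the logs in this post.
NOT_FOUND = re.compile(r"ERROR 2 \(HY000\) at line \d+: File '([^']+)' not found")


def missing_files_from_log(log_text):
    """Return the file paths named in 'not found' errors, in order, without repeats."""
    seen = []
    for path in NOT_FOUND.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen
```

Run on the second batch's output, this would list only the files the database import could not find, which makes it easier to see which step (sstacks or populations) failed to write its output.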
So, in summary, I'm having issues here with denovo_map.pl. Do you think that they're purely memory issues, or is it possible that the file structure is messing things up? How would you recommend proceeding?
On a more basic note, am I approaching this analysis with the correct workflow? Is it necessary to build contigs from the paired ends, or can I just run the files containing both reads through denovo_map.pl with a population map, as the paired ends all have the same MseI cut site?
Thanks very much for your input, and sorry for the newbie questions!
Cheers,
Josh