2 lanes, same samples

Jeff

unread,

Sep 19, 2013, 10:23:02 AM9/19/13

to stacks...@googlegroups.com

Hello Julian et al.,

Quick question. I ran a set of samples for a pilot study and was able to process this with stacks pretty easily, but probably overmultiplexed a little for my taste and wanted to see what happens if you increase the coverage per sample (how much better is SNP calling, etc) so I ran the library through the sequencer again.

So I now have 2 fastq files, 1 for the first batch and 1 for the second run (they are paired end, but I'm treating them as single end sequences for now while I learn the basics). They contain essentially the same data, barcodes for each individual are identical for each file, etc.

Can I run both Illumina files through process_radtags together, since the barcodes are shared, to generate one cleaned up set of fq's, or should I do this independently, and generate 2 files for each barcode and combine them downstream?

If the former, that's probably easiest, but if the latter, is there a way to do this within a downstream component of stacks by feeding both files, or is it better to do it by hand? Which tool might be recommend for combining a large number of files like this (I can figure out how to use it, but I'm new to UNIX etc and just need to be pointed in the right direction, sometimes!)

Thanks very much for your time

Jeff

Julian Catchen

unread,

Sep 19, 2013, 2:48:45 PM9/19/13

to stacks...@googlegroups.com, ihate...@gmail.com

Hi Jeff,

You can do it either way: either place both sets of raw data files into a single
directory and get a single output file, or run process_radtags twice and
concatenate the output files together.

There is no special tool to concatenate files together as far as I know, I
simply use the cat command, as in:

cat file1 >> file2

If you have a lot of files, you can put it in a shell loop, something like this:

ls -1 *.fq | while read line; do cat ../firstrun/$line >> ./$line; done

Best,

julian

Jeff

unread,

Sep 20, 2013, 8:36:24 AM9/20/13

to stacks...@googlegroups.com, ihate...@gmail.com, jcat...@uoregon.edu

Thanks Julian, that's kind of what my experiments suggested, but wanted to make sure I wasn't doing anything improper.

Now for a second question, relating to the same data set! I was looking through it yesterday, and it appears that for the second run, but not the first, the core facility trimmed out adapter contamination, resulting in ragged length reads (there was obviously some "leakage" of small fragments during size selection...again, pilot study!). These don't appear to represent good RAD loci, in the sense that they appear kind of rare and I can't find many duplicate loci in the files, so I'd really just like to get rid of them.

It's annoying because the adapter ID flag in process_radtags does a good job on the first file, and I am trimming all the reads using -t to cut out any partial adapter sequence at 3', but of course there are some even shorter contaminant reads (process RADtags dumps those in file 1, but not file 2 because the adapter isn't there any more!). Any advice on a tool to strip those from the file? Would it be a good idea to just run fastx toolkit on the fastq first? If so, what quality score does STACKS filter, so I can set the fastx threshold lower just to dump the short reads but retain as much as possible for the process_radtags filter?

Thanks so much, especially for script examples, quite helpful!!

J

Julian Catchen

unread,

Sep 21, 2013, 5:19:18 PM9/21/13

to stacks...@googlegroups.com, Jeff

Hi Jeff,

I'm not sure I follow your third paragraph. Both of your two runs had adapter
contamination but in the second run it has been filtered out already?

So, is the problem that there are some fragments left over in the second run
that you want to get rid of? So, say your reads started at 100bp in length, and
you want process_radtags to trim them to 75bp? But, some of the reads are
already less than 75bp and are not being thrown out?

You can certainly run FASTX toolkit first, but what would you like to accomplish
with FASTX that process_radtags isn't handling?

The default quality threshold in process_radtags is a phred score of 10.

Best,

julian

Reply all

Reply to author

Forward