Trimming (primer removal) of FASTQ files

NielsvB

Oct 26, 2016, 7:41:48 AM
to Qiime 1 Forum
Dear all,

I sequenced 515F-806R amplicons on a MiSeq using a 600-cycle (2 x 300 bp) run. Because the reads are longer than the amplicon, the demultiplexed FASTQ files produced by the MiSeq look like this:
- Read1: ---V4---ReversePrimer(RC)-Index(RC)
- Read2: Index(RC)---V4---ForwardPrimer(RC) 
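
For reference, "(RC)" means the primer appears in the read as its reverse complement. Because the 515F/806R primers contain IUPAC degenerate bases (M, H, V, W), the complement has to map those codes as well. A minimal Python sketch, assuming the original 515F (GTGCCAGCMGCCGCGGTAA) and 806R (GGACTACHVGGGTWTCTAAT) sequences; substitute your own primers if you used a modified pair:

```python
# IUPAC nucleotide complements, including degenerate codes
# (e.g. M = A/C pairs with K = G/T, H = A/C/T pairs with D = A/G/T).
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


# Assumed primer sequences; replace with the ones actually used.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

print(reverse_complement(PRIMER_806R))  # ATTAGAWACCCBDGTAGTCC (end of read 1)
print(reverse_complement(PRIMER_515F))  # TTACCGCGGCKGCTGGCAC (end of read 2)
```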

A) I suppose I need to remove the primer sequences from both reads, since otherwise they don't seem to stitch (with 'join_paired_ends.py'). What would be the best way to trim those sequences?
B) Why is the index sequence present in read 2 of the demultiplexed MiSeq data? Can I avoid this beforehand, or should I just use 'extract_barcodes.py'?

Although I can use only one of the reads, or trim a fixed number of bases from each sequence (e.g. 12 bp at the beginning and 50 bp at the end), this isn't very elegant.
I have already read a lot of old posts, but couldn't figure out the right solution. Does anyone have a suggestion?
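
The fixed-length trimming described above amounts to something like the following sketch (FastqRecord and trim_fixed are hypothetical helpers, not part of QIIME; the 12/50 defaults are just the numbers from the post). The quality string must be sliced with the same coordinates as the sequence. The weakness is that the cut points are hard-coded, so any read whose primer or index sits at a different position gets trimmed incorrectly.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    """Hypothetical minimal FASTQ record (header, sequence, quality)."""
    header: str
    seq: str
    qual: str


def trim_fixed(rec: FastqRecord, head: int = 12, tail: int = 50) -> FastqRecord:
    """Remove `head` bases from the 5' end and `tail` bases from the 3' end.

    The quality string is sliced with the same coordinates so it stays
    aligned with the sequence.
    """
    end = len(rec.seq) - tail
    if end <= head:
        # Read too short: nothing would survive, so return an empty read.
        return FastqRecord(rec.header, "", "")
    return FastqRecord(rec.header, rec.seq[head:end], rec.qual[head:end])


rec = FastqRecord("@read1", "A" * 12 + "C" * 10 + "G" * 50, "I" * 72)
print(trim_fixed(rec).seq)  # CCCCCCCCCC
```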

TonyWalters

Oct 28, 2016, 11:39:26 PM
to Qiime 1 Forum
Hello,

I think you can still stitch the reads with the primers present. Removing the primers is trickier, as you mentioned. If the primer positions were constant, using extract_barcodes.py to cut fixed-length regions from each end would suffice, but those positions may vary from read to read.
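
When positions vary, one option is to search each read for the primer itself instead of cutting at fixed coordinates. The sketch below (hypothetical helper names; not necessarily what the script linked below does) turns a degenerate primer into a regular expression and truncates read 1 at the first match of the reverse-complemented 806R primer, dropping the primer and the read-through index after it. Read 2 would be handled the same way with the reverse complement of 515F. It requires an exact match; tools such as cutadapt tolerate sequencing errors within the primer and are likely more robust on real data.

```python
import re
from typing import Optional, Tuple

# Regex character class for each IUPAC nucleotide code.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed: reverse complement of the original 806R primer (GGACTACHVGGGTWTCTAAT),
# as it appears near the 3' end of read 1.
RC_806R = "ATTAGAWACCCBDGTAGTCC"


def primer_pattern(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex that matches any allowed base."""
    return re.compile("".join(IUPAC_REGEX[base] for base in primer.upper()))


def trim_at_primer(seq: str, qual: str, primer: str) -> Optional[Tuple[str, str]]:
    """Truncate a read at the first exact match of `primer`, discarding the
    primer and everything 3' of it. Returns None if the primer is not found."""
    match = primer_pattern(primer).search(seq.upper())
    if match is None:
        return None
    cut = match.start()
    return seq[:cut], qual[:cut]


# Toy read 1: made-up V4 sequence + reverse-complemented 806R + read-through index.
read1 = "TACGGAGGGTGC" + "ATTAGAAACCCTTGTAGTCC" + "ACGTACGTACGT"
print(trim_at_primer(read1, "I" * len(read1), RC_806R))
# ('TACGGAGGGTGC', 'IIIIIIIIIIII')
```

Returning None for reads without a detectable primer lets you decide separately whether to discard them or keep them untrimmed; for paired data, whatever you decide should be applied to both mates so the files stay in sync.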

There is a custom script here that might help with removing the primers (and the regions before or after them, such as the barcodes): https://gist.github.com/walterst/2c592044b3b9e44a4290