Trimming (primer removal) of FASTQ-files

NielsvB

Oct 26, 2016, 7:41:48 AM

to Qiime 1 Forum

Dear,

I performed a 600-cycle sequencing run on 515F-806R amplicons with a MiSeq. As a result, the demultiplexed FASTQ files obtained from the MiSeq look like this:

- Read1: ---V4---ReversePrimer(RC)-Index(RC)

- Read2: Index(RC)---V4---ForwardPrimer(RC)

A) I suppose that I need to remove the primer sequences from both reads, since otherwise the reads don't seem to stitch (with join_paired_ends.py). What would be the best way to trim those sequences?

B) Why is the index sequence present in read 2 of the demultiplexed data from the MiSeq? Can I avoid this beforehand, or should I just use extract_barcodes.py?

Although I could use just one read, or trim a fixed number of bases from each sequence (e.g. 12 bp at the beginning and 50 bp at the end), neither option is very elegant.
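For illustration, the fixed-length trimming mentioned above could look roughly like the Python sketch below. The function name `trim_fixed` and its defaults (12 and 50, taken from the example numbers in this post) are hypothetical. It only works if the primer and index positions are the same in every read.

```python
def trim_fixed(seq, qual, n_start=12, n_end=50):
    """Drop n_start bases from the start and n_end bases from the end of a read.

    seq and qual are the sequence and quality lines of one FASTQ record;
    both are trimmed identically so they stay the same length.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq) - n_end
    if end <= n_start:
        # Read is too short to survive trimming; return an empty record.
        return "", ""
    return seq[n_start:end], qual[n_start:end]
```

Reads shorter than `n_start + n_end` come back empty and would have to be filtered out afterwards.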

I have already read a lot of old posts, but couldn't figure out the right solution. Does anyone have a suggestion?

TonyWalters

Oct 28, 2016, 11:39:26 PM

to Qiime 1 Forum

Hello,

I think you can still stitch the reads with the primers present. Removing the primers is a bit trickier, as you mentioned. If the positions were constant, using extract_barcodes.py to cut out regions of certain sizes from each end would suffice, but they might be variable.

There is a custom script here that might help with removing the primers (and the regions before or after them, such as the barcode): https://gist.github.com/walterst/2c592044b3b9e44a4290
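When the primer position varies between reads, one general approach is to search each read for the primer and cut there. The Python sketch below illustrates that idea only; it is not the code from the gist above, and the function names are hypothetical. It expands IUPAC degenerate bases into a regular expression and removes the primer plus everything after it. It requires an exact match (no mismatches or indels) and handles only the 3' side of a read.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of an (IUPAC) nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer into a regex that matches it exactly."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_from_primer(seq, qual, primer):
    """Cut a read at the first primer match, dropping the primer and all
    bases after it. Reads without a match are returned unchanged."""
    match = primer_regex(primer).search(seq)
    if match is None:
        return seq, qual
    return seq[:match.start()], qual[:match.start()]


# Assumption: the commonly cited 806R sequence; check it against your own
# primer sheet, since the thread does not state the primer sequences.
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"
```

For the read layout described in the question, read 1 would be trimmed with `trim_from_primer(seq, qual, reverse_complement(PRIMER_806R))`, and read 2 in the same way with the reverse complement of the forward primer. Any bases at the start of a read, such as the index reported in read 2, would still need separate handling, for example with extract_barcodes.py.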
