Dear all,
I performed a 600-cycle sequencing run on 515F-806R amplicons on a MiSeq. Because each read is longer than the V4 amplicon, the demultiplexed FASTQ files obtained from the MiSeq look like this:
- Read1: ---V4---ReversePrimer(RC)-Index(RC)
- Read2: Index(RC)---V4---ForwardPrimer(RC)
A) I suppose that I need to get rid of the primer sequences in both reads, since otherwise the reads don't seem to stitch (with 'join_paired_ends.py'). What would be the best option to trim those sequences?
B) Why is the index sequence present in read 2 of the demultiplexed data from the MiSeq? Can I avoid this beforehand, or should I just use 'extract_barcodes.py'?
I could use just one of the reads, or trim a fixed number of bases from each sequence (e.g. 12 bp at the beginning and 50 bp at the end), but that isn't very elegant.
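To make clear what I mean by a more elegant trim than cutting a fixed number of bases, here is a minimal sketch that cuts a read at the position where the reverse-complemented primer starts. The primer sequences are an assumption (the commonly used 515F/806R pair; my exact variants may differ), the function names are made up for illustration, and only exact matches to the degenerate primer are found, with no mismatch tolerance.

```python
import re

# Assumed primer sequences (commonly used 515F/806R pair); adjust to the actual primers.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}

# Complement table that also handles degenerate bases.
COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")


def reverse_complement(seq):
    """Return the reverse complement of an IUPAC nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))


def trim_from_primer(read, qual, primer):
    """Cut read and quality string at the first exact match of the primer.

    Everything from the start of the primer onwards (primer, index, adapter)
    is removed. Reads without a match are returned unchanged.
    """
    match = primer_regex(primer).search(read)
    if match is None:
        return read, qual
    return read[:match.start()], qual[:match.start()]


if __name__ == "__main__":
    # Toy read 1: short stand-in for V4, then 806R (RC), then trailing index bases.
    read1 = "ACGTACGTAC" + "ATTAGATACCCTGGTAGTCC" + "TTGGCCAATTGG"
    qual1 = "I" * len(read1)
    print(trim_from_primer(read1, qual1, reverse_complement(PRIMER_806R)))
```

For read 2 the same call would use reverse_complement(PRIMER_515F). The leading index in read 2 is not handled by this sketch, and real data would need some tolerance for sequencing errors within the primer.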
I have already read a lot of old posts, but couldn't figure out the right solution. Does anyone have a suggestion?