Hello,
I have paired-end fastq.gz files, one forward (R1) and one reverse (R2), that have not been demultiplexed. I would like to import them into QIIME2 for demultiplexing, trimming, and QC, but I do not have a barcodes file. I have read other answers pointing to the extract_barcodes.py script. I have run this script, but I am very concerned about the barcode length parameter, --bc1_len. Because I find the jargon here a little confusing, I'll clarify what I think this script is doing and where my issues lie.
I am assuming that "barcodes" in this script can mean two different things: the index primers and the Illumina sequence identifier. Mine looks like this:
@M03580:59:000000000-BHFJK:1:1101:12747:1451 1:N:0:CTCTGGTT+GTTTCCTT
So I think the extract_barcodes.py script is making one file with the sequence identifiers and their associated index primers (barcodes.fastq), plus two files for R1 and R2 without the index primers. Is that right?
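To make my reading of the header concrete, here is a minimal Python sketch of how I understand the identifier line above. It assumes the usual Illumina layout in which the part after the space is read:filter:control:index and the two index sequences are joined by "+". The function name parse_illumina_header is my own, not something from QIIME.

```python
def parse_illumina_header(header):
    """Split an Illumina FASTQ header into the read ID and its index sequences.

    Assumes the comment after the space has the form
    <read>:<is_filtered>:<control>:<index1>+<index2>.
    """
    # Everything before the first space is the read ID; the rest is the comment.
    read_id, _, comment = header.lstrip("@").partition(" ")
    fields = comment.split(":")
    # The fourth colon-separated field holds the index sequence(s), if present.
    index_field = fields[3] if len(fields) > 3 else ""
    indexes = index_field.split("+") if index_field else []
    return read_id, indexes


header = "@M03580:59:000000000-BHFJK:1:1101:12747:1451 1:N:0:CTCTGGTT+GTTTCCTT"
read_id, indexes = parse_illumina_header(header)
print(read_id)   # M03580:59:000000000-BHFJK:1:1101:12747:1451
print(indexes)   # ['CTCTGGTT', 'GTTTCCTT']
```

If this reading is right, both index sequences in my header are 8 bases long, and they live in the header rather than at the start of the read itself.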
If that is correct, I am concerned about the barcode length (index primers), as my forward and reverse primers vary in length from 5 to 8 bases. When I input 8 as --bc1_len, it seems that I'll get more than my actual index extracted from the file. Additionally, when I try to match the actual indexes from my mapping file to the resulting barcodes.fastq output, they don't seem to match.
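Here is a small Python sketch of why the length worries me. It assumes that a fixed --bc1_len simply takes that many bases from the start of each read, which is my understanding and not something I have confirmed in the script. The barcode and read below are made up for illustration.

```python
def extract_fixed(read, bc_len):
    """Take the first bc_len bases as the 'barcode' and return the remainder."""
    return read[:bc_len], read[bc_len:]


true_barcode = "ACGTA"              # hypothetical 5-base index
read = true_barcode + "GGCTTACCGT"  # hypothetical read that starts with the index

extracted, remainder = extract_fixed(read, 8)
print(extracted)                  # ACGTAGGC
print(remainder)                  # TTACCGT
print(extracted == true_barcode)  # False
```

With a 5-base index and a length of 8, the extracted barcode carries 3 extra bases from the read, so an exact match against the 5-base index in a mapping file would fail.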
Any help is much appreciated!
Thank you,
Alexis