It seems that I am missing something, so I will just describe my problem.
I have paired-end Illumina reads in FASTQ format. In a .txt file I have the forward and reverse primer sequences and the tags for each experiment; I will attach an example file. Each read has the following format: tag-primer-fragment
I need to demultiplex the reads by experiment and remove the adapters, primers, and experiment tags. There are two scripts that could do that:
split_libraries_fastq.py - but I do not have the barcode read FASTQ files
demultiplex_fasta.py - it operates only on FASTA format, but I do not want to lose the quality information, because later I might want to filter by quality.
Is there any other way I could demultiplex without losing quality information?
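To make concrete what I am after, here is a minimal sketch in plain Python (not a QIIME script). It assumes each read starts with tag + primer, matches that prefix exactly, and trims the sequence and the quality string by the same length so the two stay aligned. The sample name, tag, primer, and reads are made up for illustration and are not taken from my attached file.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual

def demultiplex(records, prefixes):
    """Assign each read to the sample whose tag+primer prefix it starts with.

    prefixes: dict of sample name -> tag + primer (hypothetical layout).
    The prefix is cut from both sequence and quality, so qualities are kept.
    """
    out = {name: [] for name in prefixes}
    out["unassigned"] = []
    for header, seq, qual in records:
        for name, prefix in prefixes.items():
            if seq.startswith(prefix):
                n = len(prefix)
                out[name].append((header, seq[n:], qual[n:]))
                break
        else:
            # No prefix matched exactly; keep the read untouched.
            out["unassigned"].append((header, seq, qual))
    return out

# Made-up example: tag "ACGT", primer "TTGG", then the fragment.
fastq_lines = [
    "@r1\n", "ACGTTTGGAACCGG\n", "+\n", "IIIIHHHHGGGGFF\n",
    "@r2\n", "TTTTTTTTTT\n", "+\n", "IIIIIIIIII\n",
]
prefixes = {"exp1": "ACGT" + "TTGG"}
result = demultiplex(read_fastq(fastq_lines), prefixes)
```

Each entry in result could then be written back out as a four-line FASTQ record per experiment, which is the output I would like to end up with.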
Error in split_libraries_fastq.py: If not providing barcode reads (because your data is not multiplexed), must provide --sample_ids.
In fact, I do not have sample IDs either. All the information I have is in the file I attached, namely the barcode and primer sequences.
split_libraries_fastq.py -m mapping.txt -i Pool1_18S.fastq -o demultiplexed_output/ --barcode_type not-barcoded --sample_ids 1
orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
So, I assume the demultiplexing did not work. Additionally, I need the quality scores after demultiplexing, and they seem to be lost after running the script.
Also, if there is a sequencing error in the barcode of one of the reads in the FASTQ file, how will demultiplexing work? I thought I would need to specify somewhere the maximum distance between the true barcode and the barcode sequence in the read, to allow up to a given number of mismatches.
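What I mean by allowing mismatches could be sketched like this in plain Python. It is only an illustration of the idea, not how any QIIME script does it: the function names, the max_mismatches parameter, and the tag sequences are hypothetical, and it assumes substitutions only (Hamming distance, no insertions or deletions).

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_tag(seq, tags, max_mismatches=1):
    """Return the sample whose tag best matches the start of seq, or None.

    tags: dict of sample name -> tag sequence (hypothetical values).
    A read is rejected if no tag is within max_mismatches, or if two
    tags tie for the best distance (ambiguous assignment).
    """
    hits = []
    for name, tag in tags.items():
        if len(seq) < len(tag):
            continue  # read too short to contain this tag
        distance = hamming(seq[:len(tag)], tag)
        if distance <= max_mismatches:
            hits.append((distance, name))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None
    return hits[0][1]

# Made-up tags; the first read has one substitution in its tag.
tags = {"exp1": "ACGTAC", "exp2": "TGCATG"}
one_error = assign_tag("ACGAACGGG", tags)
strict = assign_tag("ACGAACGGG", tags, max_mismatches=0)
exact = assign_tag("TGCATGAAA", tags)
```

With these made-up tags, the read with one wrong base is still assigned to exp1 when one mismatch is allowed, and is rejected when no mismatches are allowed. This only works safely if the tags differ from each other by more than twice the allowed number of mismatches.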