demultiplexing with fastq but without barcode read fastq


Tonja Rand

Oct 13, 2016, 9:43:19 AM
to Qiime 1 Forum

It seems that I am missing something, so I will just describe my problem.


I have paired-end Illumina reads in fastq format. In a .txt file I have the sequences of the forward and reverse primers and the tags for each experiment. I will attach an example file. Each read has the following format: tag-primer-fragment


I need to demultiplex the reads by experiment and remove the adapters, primers, and experiment tags. There are two scripts that could do that:



split_libraries_fastq.py - but I do not have the barcode read fastq files


demultiplex_fasta.py - it operates only on fasta format, but I do not want to lose the quality information, as I might want to filter by quality later.


Is there any other way I could demultiplex without losing quality information?


Thank you.

info.xlsx
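
For reference, the trimming half of this problem can be sketched in a few lines of Python, assuming each read really is laid out as tag-primer-fragment with the tag at the very start. The sketch cuts the same number of characters from the sequence and from the quality string, so the quality scores stay aligned with the remaining fragment. The names `FastqRecord`, `parse_fastq`, and `trim_prefix`, as well as the example tag and primer, are made up for illustration; they are not taken from info.xlsx and are not QIIME functions.

```python
from typing import NamedTuple, Optional


class FastqRecord(NamedTuple):
    header: str
    seq: str
    qual: str


def parse_fastq(text: str):
    """Yield FastqRecord objects from 4-line-per-record FASTQ text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        # Line 0: '@' + header, line 1: sequence, line 2: '+', line 3: quality.
        yield FastqRecord(lines[i][1:], lines[i + 1], lines[i + 3])


def trim_prefix(rec: FastqRecord, tag: str, primer: str) -> Optional[FastqRecord]:
    """Return rec without its leading tag+primer, or None if they are absent."""
    prefix = tag + primer
    if not rec.seq.startswith(prefix):
        return None
    n = len(prefix)
    # Slice sequence and quality identically so they stay in register.
    return FastqRecord(rec.header, rec.seq[n:], rec.qual[n:])


# Hypothetical read: tag ACGT, primer GGCC, fragment TTAGCA.
example = "@read1\nACGTGGCCTTAGCA\n+\nIIIIHHHHGGFFEE\n"
rec = next(parse_fastq(example))
trimmed = trim_prefix(rec, tag="ACGT", primer="GGCC")
print(trimmed)
```

This uses exact string matching only, so it does not handle primers with degenerate bases or sequencing errors in the tag.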

Yoshiki Vázquez Baeza

Oct 14, 2016, 11:44:58 AM
to Qiime 1 Forum
You can use split_libraries_fastq.py. Note that -b is an _optional_ option, i.e. it is not required. In your case, if you use this script, remember to set --barcode_type to "not-barcoded".

Thanks!

Yoshiki.

Tonja Rand

Oct 16, 2016, 1:49:19 PM
to Qiime 1 Forum
It says:

Error in split_libraries_fastq.py: If not providing barcode reads (because your data is not multiplexed), must provide --sample_ids.


In fact, I do not have sample_ids either. All the information I have is in the file I attached, namely the barcode and primer sequences.

Colin Brislawn

Oct 17, 2016, 1:17:45 PM
to Qiime 1 Forum
Hello Tonja,

Thanks for posting that Excel file. It sounds like you need to make a metadata mapping file. This is a text file containing the sample name, barcode, and other sample metadata, which you can use with this script. You can construct a QIIME mapping file using the info in your Excel file.
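
As a rough sketch of that step, the Python below writes a minimal tab-separated mapping file from a list of (sample name, barcode) pairs. The column headers follow the QIIME 1 mapping file convention (`#SampleID`, `BarcodeSequence`, `LinkerPrimerSequence`, `Description`); the sample names, barcodes, and primer are placeholders, not values from the spreadsheet, and `build_mapping` is a made-up helper name.

```python
import csv
import io

# Placeholder values for illustration only; replace with the real ones.
samples = [("Exp1", "ACGTAC"), ("Exp2", "TGCATG")]
forward_primer = "GGCCTTAA"


def build_mapping(samples, primer):
    """Return the text of a minimal QIIME 1 style mapping file."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Header row: the first column name must start with '#'.
    writer.writerow(
        ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]
    )
    for sample_id, barcode in samples:
        # Description is simply the sample name here; any free text works.
        writer.writerow([sample_id, barcode, primer, sample_id])
    return out.getvalue()


mapping_text = build_mapping(samples, forward_primer)
print(mapping_text)
```

Saving `mapping_text` to a file such as mapping.txt and running validate_mapping_file.py on it is a quick way to check the format.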

Take a look here:

Colin

Tonja Rand

Oct 18, 2016, 4:16:57 AM
to qiime...@googlegroups.com
I did construct it (see attached) and used it with split_libraries_fastq.py.

However, I still have the same problem I mentioned above. I do not have barcode read fastq files, and if I specify --barcode_type not-barcoded, it tells me I need to provide --sample_ids, one ID per input file path. I have only one input file path.

If I run the following:

split_libraries_fastq.py  -m mapping.txt  -i Pool1_18S.fastq -o demultiplexed_output/ --barcode_type not-barcoded --sample_ids 1

I get one seqs.fna file in which every read has the following attached:

orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0

So I assume the demultiplexing did not work. Additionally, I need the quality scores after demultiplexing, and they seem to be lost after running the script.


Also, if there is an error in the barcode sequence of one of the reads in the fastq file, how will demultiplexing work? I thought I would need to specify somewhere the maximum distance between the true barcode and the barcode sequence in the read, to allow up to a certain number of mismatches.

mapping Kopie.txt
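
On the mismatch question above, the general idea can be sketched in Python as follows: compare the start of each read with every known barcode, count the mismatching positions (Hamming distance), and accept the closest barcode only if it is within a chosen limit and no other barcode is equally close. This illustrates the concept only and is not a description of what split_libraries_fastq.py does internally; `assign_sample`, `max_mismatches`, and the barcodes are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_seq, barcodes, max_mismatches=1):
    """Return the sample whose barcode best matches the read start, or None.

    barcodes maps barcode sequence -> sample name. A read is left
    unassigned if the best match exceeds max_mismatches or if two
    barcodes are equally close.
    """
    scored = []
    for barcode, sample in barcodes.items():
        prefix = read_seq[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        scored.append((hamming(prefix, barcode), sample))
    if not scored:
        return None
    scored.sort()
    best_dist, best_sample = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two barcodes equally close
    return best_sample


# Hypothetical barcodes; each example read is barcode + primer + fragment.
barcodes = {"ACGTAC": "Exp1", "TGCATG": "Exp2"}
exact = assign_sample("ACGTACGGCCTT", barcodes)      # barcode matches exactly
one_off = assign_sample("ACGAACGGCCTT", barcodes)    # one mismatch in barcode
unmatched = assign_sample("GGGGGGGGCCTT", barcodes)  # far from both barcodes
print(exact, one_off, unmatched)
```

In QIIME 1 itself, I believe the corresponding setting is the --max_barcode_errors option of split_libraries_fastq.py; the script's help output is the place to confirm that.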

Colin Brislawn

Oct 18, 2016, 10:54:37 AM
to Qiime 1 Forum
Hello Tonja,

I think you are right; demultiplexing is not quite working yet. Can you help me understand what input files you have? In your command you listed Pool1_18S.fastq. Does that one file have all your 18S samples, or is that just the 'Pool1' sample? 

Are these from the Illumina MiSeq or Ion Torrent? 

Thanks for telling me more,
Colin
