Demultiplexing results in different numbers of reads in the R1 and R2 FASTQ files


Guillermo Marco Puche

Jan 24, 2014, 3:38:45 PM
to qiime...@googlegroups.com
Hello,

I'm trying to analyze some MiSeq data that looks unusual (at least to me).

The data consist of 3 FASTQ files: R1 (reads), R2 (barcodes), R3 (reads).

The barcodes are 8 nt long (see example below):

@MISEQ:154:000000000-A4PK8:1:1101:15615:1331 2:N:0:
ATCACGAN
+
AABBBBA#

I've demultiplexed data using split_libraries_fastq.py with the following commands:

split_libraries_fastq.py -i lane1_NoIndex_L001_R1_001.fastq -b lane1_NoIndex_L001_R2_001.fastq -o r1_demux_1mm -m map_corrected.txt --barcode_type 8 --max_barcode_errors 0 --store_demultiplexed_fastq
split_libraries_fastq.py -i lane1_NoIndex_L001_R3_001.fastq -b lane1_NoIndex_L001_R2_001.fastq -o r3_demux_1mm -m map_corrected.txt --barcode_type 8 --max_barcode_errors 0 --store_demultiplexed_fastq

I've tried varying the --max_barcode_errors option, but I always get two FASTQ files that differ in number of lines.

I should mention that my map.txt file is correct; it has barcodes but no primer sequences:

#SampleID       BarcodeSequence LinkerPrimerSequence    Description
sample1   (tab)   CGTGATAT   (tab)   (empty column here) (tab)         none

I would really appreciate some help.

Thank you very much ^^






Jai Ram Rideout

Jan 24, 2014, 4:12:00 PM
to qiime...@googlegroups.com
Hi Guillermo,

You're likely getting different numbers of demultiplexed sequences because different sequences are being quality filtered out during the two runs. Can you please check the split_library_log.txt file that gets created in each output directory? That will tell you the reason(s) that sequences were filtered out, as well as how many.

Alternatively, if you're using QIIME 1.8.0, you might try running join_paired_ends.py first, followed by split_libraries_fastq.py.

-Jai



Guillermo Marco Puche

Jan 24, 2014, 4:27:55 PM
to qiime...@googlegroups.com
Here we go with both logs.
r1.log
r3.log

Jai Ram Rideout

Jan 24, 2014, 4:37:04 PM
to qiime...@googlegroups.com
In r1.log:

Read too short after quality truncation: 387628
Count of N characters exceeds limit: 111

In r3.log:

Read too short after quality truncation: 387712
Count of N characters exceeds limit: 55

This is why your demultiplexed sequences have different counts across the two runs.

-Jai
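
As a quick check of the arithmetic, here is a minimal Python sketch that sums only the two categories quoted from each log. The dictionary keys are made-up labels, and the logs may list further categories that are not quoted here.

```python
# Counts copied from the two log excerpts above (only the categories shown).
r1_filtered = {"too_short_after_truncation": 387628, "too_many_N": 111}
r3_filtered = {"too_short_after_truncation": 387712, "too_many_N": 55}

r1_total = sum(r1_filtered.values())
r3_total = sum(r3_filtered.values())

# Net difference in removed reads between the two runs.
net_difference = r3_total - r1_total
print(r1_total, r3_total, net_difference)  # 387739 387767 28
```

The net difference is only a lower bound on the mismatch: the reads removed from R1 need not be the same reads removed from R3, so the two outputs can disagree on more reads than this number suggests.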

Guillermo Marco Puche

Jan 24, 2014, 5:12:21 PM
to qiime...@googlegroups.com
I need both FASTQ files to have the same number of paired reads in order to align them with BWA or Bowtie2.
Is there any way to filter both files and keep only the reads common to both? Does QIIME provide any tool that could help me here?

Jai Ram Rideout

Jan 25, 2014, 5:08:53 PM
to qiime...@googlegroups.com
Hello,

Unfortunately, I don't think QIIME can do this type of filtering. If you're only interested in demultiplexing your sequences (and not quality filtering them), you could try relaxing/disabling the quality filters in split_libraries_fastq.py so that the same sequences are kept for each of the paired ends. Otherwise, you'll probably need to write a short script to do the filtering that you need.

Hope this helps,
Jai
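
The kind of short script mentioned above could look roughly like the following Python sketch. It is not part of QIIME. It assumes plain 4-line FASTQ records and that mates share an identical identifier in one whitespace-delimited field of the header line. The function names, the `field` parameter and the file paths in the comments are all hypothetical. For raw Illumina headers the identifier is the first field (`field=0`); demultiplexed headers may carry a sample label first, in which case a different field would be needed, so check your own headers before relying on this.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from 4-line records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record: %r" % record)
        yield tuple(record)


def read_key(header, field=0):
    """Return the whitespace-delimited header field used to match mates."""
    return header[1:].split()[field]  # header[1:] drops the leading '@'


def keep_common(fwd_lines, rev_lines, field=0):
    """Return (fwd_records, rev_records) limited to shared reads, in matching order."""
    # Index the forward reads by key; note this holds one whole file in memory.
    fwd_by_key = {read_key(rec[0], field): rec for rec in read_fastq(fwd_lines)}
    kept_fwd, kept_rev = [], []
    for rec in read_fastq(rev_lines):
        mate = fwd_by_key.get(read_key(rec[0], field))
        if mate is not None:
            kept_fwd.append(mate)
            kept_rev.append(rec)
    return kept_fwd, kept_rev


def write_fastq(records, path):
    """Write records back out as 4-line FASTQ."""
    with open(path, "w") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")


# Hypothetical usage (paths are placeholders, not taken from the thread):
# with open("forward.fastq") as f, open("reverse.fastq") as r:
#     fwd, rev = keep_common(f, r, field=0)
# write_fastq(fwd, "forward.common.fastq")
# write_fastq(rev, "reverse.common.fastq")
```

Both outputs follow the order of the reverse file, so the two files stay in step record by record. For very large runs, an approach that does not load a whole file into memory would be preferable.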

Guillermo Marco Puche

Jan 26, 2014, 5:38:50 AM
to qiime...@googlegroups.com
Hello Jai,

What's the correct way to disable quality filters in split_libraries_fastq.py?

Thanks.

Tony Walters

Jan 26, 2014, 12:49:48 PM
to qiime...@googlegroups.com
Guillermo,

Jai was pointing to relaxing the quality filters; you can't disable them altogether, but you can set them to levels at which they effectively won't apply to your reads. In this case, look at 1. lowering -p, 2. increasing -n, and 3. increasing -r. If this works, you will have a lot of low-quality reads in your data, so keep that in mind for any downstream processing.

You might also take a look at some of the earlier workarounds for filtering FASTQ files to make them match, from before the paired-end assembly script was added to QIIME: https://groups.google.com/forum/#!topic/qiime-forum/CO9EmR4FH58
These were used to match the barcodes FASTQ file to the stitched reads, but you could potentially use the same approach to filter out reads that do not match between your forward and reverse reads. One difference to take into account: in the earlier case, one FASTQ file (the stitched reads) was a subset of the barcode reads. Since you will likely have some reads in the forward file that aren't in the reverse file and vice versa, you'll need to filter both the forward and reverse reads to make the final results match up.
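
Whichever filtering approach is used, it may be worth confirming that the two resulting files really do match up before aligning. The Python sketch below is one way to check that. It assumes 4-line FASTQ records with no blank lines and that mates share the same identifier in one whitespace-delimited header field. `first_mismatch` and its `field` parameter are hypothetical names, not QIIME functions.

```python
from itertools import islice, zip_longest


def headers(lines):
    """Yield every 4th line, i.e. the header line of each FASTQ record."""
    return islice(lines, 0, None, 4)


def first_mismatch(fwd_lines, rev_lines, field=0):
    """Return the 0-based index of the first unmatched record, or None if in sync."""
    pairs = zip_longest(headers(fwd_lines), headers(rev_lines))
    for index, (fwd_header, rev_header) in enumerate(pairs):
        # One file ran out of records before the other.
        if fwd_header is None or rev_header is None:
            return index
        # Compare the chosen identifier field, ignoring the leading '@'.
        if fwd_header[1:].split()[field] != rev_header[1:].split()[field]:
            return index
    return None
```

A result of `None` means both inputs have the same number of records and the same identifiers in the same order.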

Guillermo Marco Puche

Jan 26, 2014, 4:33:28 PM
to qiime...@googlegroups.com
Hello Tony,

I know I'll get a lot of low-quality reads in my data, but I'm filtering them later in my pipeline, so it won't be a problem.

I'm going to try playing around with the -p, -n and -r options and see what happens. My only goal here is to demultiplex the files without any kind of filtering; I'll process the data later.