demultiplexing Illumina HiSeq lanes with fastq-multx


Piotr Grabowski

Jun 10, 2014, 12:12:57 PM
to ea-u...@googlegroups.com
Hey,

I couldn't find a solution to my specific problem anywhere, so I decided to post it here. I have 3 .fastq files from an Illumina HiSeq run in which 12 samples were multiplexed. It was a paired-end run, so I have 2 files with reads plus 1 .fastq file with the 6 bp barcodes.

I would like to demultiplex those .fastq files and obtain SEPARATE .fastq files, each named after the sample whose barcode matched. So if I have barcodes like this:

Cell01 ACTACT
Cell02 GATTAC

Then after demultiplexing I would like to see Cell01.R1.fastq + Cell01.R2.fastq and Cell02.R1.fastq + Cell02.R2.fastq.
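To make the requested output layout concrete, here is a minimal Python sketch of splitting read pairs into per-sample files. It is not how fastq-multx works internally: it only does exact barcode matching, and it assumes the index file and both read files list their records in the same order. The function names (`load_barcodes`, `read_fastq`, `demultiplex`) and the `open_out` callback are hypothetical, chosen only for this illustration.

```python
import io


def load_barcodes(handle):
    """Parse 'SampleName BARCODE' lines into a {barcode: sample} dict."""
    barcodes = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            barcodes[fields[1]] = fields[0]
    return barcodes


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of file
            return
        yield tuple(record)


def demultiplex(barcodes, index_fq, r1_fq, r2_fq, open_out):
    """Write each read pair to <sample>.R1.fastq / <sample>.R2.fastq.

    Pairs whose index read matches no barcode exactly go to 'unmatched'.
    open_out(name) must return a writable handle; the caller closes them.
    Returns a {sample: number_of_pairs} dict.
    """
    outputs = {}
    counts = {}
    records = zip(read_fastq(index_fq), read_fastq(r1_fq), read_fastq(r2_fq))
    for index_rec, r1_rec, r2_rec in records:
        sample = barcodes.get(index_rec[1], "unmatched")
        for mate, rec in (("R1", r1_rec), ("R2", r2_rec)):
            name = f"{sample}.{mate}.fastq"
            if name not in outputs:
                outputs[name] = open_out(name)
            outputs[name].write("\n".join(rec) + "\n")
        counts[sample] = counts.get(sample, 0) + 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    barcodes = load_barcodes(io.StringIO("Cell01 ACTACT\nCell02 GATTAC\n"))
    index = io.StringIO("@a\nACTACT\n+\nIIIIII\n@b\nGATTAC\n+\nIIIIII\n")
    r1 = io.StringIO("@a/1\nAAAA\n+\nIIII\n@b/1\nCCCC\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@b/2\nGGGG\n+\nIIII\n")
    sinks = {}
    counts = demultiplex(barcodes, index, r1, r2,
                         lambda name: sinks.setdefault(name, io.StringIO()))
    print(counts, sorted(sinks))
```

With files on disk, `open_out` could simply be `lambda name: open(name, "w")`. A real tool would also need mismatch tolerance and gzip support, which this sketch leaves out.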


So far, all the tools I've found demultiplex into a single file, simply adding a tag to each read header, which is not what I am aiming for. Is it possible to do this with fastq-multx?


I would be very grateful for any tips (maybe there are other tools out there?).



Best,
Piotr

Aronesty, Erik

Jun 10, 2014, 2:00:57 PM
to ea-u...@googlegroups.com
That is precisely what fastq-multx does.   Try it out!

- Erik


--
You received this message because you are subscribed to the Google Groups "EA Utils" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ea-utils+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sven Klages

Jun 10, 2014, 4:15:41 PM
to ea-u...@googlegroups.com
Just curious: why do you have fastq files with a separate barcode (fastq) file? It is definitely not a bad idea to use Illumina's bcl converter right after the run. Then your bcl data are demultiplexed and converted in one step.

best,
Sven

Piotr Grabowski

Jun 10, 2014, 5:37:39 PM
to ea-u...@googlegroups.com
Our core facility usually takes care of this, and we get filtered and demultiplexed .fastq files. This time I was given a portable drive with a folder full of .qseq.txt files. The data were generated about a year ago and I have no easy access to the .bcl files, so I have to work around it with different tools; Casava 1.8 doesn't handle those .qseq files.


I will try with fastq-multx tomorrow morning though.

Best regards,
Piotr



Piotr Grabowski

Jun 18, 2014, 6:56:24 AM
to ea-u...@googlegroups.com
Dear Erik,


I am trying to use fastq-multx, but I am afraid I am doing something wrong.

I am using the line:

fastq-multx -m 0 -g FusionCell_L007_R2_filtered_trimmed.fastq FusionCell_L007_R1_filtered.fastq FusionCell_L007_R3_filtered.fastq -o R1.%.fastq R2.%.fastq

to demultiplex a paired-end run stored in the R1 and R3 files, R2 being the index read. The run was done on an Illumina HiSeq.
The problem is that very few reads are assigned to barcodes, with around 90% ending up in the "unmatched.fastq" file. When I look at those unmatched reads, most of them have N's in their barcodes (now added to their headers).

When I check my barcode file, I see no N's whatsoever (they were filtered out). I ran the same line with the "-n" option to see which barcodes the program detects, and it reports only the 12 expected barcodes, no N's at all...
So it seems to me that those N-containing barcodes (sometimes just NNNNNN) are somehow a side effect of the program. Do you have any idea what could be happening?
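One way to double-check the "no N's in the barcode file" observation independently of fastq-multx is to count them directly. The following is only a rough Python sketch, assuming a plain (uncompressed) FASTQ with exactly four lines per record; the function name `count_n_reads` is made up for this example.

```python
import io


def count_n_reads(handle):
    """Return (total_records, records_with_N) for a 4-line-per-record FASTQ."""
    total = 0
    with_n = 0
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            total += 1
            if "N" in line:
                with_n += 1
    return total, with_n


if __name__ == "__main__":
    # In-memory example; for a real file, pass an open handle instead.
    example = io.StringIO(
        "@r1\nACTACT\n+\nIIIIII\n"
        "@r2\nNNNNNN\n+\n######\n"
        "@r3\nGATTAC\n+\nIIIIII\n"
    )
    total, with_n = count_n_reads(example)
    print(f"{with_n} of {total} index reads contain N")
```

Running it on the file passed to -g and on the untrimmed index file would show whether either one really contains N's. Comparing the record total with the totals for the R1 and R3 files would additionally show whether the three files hold the same number of reads.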


Best,
Piotr

Aronesty, Erik

Jun 20, 2014, 11:39:08 AM
to ea-u...@googlegroups.com

1. What version are you running (fastq-multx 2>&1 | grep ersion)

 

2. For the "trimmed fastq"… how did you come by this file?   Is it just the first X bases of the original R2 file?

 

3. Can you upload, say, the first 10k or so index reads from the trimmed and untrimmed as a gzip attachment (either to the forum or to me is fine).   I don't need to see the actual data (I can mock up the R1/R3).   I'd like to see why the -g option is not doing what you want.   fastq-multx only uses a 100k reads subsample for detection… but the code explicitly skips index reads with 'N's  in them (strchr call), so it's strange that you would ever see N's in the output.
