fastq-multx


Orla O'Sullivan

May 30, 2014, 7:27:52 AM
to ea-u...@googlegroups.com
Hi,
I am wondering whether fastq-multx can be used for dual indexing.
I have two index files and two read files.

I have run the forward and reverse reads separately, but is there a way to do it simultaneously, so that I get one output file per index pair?
 
Thanks
 
Orla
 
 
 

Aronesty, Erik

May 30, 2014, 10:22:54 AM
to ea-u...@googlegroups.com

Yes, we use it for dual-indexing all the time. You can use the -l, -L, or -B modes for dual indexes; -g will not work.

 

EXAMPLE (r1 is "read 1" from Illumina):

 

fastq-multx -l dual_barcodes.txt r2.gz r3.gz r1.gz r4.gz -o n/a -o n/a -o r1.%.fastq.gz -o r2.%.fastq.gz

 

EXAMPLE of dual index file:

 

SampleID  Index              Style
D701_501  ATTACTCG-TATAGCCT  Nextera
D701_502  ATTACTCG-ATAGAGGC  Nextera
D701_503  ATTACTCG-CCTATCCT  Nextera
D701_504  ATTACTCG-GGCTCTGA  Nextera
D701_505  ATTACTCG-AGGCGAAG  Nextera
D701_506  ATTACTCG-TAATCTTA  Nextera
D701_507  ATTACTCG-CAGGACGT  Nextera
D701_508  ATTACTCG-GTACTGAC  Nextera
D702_501  TCCGGAGA-TATAGCCT  Nextera
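
A barcode file in this layout can be generated rather than typed by hand. The following is a minimal Python sketch, not part of fastq-multx itself: the dictionary names, the function name, and the choice of a tab as the column separator are assumptions made for illustration; the index sequences are the ones listed in the example above.

```python
from itertools import product

# Index sequences copied from the example file above.
# The variable names I7 and I5 are illustrative only.
I7 = {"D701": "ATTACTCG", "D702": "TCCGGAGA"}
I5 = {
    "501": "TATAGCCT",
    "502": "ATAGAGGC",
    "503": "CCTATCCT",
    "504": "GGCTCTGA",
    "505": "AGGCGAAG",
    "506": "TAATCTTA",
    "507": "CAGGACGT",
    "508": "GTACTGAC",
}


def dual_barcode_lines(i7, i5, style="Nextera"):
    """Yield one 'SampleID<TAB>first-second<TAB>Style' row per index pair."""
    for (name7, seq7), (name5, seq5) in product(i7.items(), i5.items()):
        yield f"{name7}_{name5}\t{seq7}-{seq5}\t{style}"


lines = list(dual_barcode_lines(I7, I5))
print("\n".join(lines))
```

Redirecting the printed rows to a file (for example dual_barcodes.txt) gives one row per index pair, with every D701 pair listed before the D702 pairs.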

--
You received this message because you are subscribed to the Google Groups "EA Utils" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ea-utils+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Orla O'Sullivan

May 30, 2014, 10:53:49 AM
to ea-u...@googlegroups.com
Thanks Erik, I had seen that. The problem is that the indexes have already been removed from the read files, so I have no way of recognising them through the barcodes.
 
 
Orla

Aronesty, Erik

May 30, 2014, 11:19:50 AM
to ea-u...@googlegroups.com

I'm not sure what you mean by "removed from the read files". Can you paste in an example (one record from each file you have)?

Orla O'Sullivan

May 30, 2014, 11:34:26 AM
to ea-u...@googlegroups.com
Thank you for being so helpful. As you can see, the index "CCCTCTTT" is no longer at the start of the read.
 
@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCCTCTTT
+
6,,6,,,,
@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCTCCTTTTTTCTTCCTTTCTTCCTCTTTCTCCCTTTTTTCTCTTCTTCTTCCTCTCTTTCTCTTTCTTTCTTCCTTCTTTCTTTTTTTTCCTTTTTTTTTTTTTTTTCTTTTCTTTCTTTTCTTTTCTCTTTTTTCCTTTCTCTTTTTTTTCCTTCTTTTTTTTTTCTTTTCCTTTTTTTTCTTTTTT
CCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
68A@BEDD9CFF6CECFCE,CF<CC6CEC,6<CCB;B,,@,,<66CCE<6,,;CC,;,C6<CEDDE,6@E;5A5<CD,66E:CECCDEEC96,,99<=C@C:4+4>C:4;6,,,7?4,5,754,,5=,33;==;@+,6,3=,+,3,33,+3:322;0**33*4*11:2+4,5+08;;9:))//<@98*7
))57)3;17,)39<68C6..).772,23(777<<<>9(,(-2211(24):38>88888821(:8888885(-(,-316775775755))/-17755.888888655555888

Aronesty, Erik

May 30, 2014, 11:51:56 AM
to ea-u...@googlegroups.com

That is only 2 files.   When you say dual-indexed, do you mean Nextera-like?   If not, then I said the wrong thing.

Orla O'Sullivan

May 30, 2014, 11:55:49 AM
to ea-u...@googlegroups.com
Sorry, yes, the Nextera dual indexing.
The other two files are:
@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
CTTTCCCT
+
-,66,5,6
 
@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
TCCTCCTTTTTTCTCTCCTCCTTTTTTCTCCCCTCTCTTTCTTTCCTCCTTTTCCTTTTTTTTCTCTTCTTCTTCCTTCTCCCTTTTTTTTCTTCCTTCTCTCTCCTCCTTTCCCCTCTCCCCTTTTTCTTCCTCTTTCCTCTTCTTTCCTCTCTTTCTCCATTTTTCTCTTCCTTTTTTTTTTTTTTC
CCCTTTTTTTCCCTTTTTTTTTTTTTTTTTTTCTTCTCTCCCTTTTCTCCCTTTTCTTCTTTTTTTTTTTTTTTTTTTTCTTTTTCGCTGTTCTTCTTTTCTTTTTTTTTTT
+
8,886;6,=CC@6C,50;5;4;=AE@@=A5<,,,86AABCCC,;@,;6;4,<?0*;;@6,+64,4396,,9,99<>>BDB=,493,65>>C86=4,4,+949==5)613:D;0*,,98153,+0*05+03;;9=9992++336+*21,*1;:95:+**++3066=9+*+*3;0*36:3*1)0:5;:*1,
.*)-0)*052(*.*,,22**481*/*//**,(57**.-4)((.-3))))).(),))--)))0+0***,(,-((,**((((.-).4(((((,))).)--.)))-4),23(,((



Aronesty, Erik

May 30, 2014, 2:07:48 PM
to ea-u...@googlegroups.com

 

Unless the barcodes are interleaved somehow (not sure from how you pasted it) the command line below should work fine.

 

fastq-multx does not require the barcode to be "ligated" to the read.

 

R2 is

 

fastq-multx -l dual_barcodes.txt r2.gz r3.gz r1.gz r4.gz -o n/a -o n/a -o r1.%.fastq.gz -o r2.%.fastq.gz

 

FILE R2

@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCCTCTTT
+
6,,6,,,,

 

FILE R3

@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
CTTTCCCT
+
-,66,5,6

 

 

FILE R1:

 

@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCTCCTTTTTTCTTCCTTTCTTCCTCTTTCTCCCTTTTTTCTCTTCTTCTTCCTCTCTTTCTCTTTCTTTCTTCCTTCTTTCTTTTTTTTCCTTTTTTTTTTTTTTTTCTTTTCTTTCTTTTCTTTTCTCTTTTTTCCTTTCTCTTTTTTTTCCTTCTTTTTTTTTTCTTTTCCTTTTTTTTCTTTTTT
CCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
68A@BEDD9CFF6CECFCE,CF<CC6CEC,6<CCB;B,,@,,<66CCE<6,,;CC,;,C6<CEDDE,6@E;5A5<CD,66E:CECCDEEC96,,99<=C@C:4+4>C:4;6,,,7?4,5,754,,5=,33;==;@+,6,3=,+,3,33,+3:322;0**33*4*11:2+4,5+08;;9:))//<@98*7
))57)3;17,)39<68C6..).772,23(777<<<>9(,(-2211(24):38>88888821(:8888885(-(,-316775775755))/-17755.888888655555888
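
The point that the barcode does not have to be attached to the read can be illustrated with a short Python sketch. This is not how fastq-multx is implemented: it only shows the idea that records at the same position in the index files and the read file belong together. It assumes strict 4-line FASTQ records (no wrapped sequence lines) and exact barcode matches; the function names, the sample name, and the truncated read are hypothetical.

```python
from itertools import islice


def fastq_records(lines):
    """Group an iterable of lines into (header, sequence, plus, quality) tuples."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a trailing partial record)
        yield record


def assign_samples(index1, index2, read1, barcodes):
    """Pair records by position and look up the 'first-second' index key."""
    streams = zip(fastq_records(index1), fastq_records(index2), fastq_records(read1))
    for i1, i2, r1 in streams:
        key = f"{i1[1]}-{i2[1]}"
        # Exact match only; a real demultiplexer would also tolerate mismatches.
        yield barcodes.get(key, "unmatched"), r1


header = "@M01385:25:000000000-A6GWW:1:1101:16207:1221"
index1 = [f"{header} 1:N:0:0\n", "CCCTCTTT\n", "+\n", "6,,6,,,,\n"]
index2 = [f"{header} 2:N:0:0\n", "CTTTCCCT\n", "+\n", "-,66,5,6\n"]
# Read 1 is truncated to 8 bases here to keep the example short.
read1 = [f"{header} 1:N:0:0\n", "CCTCCTTT\n", "+\n", "68A@BEDD\n"]

barcodes = {"CCCTCTTT-CTTTCCCT": "sample_A"}  # hypothetical sample name
assigned = list(assign_samples(index1, index2, read1, barcodes))
print(assigned[0][0])
```

The lists of lines stand in for open file handles; any iterable of lines would work the same way.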

 

Orla O'Sullivan

Jun 3, 2014, 4:30:33 AM
to ea-u...@googlegroups.com
Apologies for the late reply; we had a public holiday here in Ireland. Thanks for the advice. I will try that now.

alk...@nau.edu

Aug 20, 2014, 8:24:09 PM
to ea-u...@googlegroups.com
I think I am doing everything right, yet I cannot get this to work. I have dual-indexed reads from a MiSeq (4 reads in this order: read1, index1, index2, read2). Here are the first 4 lines from index1 (normal FASTQ output from the MiSeq, phred33 encoding):

@M01315:84:000000000-AA888:1:1101:17152:1632 1:N:0:0
GTTAAATC
+
>11>133B

Here are the contents of my index file (I capitalized the first pair of indices to see whether that made any difference; it doesn't seem to):

cat Samples.txt
Pfl055 CGTCTTGA-ACGCAGTT Custom
PflHS034 ctacgagt-acgcagtt Custom
Pfl056 agagatgc-acgcagtt Custom
PflHS035 tacgccat-acgcagtt Custom

If I grep for these sequences in the index files, there are plenty of matches. In fact, the MiSeq already auto-demultiplexed everything OK, but I think I can do better with fastq-multx, which is why I am pursuing this exercise (the Undetermined reads are from a reanalysis where I removed all valid index sequences from the sample sheet).

If I run a command with -n, here is the output (seems like it is working?):

fastq-multx -n -l Samples.txt Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_I2_001.fastq Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq -o n/a -o n/a -o ./test_out/r1.%.fq -o ./test_out/r2.%.fq
Using Barcode Group: Custom on File: Undetermined_S0_L001_I1_001.fastq (start), Threshold 0.03%
Dual index on File: Undetermined_S0_L001_I2_001.fastq (start)
Pfl055 CGTCTTGA
PflHS034 ctacgagt
Pfl056 agagatgc
PflHS035 tacgccat

But then when I take away -n, it doesn't really work:

fastq-multx -l Samples.txt Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_I2_001.fastq Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq -o n/a -o n/a -o ./test_out/r1.%.fq -o ./test_out/r2.%.fq
Using Barcode Group: Custom on File: Undetermined_S0_L001_I1_001.fastq (start), Threshold 0.03%
Dual index on File: Undetermined_S0_L001_I2_001.fastq (start)
End used: start
Dual-end used: start
No barcodes defined, quitting.


So why don't my samples demultiplex?


alk...@nau.edu

Aug 20, 2014, 9:05:32 PM
to ea-u...@googlegroups.com
Quick update:

1) I capitalized all the index sequences.  I think that might actually be necessary.  Confirmation?

2) If I specify -B instead of -l, things seem to work, but I am only getting about 1/10th of the reads that were auto-demultiplexed on the MiSeq (1,000 per sample vs. 10,000). These indexes were designed with a minimum distance of 3, so I added -m 3 to get the 1,000 reads I mentioned. I grepped for the flowcell address of the first demultiplexed sequence and got the following:

grep -A 1 "18527:6973" Undetermined_S0_L001_I1_001.fastq 
@M01315:84:000000000-AA888:1:1101:18527:6973 1:N:0:0
CGTACTCA

grep -A 1 "18527:6973" Undetermined_S0_L001_I2_001.fastq 
@M01315:84:000000000-AA888:1:1101:18527:6973 2:N:0:0
ACGCAGTT

These indexes should be CGTCTTGA-ACGCAGTT, so it was only by allowing 3 errors that I was able to get this read, as the first index seems to contain errors. Just to be safe, I grepped the sequence against my list of real indexes and came up with nothing, so presumably these were errors during the index read (common for Illumina). I will double-check with the user that I have the correct file on hand (long story, digital chaos right now), as it is possible that I am actually demultiplexing an additional 1,000 reads from those that didn't properly demultiplex on the instrument (exactly what I am trying to collect).
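
The mismatch count described above can be checked with a small Python sketch. It compares the observed first index (CGTACTCA) with the four first-index sequences from Samples.txt, uppercased. The function and variable names are illustrative, and this is a plain position-by-position comparison, not a reimplementation of how fastq-multx scores barcodes.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# First-index sequences from Samples.txt, uppercased.
expected = {
    "Pfl055": "CGTCTTGA",
    "PflHS034": "CTACGAGT",
    "Pfl056": "AGAGATGC",
    "PflHS035": "TACGCCAT",
}

observed = "CGTACTCA"  # index 1 of read 18527:6973
distances = {name: mismatches(observed, seq) for name, seq in expected.items()}
closest = min(distances, key=distances.get)

for name, dist in distances.items():
    print(name, dist)
print("closest:", closest)
```

For this read, Pfl055 is the closest candidate at 3 mismatches, and the other three indexes differ at 6 or 7 positions, which agrees with the read only being assigned once 3 mismatches are allowed.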

Aronesty, Erik

Aug 21, 2014, 9:45:19 AM
to ea-u...@googlegroups.com

1) yes

 

2)

Can you post  an attachment (tar.gz) with

 

- a handful of reads from each file

- the adapter file

 

I can run some diagnostics to be sure, but typically when you run with -l, the system will auto-determine, meaning that if you don't have a sufficient number of matching reads, it will conclude that it is looking at line noise. If you're trying to recover from a failed demux, using -B is the right way to go, but you'll probably need to re-pool all the reads into 1 file. This is why I never use Illumina's built-in demultiplexer: no do-overs.
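
Re-pooling could be done along the following lines. This is a hedged Python sketch, not a tested recipe for any particular run: the file-name pattern, the sample names, and the function names are hypothetical. The one thing it is meant to show is that each of the four streams (I1, I2, R1, R2) has to be concatenated in the same sample order, so that record N in every pooled file still refers to the same cluster.

```python
import shutil


def pool_files(parts, pooled_path):
    """Concatenate the files in `parts`, in the given order, into `pooled_path`."""
    with open(pooled_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def pool_run(samples, kinds=("I1", "I2", "R1", "R2"),
             pattern="{sample}_{kind}.fastq", pooled="pooled_{kind}.fastq"):
    """Pool every stream using the same sample order to keep records in sync.

    `pattern` and `pooled` are hypothetical naming schemes; adjust them to
    the names your instrument or pipeline actually produced.
    """
    for kind in kinds:
        parts = [pattern.format(sample=s, kind=kind) for s in samples]
        pool_files(parts, pooled.format(kind=kind))
```

Calling pool_run(["Pfl055", "Undetermined"]) would, under the assumed naming scheme, write pooled_I1.fastq, pooled_I2.fastq, pooled_R1.fastq, and pooled_R2.fastq, which could then be passed to fastq-multx -B in the usual order.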

 

 

