fastq-multx

Orla O'Sullivan

May 30, 2014, 7:27:52 AM

to ea-u...@googlegroups.com

Hi

I am wondering if fastq-multx can be used for dual indexing.

I have two index files and two read files.

I have run the forward and reverse reads separately, but is there a way to do it simultaneously, so that I get one output file per index pair?

Thanks

Orla

Aronesty, Erik

May 30, 2014, 10:22:54 AM

to ea-u...@googlegroups.com

Yes, we use it for dual-indexing all the time. You can use the -l, -L, or -B modes for dual indexes; -g will not work.

EXAMPLE (r1 is "read 1" from Illumina):

fastq-multx -l dual_barcodes.txt r2.gz r3.gz r1.gz r4.gz -o n/a -o n/a -o r1.%.fastq.gz -o r2.%.fastq.gz
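The command above pairs each input file with one -o argument, in order: the two index reads come first and have their output discarded with n/a, and the two sequence reads follow with an output pattern containing %. As a rough sketch of that pairing (the helper name and the file names are hypothetical, and only the -l, -o and n/a usage shown in this thread is assumed), the argument list could be assembled like this:

```python
import shlex
from typing import List


def build_multx_command(barcodes: str, index_files: List[str],
                        read_files: List[str], read_outputs: List[str]) -> List[str]:
    """Assemble a fastq-multx argument list: index files first, then read files."""
    if len(read_files) != len(read_outputs):
        raise ValueError("need exactly one output pattern per read file")
    if any("%" not in pattern for pattern in read_outputs):
        raise ValueError("each read output pattern should contain '%'")
    command = ["fastq-multx", "-l", barcodes, *index_files, *read_files]
    # Index reads are only used for matching, so their output is discarded with n/a.
    for _ in index_files:
        command += ["-o", "n/a"]
    # One -o pattern per sequence read, in the same order as the read files.
    for pattern in read_outputs:
        command += ["-o", pattern]
    return command


command = build_multx_command(
    "dual_barcodes.txt",
    ["index1.fastq.gz", "index2.fastq.gz"],      # hypothetical file names
    ["read1.fastq.gz", "read2.fastq.gz"],        # hypothetical file names
    ["read1.%.fastq.gz", "read2.%.fastq.gz"],    # hypothetical output patterns
)
print(shlex.join(command))
```

The sketch only builds and prints the command; it does not run fastq-multx.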

EXAMPLE of dual index file:

SampleID Index Style
D701_501 ATTACTCG-TATAGCCT Nextera
D701_502 ATTACTCG-ATAGAGGC Nextera
D701_503 ATTACTCG-CCTATCCT Nextera
D701_504 ATTACTCG-GGCTCTGA Nextera
D701_505 ATTACTCG-AGGCGAAG Nextera
D701_506 ATTACTCG-TAATCTTA Nextera
D701_507 ATTACTCG-CAGGACGT Nextera
D701_508 ATTACTCG-GTACTGAC Nextera
D702_501 TCCGGAGA-TATAGCCT Nextera
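Each row of the dual index file holds a sample ID, the two index sequences joined by a hyphen, and a style name. A minimal sketch for reading such a file into index pairs, for example to sanity-check it before a run, might look like the following. The function and class names are illustrative only, and treating any row without a hyphenated second column as a header is an assumption of this sketch, not documented fastq-multx behaviour.

```python
from typing import List, NamedTuple


class DualBarcode(NamedTuple):
    sample_id: str
    index1: str
    index2: str
    style: str


def parse_dual_barcodes(text: str) -> List[DualBarcode]:
    """Parse whitespace-separated 'id index1-index2 style' rows."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and rows without a hyphenated index pair (e.g. a header row).
        if len(fields) < 2 or "-" not in fields[1]:
            continue
        index1, index2 = fields[1].split("-", 1)
        style = fields[2] if len(fields) > 2 else ""
        entries.append(DualBarcode(fields[0], index1, index2, style))
    return entries


example = """
SampleID Index Style
D701_501 ATTACTCG-TATAGCCT Nextera
D702_501 TCCGGAGA-TATAGCCT Nextera
"""
pairs = parse_dual_barcodes(example)
for pair in pairs:
    print(pair.sample_id, pair.index1, pair.index2)
```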

--
You received this message because you are subscribed to the Google Groups "EA Utils" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ea-utils+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Orla O'Sullivan

May 30, 2014, 10:53:49 AM

to ea-u...@googlegroups.com

Thanks Erik, I had seen that. The problem is that the indexes have already been removed from the read files, so I have no way of recognising them through the barcodes.

Orla

Aronesty, Erik

May 30, 2014, 11:19:50 AM

to ea-u...@googlegroups.com

I'm not sure what you mean by "removed from the read files". Can you paste in an example (one row from each file you have)?

Orla O'Sullivan

May 30, 2014, 11:34:26 AM

to ea-u...@googlegroups.com

Thank you for being so helpful. As you can see, the index "CCCTCTTT" is no longer at the start of the read file.

@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCCTCTTT
+
6,,6,,,,
@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCTCCTTTTTTCTTCCTTTCTTCCTCTTTCTCCCTTTTTTCTCTTCTTCTTCCTCTCTTTCTCTTTCTTTCTTCCTTCTTTCTTTTTTTTCCTTTTTTTTTTTTTTTTCTTTTCTTTCTTTTCTTTTCTCTTTTTTCCTTTCTCTTTTTTTTCCTTCTTTTTTTTTTCTTTTCCTTTTTTTTCTTTTTT
CCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
68A@BEDD9CFF6CECFCE,CF<CC6CEC,6<CCB;B,,@,,<66CCE<6,,;CC,;,C6<CEDDE,6@E;5A5<CD,66E:CECCDEEC96,,99<=C@C:4+4>C:4;6,,,7?4,5,754,,5=,33;==;@+,6,3=,+,3,33,+3:322;0**33*4*11:2+4,5+08;;9:))//<@98*7
))57)3;17,)39<68C6..).772,23(777<<<>9(,(-2211(24):38>88888821(:8888885(-(,-316775775755))/-17755.888888655555888

Aronesty, Erik

May 30, 2014, 11:51:56 AM

to ea-u...@googlegroups.com

That is only 2 files. When you say dual-indexed, do you mean Nextera-like? If not, then I said the wrong thing.

Orla O'Sullivan

May 30, 2014, 11:55:49 AM

to ea-u...@googlegroups.com

Sorry, yes, the Nextera dual indexing.

The other two files are:

@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
CTTTCCCT
+
-,66,5,6

@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
TCCTCCTTTTTTCTCTCCTCCTTTTTTCTCCCCTCTCTTTCTTTCCTCCTTTTCCTTTTTTTTCTCTTCTTCTTCCTTCTCCCTTTTTTTTCTTCCTTCTCTCTCCTCCTTTCCCCTCTCCCCTTTTTCTTCCTCTTTCCTCTTCTTTCCTCTCTTTCTCCATTTTTCTCTTCCTTTTTTTTTTTTTTC
CCCTTTTTTTCCCTTTTTTTTTTTTTTTTTTTCTTCTCTCCCTTTTCTCCCTTTTCTTCTTTTTTTTTTTTTTTTTTTTCTTTTTCGCTGTTCTTCTTTTCTTTTTTTTTTT
+
8,886;6,=CC@6C,50;5;4;=AE@@=A5<,,,86AABCCC,;@,;6;4,<?0*;;@6,+64,4396,,9,99<>>BDB=,493,65>>C86=4,4,+949==5)613:D;0*,,98153,+0*05+03;;9=9992++336+*21,*1;:95:+**++3066=9+*+*3;0*36:3*1)0:5;:*1,
.*)-0)*052(*.*,,22**481*/*//**,(57**.-4)((.-3))))).(),))--)))0+0***,(,-((,**((((.-).4(((((,))).)--.)))-4),23(,((

Aronesty, Erik

May 30, 2014, 2:07:48 PM

to ea-u...@googlegroups.com

Unless the barcodes are interleaved somehow (I can't tell from how you pasted it), the command line below should work fine.

fastq-multx does not require the barcode to be "ligated" to the read.

R2 is

fastq-multx -l dual_barcodes.txt r2.gz r3.gz r1.gz r4.gz -o n/a -o n/a -o r1.%.fastq.gz -o r2.%.fastq.gz

FILE R2

@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCCTCTTT
+
6,,6,,,,

FILE R3

@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0
CTTTCCCT
+
-,66,5,6

FILE R1:

@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0
CCTCCTTTTTTCTTCCTTTCTTCCTCTTTCTCCCTTTTTTCTCTTCTTCTTCCTCTCTTTCTCTTTCTTTCTTCCTTCTTTCTTTTTTTTCCTTTTTTTTTTTTTTTTCTTTTCTTTCTTTTCTTTTCTCTTTTTTCCTTTCTCTTTTTTTTCCTTCTTTTTTTTTTCTTTTCCTTTTTTTTCTTTTTT
CCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
68A@BEDD9CFF6CECFCE,CF<CC6CEC,6<CCB;B,,@,,<66CCE<6,,;CC,;,C6<CEDDE,6@E;5A5<CD,66E:CECCDEEC96,,99<=C@C:4+4>C:4;6,,,7?4,5,754,,5=,33;==;@+,6,3=,+,3,33,+3:322;0**33*4*11:2+4,5+08;;9:))//<@98*7
))57)3;17,)39<68C6..).772,23(777<<<>9(,(-2211(24):38>88888821(:8888885(-(,-316775775755))/-17755.888888655555888
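Because the barcodes live in separate files rather than on the reads, this approach relies on record N of every file describing the same cluster. A small sketch for checking that, assuming strict 4-line FASTQ records (unwrapped, unlike the pasted excerpts above) and that the read ID is the header text before the first space, is shown below. The function names are illustrative, not part of fastq-multx.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record (header up to the first space, without '@')."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0:
            yield line.strip().split(" ")[0].lstrip("@")


def first_mismatch(*handles):
    """Return the 0-based number of the first record whose IDs differ across files, or None."""
    id_streams = (read_ids(handle) for handle in handles)
    # zip_longest pads shorter files with None, so a length difference also counts as a mismatch.
    for record_number, ids in enumerate(zip_longest(*id_streams)):
        if len(set(ids)) != 1:
            return record_number
    return None


index1 = io.StringIO("@M01385:25:000000000-A6GWW:1:1101:16207:1221 1:N:0:0\nCCCTCTTT\n+\n6,,6,,,,\n")
index2 = io.StringIO("@M01385:25:000000000-A6GWW:1:1101:16207:1221 2:N:0:0\nCTTTCCCT\n+\n-,66,5,6\n")
result = first_mismatch(index1, index2)
print("files are in sync" if result is None else f"first mismatch at record {result}")
```

For real data the in-memory strings would be replaced by open file handles (for example from gzip.open(path, "rt") for .gz files).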

Orla O'Sullivan

Jun 3, 2014, 4:30:33 AM

to ea-u...@googlegroups.com

Apologies for the late reply; we had a public holiday here in Ireland. Thanks for the advice. I will try that now.

alk...@nau.edu

Aug 20, 2014, 8:24:09 PM

to ea-u...@googlegroups.com

I think I am doing everything right, yet I can't get things to work. I have dual-indexed reads from a MiSeq (4 reads in this order: read1, index1, index2, read2). Here are the first 4 lines from index1 (normal FASTQ output from the MiSeq, phred33 encoding):

@M01315:84:000000000-AA888:1:1101:17152:1632 1:N:0:0
GTTAAATC
+
>11>133B

Here is the contents of my indexes file (I capitalized the first pair of indices to see if that made any difference...doesn't seem to):

cat Samples.txt

Pfl055 CGTCTTGA-ACGCAGTT Custom
PflHS034 ctacgagt-acgcagtt Custom
Pfl056 agagatgc-acgcagtt Custom
PflHS035 tacgccat-acgcagtt Custom
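Most of the indices in this file are lower case, and later in the thread upper-casing them is confirmed as necessary. A minimal sketch for upper-casing only the index column, leaving the sample IDs and the original separators untouched, could be the following. The function name is illustrative, and the sketch assumes the index is always the second whitespace-separated column.

```python
import re

# First column plus its trailing whitespace, then the second column.
_SECOND_COLUMN = re.compile(r"^(\s*\S+\s+)(\S+)")


def uppercase_barcodes(text: str) -> str:
    """Upper-case the second column (the index sequence) of every line."""
    return "\n".join(
        _SECOND_COLUMN.sub(lambda m: m.group(1) + m.group(2).upper(), line)
        for line in text.splitlines()
    )


samples = "Pfl055 CGTCTTGA-ACGCAGTT Custom\nPflHS034 ctacgagt-acgcagtt Custom"
fixed = uppercase_barcodes(samples)
print(fixed)
```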

If I grep these sequences from the index files, there are plenty in there. In fact, the MiSeq already auto-demultiplexed everything OK, but I think I can do better with multx, which is why I am pursuing this exercise (the Undetermined reads are from a reanalysis where I took away all valid index sequences from the sample sheet).

If I run a command with -n, here is the output (seems like it is working?):

fastq-multx -n -l Samples.txt Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_I2_001.fastq Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq -o n/a -o n/a -o ./test_out/r1.%.fq -o ./test_out/r2.%.fq

Using Barcode Group: Custom on File: Undetermined_S0_L001_I1_001.fastq (start), Threshold 0.03%

Dual index on File: Undetermined_S0_L001_I2_001.fastq (start)

Pfl055 CGTCTTGA

PflHS034 ctacgagt

Pfl056 agagatgc

PflHS035 tacgccat

But then when I take away -n, it doesn't really work:

fastq-multx -l Samples.txt Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_I2_001.fastq Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq -o n/a -o n/a -o ./test_out/r1.%.fq -o ./test_out/r2.%.fq

Using Barcode Group: Custom on File: Undetermined_S0_L001_I1_001.fastq (start), Threshold 0.03%

Dual index on File: Undetermined_S0_L001_I2_001.fastq (start)

End used: start

Dual-end used: start

No barcodes defined, quitting.

So why don't my samples demultiplex?

alk...@nau.edu

Aug 20, 2014, 9:05:32 PM

to ea-u...@googlegroups.com

Quick update:

1) I capitalized all the index sequences. I think that might actually be necessary. Confirmation?

2) If I specify -B instead of -l, things seem to work, but I am only getting about 1/10th of the reads that auto-demultiplexed on the MiSeq (1000 per sample vs 10,000). These indexes were designed with a minimum distance of 3, so I added -m 3 to get the 1000 reads I mentioned. I grepped the flowcell address of the first demultiplexed sequence and got the following:

grep -A 1 "18527:6973" Undetermined_S0_L001_I1_001.fastq

@M01315:84:000000000-AA888:1:1101:18527:6973 1:N:0:0

CGTACTCA

grep -A 1 "18527:6973" Undetermined_S0_L001_I2_001.fastq

@M01315:84:000000000-AA888:1:1101:18527:6973 2:N:0:0

ACGCAGTT

These indexes should be CGTCTTGA-ACGCAGTT, so I was only able to get this read by allowing 3 errors, as the first index seems to contain errors. Just to be safe, I grepped the sequence against my list of real indexes and came up with nothing, so presumably these were errors during the index read (common for Illumina). I will double check with the user that I have the correct file on hand (long story, digital chaos right now), as it is possible I am actually demultiplexing an additional 1000 reads from those that didn't properly demultiplex on the instrument (exactly what I am trying to collect).
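The "3 errors" above can be checked by counting the positions at which the observed index read differs from the expected index. A short sketch of that count (a plain Hamming distance, not fastq-multx's own matching code) follows:

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ, ignoring case."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


observed = "CGTACTCA"   # index 1 read for cluster 18527:6973
expected = "CGTCTTGA"   # index 1 of sample Pfl055
mismatches = hamming(observed, expected)
print(mismatches)  # 3
```

One caveat worth keeping in mind: in general, with a minimum pairwise distance of 3 between indexes, only a single mismatch is guaranteed to point back to a unique index, so reads recovered with 3 mismatches may deserve a closer look.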

Aronesty, Erik

Aug 21, 2014, 9:45:19 AM

to ea-u...@googlegroups.com

1) Yes.

2) Can you post an attachment (tar.gz) with

- a handful of reads from each file

- the adapter file?

I can run some diagnostics to be sure, but typically when you run with -l, the system will autodetermine, meaning that if you don't have a sufficient number of matching reads, it will decide that it's looking at line noise. If you're trying to recover from a failed demux, using -B is the right way to go. But you'll probably need to re-pool all the reads into one file. This is why I never use Illumina's built-in demultiplexer: no do-overs.