fastq-multx for dual-barcode-indexed library


lanc...@genomics.com.tw

Dec 10, 2014, 11:53:05 PM
to ea-u...@googlegroups.com
Dear EA Utils, 

I was using fastq-multx to demultiplex my dual-barcode-indexed library. The output was demultiplexed correctly into separate files based on the barcode list. However, I found that only the barcodes in the first input file (my read 1 FASTQ) were trimmed; the barcodes in the second input file (my read 2 FASTQ) remained. I was expecting both to be trimmed.

Is this a bug, or did I set up the command incorrectly?

# my cmd as below 
mkdir -p _test
fastq-multx -b NS14019_barcode_list.txt 1Mtest.R1.fastq 1Mtest.R2.fastq -o ./_test/%.R1.fastq -o ./_test/%.R2.fastq  -m 1 2>./_test/log &

# view my barcode file
cat NS14019_barcode_list.txt
1-16 ATTACTCG-TATAGCCT truseq
2-17 TCCGGAGA-ATAGAGGC truseq
3-18 CGCTCATT-CCTATCCT truseq
4-19 GAGATTCC-GGCTCTGA truseq
5-20 ATTACTCG-AGGCGAAG truseq
6-21 TCCGGAGA-TAATCTTA truseq
7-22 ATTCAGAA-TATAGCCT truseq
8-23 GAATTCGT-ATAGAGGC truseq
9-24 CTGAAGCT-CCTATCCT truseq
10-25 TAATGCGC-GGCTCTGA truseq
11-26 CGGCTATG-TATAGCCT truseq
12-27 TCCGCGAA-ATAGAGGC truseq
13-28 TCTCGCGC-CCTATCCT truseq
14-29 AGCGATAG-GGCTCTGA truseq
15-30 CGGCTATG-AGGCGAAG truseq

# 5' end
cat ./_test/1-16.R1.fastq  |  awk 'NR % 4 == 2' | head -10000 | perl -ne 'print $1."\n" if ($_ =~ /^(\w{8})\w+/)'| sort | uniq -c | sort -nrk1 | head 
   3291 CGAGATGG
   3287 CCCCCTCT
    121 CCCCCTCA
     14 CGAGTTGG
     12 CGAGATGA
      5 CGAGTGGC
      5 CGAATTGG
      5 CCCCTCTC
      4 CCCCCCTC
      4 CCCACTCT

 
cat ./_test/1-16.R2.fastq   |  awk 'NR % 4 == 2' | head -10000 | perl -ne 'print $1."\n" if ($_ =~ /^(\w{8})\w+/)'| sort | uniq -c | sort -nrk1 | head 
   6616 TATAGCCT
     41 AATAGCCT
     33 TAAAGCCT
     16 TCTAGCCT
     15 TGTAGCCT
     12 TATAGCAT
      9 TATAGCCA
      8 TTTAGCCT
      6 TATAGCCC
      6 TATAGACT
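
The R2 listing above shows TATAGCCT, the second half of barcode 1-16, still at the 5' end of most reads. A minimal sketch for putting a single number on that, assuming standard 4-line FASTQ records; `count_prefix` is a hypothetical helper name, not part of ea-utils:

```shell
# Hypothetical helper: count how many reads in a FASTQ file begin with a given barcode.
# Assumes 4-line FASTQ records (no wrapped sequence lines).
count_prefix() {
    # $1 = FASTQ file, $2 = barcode sequence to look for at the 5' end
    awk -v bc="$2" '
        NR % 4 == 2 { total++; if (index($0, bc) == 1) hits++ }   # sequence lines only
        END { printf "%d of %d reads start with %s\n", hits, total, bc }
    ' "$1"
}
```

For example, `count_prefix ./_test/1-16.R2.fastq TATAGCCT` would report how many R2 reads still carry the untrimmed barcode. Only exact matches are counted, so reads with a mismatch in the barcode are not included.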

Thank you!

Erik Aronesty

Feb 24, 2015, 8:54:34 AM
to ea-u...@googlegroups.com
Normally, with Illumina, the barcode is in a separate file. I never tested with dual barcodes embedded in the sequence. This is likely a bug.

張天昫 (NGS Department)

Feb 25, 2015, 5:01:40 AM
to ea-u...@googlegroups.com
Dear Erik, 

Thanks for your reply. 

Are you saying that this program is not suitable for demultiplexing dual-barcoded Illumina paired-end data?

Can you suggest a program that is capable of this kind of work?

Thanks again. 


--
Lance Chang 張天昫, Bioinformatics Analyst
Genomics BioSci & Tech. Co., Ltd
Tel: +8862-2696-1658 # 102
Email: lanc...@genomics.com.tw
Address: 4F., No.100, Sec.1, Sintai 5th Road., Taipei County 221
地址: 新北市汐止區新台五路一段92號4樓

Erik Aronesty

Apr 8, 2015, 11:49:30 PM
to ea-u...@googlegroups.com
We have done demux with dual-barcoded paired-end data a lot, just never with the barcodes embedded in the read files.

You can run fastq-mcf to trim off the bases in the other file.

Some modification to the code would be needed to get it to work in one pass for situations where the barcodes are embedded, but since this situation is very rare, it's never been worked on. (I'd be happy to make that change if you really need it, but I'm not doing next-gen stuff full time right now.)
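
A minimal sketch of such a second trimming pass, using plain awk rather than fastq-mcf (whose exact options are not given in this thread). It assumes standard 4-line FASTQ records and a fixed barcode length of 8 bases, as in the barcode list above; `trim_5prime` is a hypothetical helper name:

```shell
# Hypothetical helper: remove the first N bases from every read in a FASTQ file.
# Assumes 4-line FASTQ records (no wrapped sequence lines).
trim_5prime() {
    # $1 = input FASTQ, $2 = number of leading bases to remove (default 8)
    awk -v n="${2:-8}" '
        # Trim the sequence (line 2) and quality (line 4) of each record identically
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
        # Pass the header and "+" separator lines through unchanged
        { print }
    ' "$1"
}
```

For example, `trim_5prime ./_test/1-16.R2.fastq 8 > ./_test/1-16.R2.trimmed.fastq` would strip the leading 8 bases from each demultiplexed R2 read. Because the trim is positional, it removes the barcode whether or not it matched exactly, but it also removes 8 bases from any read that had no barcode at that position.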
