Hi Lu,
Thanks for the information. I can share two data files with you over Dropbox if that would be convenient.
Here are the commands I ran on these data sets, with their stderr/stdout:
$ bartender_extractor_com -f Run3/Read_data/13_0_S1_L001_R1_001.fastq -o sample13 -p GCCGG[14-16]TATCT
Running bartender extractor
/home/mmanhart/Tools/bartender-1.1-master/bartender_extractor Run3/Read_data/13_0_S1_L001_R1_001.fastq sample13 1 "(GCCG.|GCC.G|GC.GG|G.CGG|.CCGG)([ATCGN]{14,16})(TATC.|TAT.T|TA.CT|T.TCT|.ATCT)" GCCGG TATCT 3
Totally there are 4092380 reads in Run3/Read_data/13_0_S1_L001_R1_001.fastq file!
Totally there are 3933803 valid barcodes from Run3/Read_data/13_0_S1_L001_R1_001.fastq file
Totally there are 3933803 valid barcodes whose quality pass the quality condition
The estimated sequence error from the prefix and suffix parts is 0.0102028
00:00:09
$ bartender_extractor_com -f Run3/Read_data/14_0_S2_L001_R1_001.fastq -o sample14 -p GCCGG[14-16]TATCT
Running bartender extractor
/home/mmanhart/Tools/bartender-1.1-master/bartender_extractor Run3/Read_data/14_0_S2_L001_R1_001.fastq sample14 1 "(GCCG.|GCC.G|GC.GG|G.CGG|.CCGG)([ATCGN]{14,16})(TATC.|TAT.T|TA.CT|T.TCT|.ATCT)" GCCGG TATCT 3
Totally there are 3763961 reads in Run3/Read_data/14_0_S2_L001_R1_001.fastq file!
Totally there are 3650191 valid barcodes from Run3/Read_data/14_0_S2_L001_R1_001.fastq file
Totally there are 3650191 valid barcodes whose quality pass the quality condition
The estimated sequence error from the prefix and suffix parts is 0.00844517
00:00:08
$ bartender_single_com -f sample13_barcode.txt -o sample13 -c 1
Running bartender
Loading barcodes from the file
It takes 00:00:02 to load the barcodes from sample13_barcode.txt
Start to clustering barcode with length 14
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 7469
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Identified 4584 barcodes with length 14
Start to clustering barcode with length 15
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 967178
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Identified 376906 barcodes with length 15
Start to clustering barcode with length 16
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 6858
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Identified 4114 barcodes with length 16
The clustering process takes 00:00:17
start to dump clusters to file with prefix sample13
There is no pcr effects in the original data
The estimated error rate is 0.0135624
The overall running time 00:00:29 seconds.
$ bartender_single_com -f sample14_barcode.txt -o sample14 -c 1
Running bartender
Loading barcodes from the file
It takes 00:00:02 to load the barcodes from sample14_barcode.txt
Start to clustering barcode with length 14
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 6593
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Identified 3974 barcodes with length 14
Start to clustering barcode with length 15
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 804699
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Identified 324366 barcodes with length 15
Start to clustering barcode with length 16
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 6033
The distance threshold is 2
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Identified 3349 barcodes with length 16
The clustering process takes 00:00:12
start to dump clusters to file with prefix sample14
There is no pcr effects in the original data
The estimated error rate is 0.0127857
The overall running time 00:00:23 seconds.
$ bartender_combiner_com -f sample13_cluster.csv,sample13_quality.csv,sample14_cluster.csv,sample14_quality.csv -o samples13_14 -c 1
Running bartender_combiner
Current generation 1
Finished merging generation 1
Current generation 0
The last command (the combiner) hangs at "Current generation 0" and never finishes. I don't see this problem on the small test data sets I tried, only on this real data. I would really appreciate it if you could figure out what's wrong!
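
For reference, the regular expression echoed by the extractor above looks like the -p pattern GCCGG[14-16]TATCT with one wildcard position allowed in each flanking sequence. Below is a minimal Python sketch that reproduces that string, in case it helps when checking the extraction step. It is only an illustration of how I read the log: the function names are hypothetical and this is not bartender's own code.

```python
import re


def one_mismatch_alternatives(flank):
    """Return the flank with one position at a time replaced by '.',
    ordered from the last position to the first (as in the echoed regex)."""
    return "|".join(
        flank[:i] + "." + flank[i + 1:] for i in reversed(range(len(flank)))
    )


def build_pattern(prefix, min_len, max_len, suffix):
    """Assemble prefix, barcode and suffix groups into one regex string.
    Hypothetical helper; assumed to mirror what the extractor prints."""
    return "({})([ATCGN]{{{},{}}})({})".format(
        one_mismatch_alternatives(prefix),
        min_len,
        max_len,
        one_mismatch_alternatives(suffix),
    )


pattern = build_pattern("GCCGG", 14, 16, "TATCT")
print(pattern)

# Made-up read (not from the real data): a 15-base barcode between the flanks.
read = "AAGCCGG" + "ACGTACGTACGTACG" + "TATCTGG"
match = re.search(pattern, read)
if match:
    print(match.group(2))  # the captured barcode region
```

The second capture group is the barcode itself, so its length (14, 15 or 16) would decide which of the three length classes in the clustering output a read ends up in.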