Okay, I see. I'm actually getting a runtime error with this example now,
which I didn't have before, so perhaps something changed in a recent
update? For example, if Test_barcode.txt just consists of
AAAAAAAAAA,1
GAAAAAAAAA,2
then when I run
bartender_single_com -f Test_barcode.txt -o Test -c 1
I get the output:
Running bartender
Loading barcodes from the file
It takes 00:00:00 to load the barcodes from Test3_barcode.txt
Start to clustering barcode with length 10
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 2
The distance threshold is 2
Clustering iteration 1
terminate called after throwing an instance of 'std::runtime_error'
what(): The seed position is larger than the barcode length!
On Friday, July 28, 2017 at 9:22:04 PM UTC-4, 赵路 wrote:
Hi Michael,
This is expected. These two sequences will never be compared or merged because there is only one seed position (the first bp). Bartender first distributes sequences into bins and then merges sequences within each bin. These two sequences are put into two different bins by this single seed bp: AAAAAAAAAA goes into the first bin and GAAAAAAAAA into the third. Bartender then finishes the clustering step, since it has already gone through all seed positions. Bartender is not a general string clustering algorithm; it is designed for barseq data with a large barcode library and sufficient sequencing depth.
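To make the binning argument concrete, here is a minimal Python sketch of seed-based binning as described above. It is not Bartender's actual implementation: the function names, the A/C/G/T bin order (chosen so that A lands in the first bin and G in the third), and the use of a single base per seed position are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Assumed bin order: A -> bin 0 (first), C -> 1, G -> 2 (third), T -> 3.
BASE_ORDER = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def bin_by_seed(barcodes, seed_pos):
    """Group barcodes by the base found at one (hypothetical) seed position."""
    bins = defaultdict(list)
    for bc in barcodes:
        bins[BASE_ORDER.index(bc[seed_pos])].append(bc)
    return dict(bins)


def candidate_pairs(barcodes, seed_positions, max_dist):
    """Pairs that share a bin at some seed position and are within max_dist."""
    pairs = set()
    for pos in seed_positions:
        for members in bin_by_seed(barcodes, pos).values():
            for a, b in combinations(members, 2):
                if hamming(a, b) <= max_dist:
                    pairs.add((a, b))
    return pairs


barcodes = ["AAAAAAAAAA", "GAAAAAAAAA"]

# With the first bp as the only seed position, the barcodes land in
# different bins, so they are never compared.
print(bin_by_seed(barcodes, 0))           # {0: ['AAAAAAAAAA'], 2: ['GAAAAAAAAA']}
print(candidate_pairs(barcodes, [0], 2))  # set()

# Illustration only: a seed at any other position would put them in the
# same bin, and their Hamming distance of 1 would then be considered.
print(candidate_pairs(barcodes, [1], 2))  # {('AAAAAAAAAA', 'GAAAAAAAAA')}
```

In this sketch the distance threshold never comes into play for the first case, because the pair is filtered out at the binning stage before any distance is computed.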
Hope it helps.
Best,
Lu
Hi Lu,
Thanks for checking with regard to the reverse complements. I also have a question about clustering; the following simple example is confusing me. I created a barcode file with 1200
barcodes that are exactly the same ("AAAAAAAAAA") and one that is a
single mismatch away ("GAAAAAAAAA"). I would expect that single
mismatched barcode to be clustered in with the others, but it remains a
separate cluster, even when I set -z -1, which I thought would make
cluster merging decisions based on Hamming distance only. Do you know
what is happening here?
Many thanks,
Michael
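For anyone wanting to reproduce the test case described in this message, the following Python sketch writes such a file. It assumes the two-column "barcode,index" layout seen in the Test_barcode.txt example earlier in the thread, with the second column taken to be a running index; the helper name make_barcode_lines is hypothetical.

```python
def make_barcode_lines(n_major=1200, major="AAAAAAAAAA", minor="GAAAAAAAAA"):
    """Build n_major identical barcodes plus one single-mismatch barcode.

    Each line is "barcode,index" (assumed format, index starting at 1).
    """
    seqs = [major] * n_major + [minor]
    return [f"{seq},{i}" for i, seq in enumerate(seqs, start=1)]


lines = make_barcode_lines()

# Write the test input; the file name matches the one used in the thread.
with open("Test_barcode.txt", "w") as fh:
    fh.write("\n".join(lines) + "\n")

print(len(lines), lines[0], lines[-1])  # 1201 AAAAAAAAAA,1 GAAAAAAAAA,1201
```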