Clustering question

msma...@gmail.com

Jul 28, 2017, 1:03:59 PM

to Bartender

Hi Lu,

Thanks for checking with regard to the reverse complements. I also have a question about clustering, which is confusing me in the following simple example. I created a barcode file with 1200 barcodes that are exactly the same ("AAAAAAAAAA") and one that is a single mismatch away ("GAAAAAAAAA"). I would expect that single mismatched barcode to be clustered in with the others, but it remains a separate cluster, even when I set -z -1, which I thought would make cluster merging decisions based on Hamming distance only. Do you know what is happening here?

Many thanks,

Michael

赵路

Jul 28, 2017, 9:22:04 PM

to Michael Manhart, Bartender

Hi Michael,

This is expected. These two sequences will never be compared or merged because there is only one seed position (the first bp). Bartender first distributes sequences into different bins and then merges sequences within each bin. These two sequences are put into two different bins by this single seed bp: AAAAAAAAAA goes into the first bin and GAAAAAAAAA into the third bin. Bartender then finishes the clustering step, since it has already gone through all seed positions. Bartender is not a general string clustering algorithm; it is designed for bar-seq data with a large barcode library and sufficient sequencing depth.
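The binning argument above can be illustrated with a small Python sketch. This is not Bartender's code: the function names are hypothetical, each seed is simplified to a single base position, and the bin order A, C, G, T is an assumption chosen to match the "first bin" and "third bin" mentioned above. It only shows why two barcodes that differ at the sole seed position are never compared, even though their Hamming distance is 1.

```python
from collections import defaultdict

# Assumed bin order (not taken from Bartender's source): A=1, C=2, G=3, T=4.
BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def bin_by_seed(barcodes, seed_pos):
    """Group barcodes by the base found at one seed position."""
    bins = defaultdict(list)
    for bc in barcodes:
        bins[BASES.index(bc[seed_pos]) + 1].append(bc)
    return dict(bins)


def candidate_pairs(barcodes, seed_positions):
    """Pairs sharing a bin for at least one seed position.

    In this simplified model these are the only pairs that are ever compared.
    """
    pairs = set()
    for pos in seed_positions:
        for members in bin_by_seed(barcodes, pos).values():
            uniq = sorted(set(members))
            for i, a in enumerate(uniq):
                for b in uniq[i + 1:]:
                    pairs.add((a, b))
    return pairs


barcodes = ["AAAAAAAAAA", "GAAAAAAAAA"]
print(hamming(*barcodes))                 # 1
print(bin_by_seed(barcodes, 0))           # {1: ['AAAAAAAAAA'], 3: ['GAAAAAAAAA']}
print(candidate_pairs(barcodes, [0]))     # set()
print(candidate_pairs(barcodes, [0, 1]))  # {('AAAAAAAAAA', 'GAAAAAAAAA')}
```

With position 0 as the only seed, the two barcodes never share a bin, so no distance threshold can merge them. In this toy model, adding a second seed position where they agree makes them a candidate pair.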

Hope it helps.

Best,

Lu

--
You received this message because you are subscribed to the Google Groups "Bartender" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bartenderRandomBarcode+unsub...@googlegroups.com.
To post to this group, send email to bartenderRandomBarcode@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bartenderRandomBarcode/03c3859b-9f98-4a7a-83cc-e97c5e913ee8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Sincerely,

Lu

msma...@gmail.com

Aug 3, 2017, 1:34:30 PM

to Bartender, msma...@gmail.com

Okay, I see. I'm actually getting a runtime error with this example now, which I didn't have before, so perhaps something changed in a recent update? For example, if Test_barcode.txt just consists of

AAAAAAAAAA,1
GAAAAAAAAA,2

then when I run

bartender_single_com -f Test_barcode.txt -o Test -c 1

I get the output:

Running bartender
Loading barcodes from the file
It takes 00:00:00 to load the barcodes from Test3_barcode.txt
Start to clustering barcode with length 10
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 2
The distance threshold is 2
Clustering iteration 1
terminate called after throwing an instance of 'std::runtime_error'
what(): The seed position is larger than the barcode length!


msma...@gmail.com

Aug 3, 2017, 1:37:57 PM

to Bartender, msma...@gmail.com

Specifically, I get this error whenever the two barcodes are between 1 and 3 mismatches apart. If the two barcodes are identical or differ at more than 3 positions, the error doesn't occur.
