Fwd: bartender algorithm

赵路

Dec 6, 2016, 1:30:58 AM

to bartenderRa...@googlegroups.com

---------- Forwarded message ----------
From: 赵路 <luzha...@gmail.com>
Date: Mon, Dec 5, 2016 at 10:29 PM
Subject: Re: bartender algorithm
To: Taylor Mighell <mig...@ohsu.edu>
Cc: bartenderRa...@gmail.com

Hi Taylor,

Did you include the UMI in the input file? Bartender handles UMIs at two levels.

1. All reads with the same barcode and the same UMI are treated as duplicates. That is, all such duplicates count as one when measuring cluster size.

2. All unique reads that belong to the same cluster and have the same UMI are treated as duplicates too. They are more likely to come from sequencing errors and PCR, so they also count as one.

This strategy is definitely not perfect. It favors low-frequency barcodes when measuring cluster size, but it gives a more accurate size estimate for high-frequency barcodes. Please let me know if you have a better idea for handling UMIs.
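
To make the two levels concrete, here is a minimal sketch of the counting rule as it is described above. It is only a reading of the description, not Bartender's actual code. The function name `cluster_sizes` and the `cluster_of` mapping are hypothetical, and the cluster assignment is assumed to be already known.

```python
from collections import defaultdict


def cluster_sizes(reads, cluster_of):
    """Count cluster sizes with UMI de-duplication.

    reads      -- iterable of (barcode, umi) pairs, one per read
    cluster_of -- dict mapping each barcode to its cluster id (assumed given)

    Level 1: reads with the same barcode and the same UMI count once.
    Level 2: different barcodes in the same cluster that share a UMI count once.
    Both rules together mean the size is the number of distinct UMIs per cluster.
    """
    umis_per_cluster = defaultdict(set)
    for barcode, umi in reads:
        umis_per_cluster[cluster_of[barcode]].add(umi)
    return {cluster: len(umis) for cluster, umis in umis_per_cluster.items()}


# Toy data: five reads, two clusters.
reads = [
    ("AAAA", "u1"),
    ("AAAA", "u1"),  # level 1 duplicate: same barcode, same UMI
    ("AAAT", "u1"),  # level 2 duplicate: same cluster, same UMI
    ("AAAA", "u2"),
    ("CCCC", "u1"),
]
cluster_of = {"AAAA": 0, "AAAT": 0, "CCCC": 1}
sizes = cluster_sizes(reads, cluster_of)
print(sizes)
```

Under this rule a cluster can never be larger than the number of distinct UMIs seen in it, however many reads it contains.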

You can also run Bartender after assigning each read a unique UMI (for example, by numbering them). You can then compare the two results and see the difference.
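
One way to do the numbering is sketched below. It assumes the input is the two-column CSV mentioned elsewhere in this thread (barcode first, identifier second). The function name `assign_unique_umis` is made up for illustration, and whether Bartender accepts plain integers in the second column is an assumption.

```python
import csv
import io


def assign_unique_umis(src, dst):
    """Copy a two-column CSV, replacing column 2 with a running read number.

    src, dst -- open text file objects (input and output).
    Returns the number of reads written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    n = 0
    for row in csv.reader(src):
        if not row:  # skip blank lines
            continue
        writer.writerow([row[0], n])  # keep the barcode, number the read
        n += 1
    return n


# Small in-memory demonstration; real use would pass open files instead.
src = io.StringIO("AAAA,u1\nAAAA,u1\nCCCC,u1\n")
dst = io.StringIO()
written = assign_unique_umis(src, dst)
print(dst.getvalue())
```

With every read carrying its own number, no two reads share a UMI, so neither level of de-duplication removes anything and cluster sizes reflect raw read counts.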

Hopefully I have explained it clearly.

Thank you.

Best,

On Mon, Dec 5, 2016 at 10:20 AM, Taylor Mighell <mig...@ohsu.edu> wrote:

Lu, thanks for the help. I do have a further question. When I make clusters from my reads, the largest cluster has a frequency of ~40. However, I know that some barcodes in the dataset have a frequency over 100. Any idea why this might be?
Thank you,
Taylor

From: 赵路 [luzha...@gmail.com]
Sent: Friday, December 02, 2016 10:25 PM

To: Taylor Mighell
Subject: Re: bartender algorithm

Hi Taylor,

I should thank you. I investigated the code after I received your email and found that my last change to the software was wrong. I have corrected that mistake and the software should work fine now. I don't know how many people tried it before that and found it did not work. It is probably very hard to get those people back. I wish I had received your email earlier.

Please let me know what you think of Bartender in terms of accuracy and speed. You might need to tune the parameters so that the result looks more reasonable.

Thank you again.

Best,

Lu

On Thu, Dec 1, 2016 at 2:45 PM, Taylor Mighell <mig...@ohsu.edu> wrote:

Hey Lu, I did those things and it looks like it works now!
Attached is the log, just so you have it. I'm curious how you modified the code?
Thanks so much for helping me with this.
Taylor

From: 赵路 [luzha...@gmail.com]
Sent: Thursday, December 01, 2016 2:28 PM
To: Taylor Mighell
Subject: Re: bartender algorithm

Hi Taylor,

Could you do me a favor by doing the following steps?

1. Download the latest version of bartender-1.1 from GitHub.

2. Build it and install it.

3. Run the newly installed version against your data again.

4. Send me the log via email.

Thank you!

Best,

Lu

On Thu, Dec 1, 2016 at 11:15 AM, Taylor Mighell <mig...@ohsu.edu> wrote:

Hi Lu, thanks a lot for the help.
I ran bartender on your new example data, and it looks like it worked fine.

However, when I run the program on my own data, it still won't return any barcodes. See, for example, this line of the log:

Start to clustering barcode with length 16
Using two sample unpooled test
transforming the barcodes into clusters
Initial number of unique reads: 729658
The distance threshold is 3
Identified 0 barcodes with length 16

I know that some of the barcodes are represented by >100 reads (by using sort and uniq -c on the barcode reads). Do you think I need to adjust some parameters? I guess I'm just not understanding why a barcode with >100 reads wouldn't represent a cluster.
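
For reference, the same per-barcode count as `sort` plus `uniq -c` can be sketched in Python. This assumes one barcode per line, and the function name `barcode_counts` is only illustrative.

```python
from collections import Counter


def barcode_counts(lines):
    """Count how many reads carry each exact barcode sequence.

    lines -- iterable of strings, one barcode per line (blank lines ignored).
    """
    stripped = (line.strip() for line in lines)
    return Counter(barcode for barcode in stripped if barcode)


# In practice `lines` would be an open file; a short list stands in here.
counts = barcode_counts(["AAAA\n", "CCCC\n", "AAAA\n", "\n"])
print(counts.most_common(1))
```

These are exact-match counts, so they give a lower bound on what a cluster built around the same barcode should contain when UMIs are not collapsing reads.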

I have included the file I am working with as well as the commands/feedback I got from the command line.

Again, thanks a lot for your help.
Taylor

From: 赵路 [luzha...@gmail.com]
Sent: Thursday, December 01, 2016 1:21 AM

To: Taylor Mighell
Cc: Song Wu; Sasha Levy; Zhimin Liu
Subject: Re: bartender algorithm

Forgot to mention, bartender-1.1 should have the latest features.

On Thu, Dec 1, 2016 at 1:20 AM, 赵路 <luzha...@gmail.com> wrote:

Hi Taylor,

I see. That's just an example with random data, used to show how to run Bartender and what the result looks like. The result itself is meaningless, and I did not expect users to read any meaning into it.

I apologize for the confusion. That's a bad example and I should replace it with a dataset that has a meaningful clustering result.

I have included a simulated dataset in the example folder, which you can use to test the program.

Let me know if you have any problems or questions.

Thanks

Best,

Lu

On Wed, Nov 30, 2016 at 11:41 PM, Taylor Mighell <mig...@ohsu.edu> wrote:

Hi Lu and others,
The library that I analyzed in the example is the example data distributed with the bartender software on github (the 2M_barcode_barcode.txt file).
Thank you,
Taylor

From: 赵路 [luzha...@gmail.com]
Sent: Wednesday, November 30, 2016 10:21 PM
To: Taylor Mighell
Cc: Song Wu; Sasha Levy; Zhimin Liu
Subject: Re: bartender algorithm

Hi Taylor,

It's great to hear that you're using Bartender. From the log, it looks like you have a very small barcode library, with only 242 unique reads across the different barcode lengths. Bartender might not be suitable for that kind of sample. It is designed for large-scale barcode libraries with high sequencing depth.

If you could give me the input data or a small portion of it, I might have some insight into why Bartender did not group any unique reads.

Note: the people I have cc'd on this email are also actively working on Bartender.

Thanks

Best,

Lu

On Wed, Nov 30, 2016 at 3:54 PM, Taylor Mighell <mig...@ohsu.edu> wrote:

Hi Lu,
I am a graduate student and am trying to use your bartender algorithm!
I have skipped the extractor step, since I already have a good pipeline for trimming adapters etc. However, when I input a CSV with barcode and identifier in two columns into the clustering algorithm, it doesn't actually cluster anything. To clarify, it outputs a file that considers each barcode its own cluster. Additionally, when I use your example data, I get a similar output (I have attached this).
Any idea what might be going on?
Any help would be very useful.
Thanks!
Taylor

--

Sincerely,

Lu
