How does Bartender handle UMI?


赵路

Dec 6, 2016, 1:39:54 AM
to Bartender

Unique molecular identifiers (UMIs) are additional, usually random, sequences that are added to template molecules before PCR. They allow an investigator to detect and remove PCR duplicates and thereby improve the accuracy of amplicon counting. Bartender allows a user to attach a UMI sequence to each barcode prior to clustering: the user simply generates a comma-separated file that contains one barcode and one UMI on each line. Following clustering, Bartender searches for identical UMIs within each cluster and reports counts that include or exclude repeated UMIs (putative PCR duplicates). We note here that UMI length must be carefully considered as part of the experimental design, and we provide some general guidelines in the discussion.
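As a minimal sketch of the input format described above (one barcode and one UMI per line, separated by a comma), the following Python snippet writes such a file from a list of pairs. The function name `write_barcode_umi_csv` and the example sequences are hypothetical and not part of Bartender; only the "barcode,UMI" line layout comes from the description above.

```python
import csv
import io


def write_barcode_umi_csv(pairs, handle):
    """Write one 'barcode,UMI' line per pair to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    for barcode, umi in pairs:
        writer.writerow([barcode, umi])


# Illustrative pairs only; real pairs would come from your own read parsing.
buffer = io.StringIO()
write_barcode_umi_csv([("AACCTTGG", "ACTG"), ("AACCTTGG", "GGTA")], buffer)
text = buffer.getvalue()
print(text, end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.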


Because Bartender searches only for exact UMI matches, a PCR or sequencing error in a repeated UMI would prevent that UMI from being recognized as a repeat, and the cluster would be over-counted. However, the alternative, merging UMIs with similar sequences, raises greater problems. Large clusters may contain many UMIs, and, because UMIs are generally short, UMI clustering would erroneously merge many distinct UMIs that are close in sequence. Even with our exact-match criterion, extremely large barcode clusters may begin to exhaust the available UMIs, resulting in under-counting. For example, a barcode that is read 100,000 times and carries an 8-mer UMI (4^8 = 65,536 possible sequences) will necessarily have UMI repeats even when each sequenced read stems from a unique template molecule. To avoid these problems, we recommend selecting a UMI length that yields at least 10-fold more possible sequences than the largest expected cluster.
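The 10-fold guideline above can be turned into a quick calculation. The sketch below is not part of Bartender: the function names are hypothetical, and the second function additionally assumes UMIs are drawn uniformly at random from all 4^L sequences, which real UMI pools only approximate.

```python
def min_umi_length(largest_cluster, fold=10, alphabet=4):
    """Smallest UMI length with at least fold * largest_cluster possible sequences."""
    needed = largest_cluster * fold
    length = 1
    while alphabet ** length < needed:
        length += 1
    return length


def expected_distinct_umis(reads, umi_length, alphabet=4):
    """Expected number of distinct UMIs among `reads` uniform random draws."""
    space = alphabet ** umi_length
    return space * (1.0 - (1.0 - 1.0 / space) ** reads)


# The example from the text: 100,000 reads of one barcode.
recommended = min_umi_length(100_000)                 # 4^10 = 1,048,576 >= 1,000,000
distinct_8mer = expected_distinct_umis(100_000, 8)    # well below 100,000
print(recommended, round(distinct_8mer))
```

Under the uniform-draw assumption, 100,000 unique template molecules tagged with 8-mer UMIs would be expected to show only about 51,000 distinct UMIs, so roughly half of the molecules would be discarded as putative PCR duplicates. A 10-mer UMI is the shortest that satisfies the 10-fold guideline for a cluster of that size.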


876...@gmail.com

Apr 4, 2017, 1:13:25 AM
to Bartender
I've found that you cannot simply input a .csv with 'barcode,umi' (e.g. AACCTTGG,ACTG) for collapsing UMIs. I do not use the extractor; do you have any advice on clustering with duplicate UMI removal when inputting raw sequence?


Sasha Levy

Apr 4, 2017, 11:08:33 AM
to 876...@gmail.com, Bartender
Look for a file named “output_pcr_cluster.csv” for counts with UMIs removed. This should have appeared in your large run. For your small test run, it needs to detect at least one exactly duplicated barcode-UMI pair to output this file. Change the second line of your input file to “AAAAAAAAAA,TTTT” (exactly like the first line) and you should see the output_pcr_cluster.csv file being generated with the correct counts following UMI removal.
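Before running a small test, it can help to confirm that the input actually contains an exactly duplicated barcode-UMI pair, since the reply above says the output_pcr_cluster.csv file only appears in that case. The following is a rough pre-check sketch in Python, independent of Bartender itself; `duplicated_pairs` is a hypothetical helper name and the sample lines are illustrative.

```python
import csv
import io
from collections import Counter


def duplicated_pairs(lines):
    """Return {(barcode, umi): count} for pairs that occur more than once."""
    reader = csv.reader(lines)
    counts = Counter(
        (row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2
    )
    return {pair: n for pair, n in counts.items() if n > 1}


# Small test input whose second line repeats the first exactly.
sample = io.StringIO("AAAAAAAAAA,TTTT\nAAAAAAAAAA,TTTT\nAACCTTGGAA,ACTG\n")
dups = duplicated_pairs(sample)
print(dups)
```

An empty result means no exact repeats are present in the input. For a real file, pass `open(path, newline="")` instead of the `StringIO` object.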





--
You received this message because you are subscribed to the Google Groups "Bartender" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bartenderRandomBa...@googlegroups.com.
To post to this group, send email to bartenderRa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bartenderRandomBarcode/a317c237-4a1b-4566-973e-c23ec745c93d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Sasha F Levy, PhD
Marsha Laufer Endowed Assistant Professor
Laufer Center for Physical and Quantitative Biology, Rm115B
Department of Biochemistry and Cell Biology
5252 Stony Brook University

赵路

Apr 6, 2017, 12:24:57 AM
to Sasha Levy, 876...@gmail.com, Bartender
@Sasha, thanks for answering this question.


What do you mean by "with duplicate UMI removal"? To be clear, Bartender removes duplicates in two steps: 1. At the very beginning, it removes duplicate UMIs for each unique sequence (putative barcode). 2. At the end, it removes duplicate UMIs within each established cluster.
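The two steps can be illustrated with a small Python sketch. This is not Bartender's implementation: the function names are hypothetical, and the `cluster_of` mapping stands in for whatever clustering Bartender produces between the two steps.

```python
from collections import defaultdict


def umis_per_sequence(pairs):
    """Step 1: keep each UMI once per unique sequence (putative barcode)."""
    seen = defaultdict(set)
    for barcode, umi in pairs:
        seen[barcode].add(umi)
    return seen


def cluster_counts(seq_umis, cluster_of):
    """Step 2: count distinct UMIs across all sequences in each cluster."""
    merged = defaultdict(set)
    for barcode, umis in seq_umis.items():
        merged[cluster_of[barcode]].update(umis)
    return {cluster: len(umis) for cluster, umis in merged.items()}


pairs = [
    ("AAAAAAAAAA", "TTTT"),
    ("AAAAAAAAAA", "TTTT"),  # exact repeat, collapsed in step 1
    ("AAAAAAAAAT", "TTTT"),  # same UMI on a near-identical sequence
    ("AAAAAAAAAT", "GGGG"),
    ("CCCCCCCCCC", "ACTG"),
]
# Hypothetical clustering result: the first two sequences share a cluster.
cluster_of = {"AAAAAAAAAA": 0, "AAAAAAAAAT": 0, "CCCCCCCCCC": 1}
counts = cluster_counts(umis_per_sequence(pairs), cluster_of)
print(counts)
```

In this toy input, cluster 0 has four reads but only two distinct UMIs, because the UMI TTTT is counted once per cluster even though it appears on two member sequences.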

--
Sincerely,
 
Lu