Unique molecular identifiers (UMIs) are additional, usually random, sequences that are added to template molecules before PCR. They allow an investigator to detect and remove PCR duplicates and thereby improve the accuracy of amplicon counting. Bartender allows a user to attach a UMI sequence to each barcode prior to clustering: the user must simply generate a comma-separated file that contains one barcode and one UMI on each line. Following clustering, Bartender will search for identical UMIs within each cluster and report counts that include or exclude repeated UMIs (putative PCR duplicates). We note here that UMI length must be carefully considered as part of the experimental design, and provide some general guidelines in the discussion.
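As a rough illustration of the counting described above, the following Python sketch reads "barcode,UMI" lines and reports, for each barcode, the read count with and without exact repeated UMIs. It is not Bartender's implementation: it assumes each exact barcode string is its own cluster (Bartender clusters similar barcodes first), and the function name count_umis is hypothetical.

```python
import csv
import io
from collections import defaultdict


def count_umis(lines):
    """Return {barcode: (total_reads, reads_after_collapsing_exact_UMI_repeats)}."""
    total = defaultdict(int)
    umis = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        barcode, umi = row[0].strip(), row[1].strip()
        total[barcode] += 1          # count that includes repeated UMIs
        umis[barcode].add(umi)       # set keeps one copy of each exact UMI
    return {bc: (total[bc], len(umis[bc])) for bc in total}


# Assumption: toy input in the 'barcode,UMI' layout described in the text.
example = io.StringIO(
    "AACCTTGG,ACTG\n"
    "AACCTTGG,ACTG\n"   # exact repeat: putative PCR duplicate
    "AACCTTGG,GGTA\n"
    "TTGGCCAA,ACTG\n"
)
counts = count_umis(example)
for barcode, (n_reads, n_unique) in counts.items():
    print(barcode, n_reads, n_unique)
```

Here the barcode AACCTTGG has three reads but only two distinct UMIs, so the count that excludes repeated UMIs is lower by one.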
Because Bartender searches only for exact UMI matches, a PCR or sequencing error that happened to occur in a repeated UMI would not be recognized as a repeat and would thus result in over-counting of that cluster. However, the alternative, merging UMIs with similar sequences, raises greater problems. Large clusters may contain many UMIs, and, because UMIs are generally short, UMI clustering would erroneously merge many distinct UMIs that are close in sequence. Even using our exact-match criterion, it is possible that extremely large barcode clusters will begin to use up all available UMIs, resulting in under-counting. For example, a barcode that is read 100,000 times and contains an 8-mer UMI (4^8 = 65,536 possible sequences) will necessarily have UMI repeats even when each sequenced read stems from a unique template molecule. To avoid these problems, we recommend selecting a UMI length that results in at least 10-fold more possible sequences than the largest expected cluster.
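The arithmetic behind this recommendation can be sketched in a few lines of Python. The helper names below are hypothetical, and expected_distinct_umis additionally assumes that UMIs are drawn uniformly at random from all 4^L sequences, which real UMI libraries only approximate.

```python
def umi_space(length, alphabet_size=4):
    """Number of possible UMI sequences of the given length."""
    return alphabet_size ** length


def min_umi_length(largest_cluster, fold=10, alphabet_size=4):
    """Smallest length L with alphabet_size**L >= fold * largest_cluster."""
    length = 1
    while umi_space(length, alphabet_size) < fold * largest_cluster:
        length += 1
    return length


def expected_distinct_umis(n_molecules, length, alphabet_size=4):
    """Expected number of distinct UMIs among n_molecules uniform random draws.

    Assumption: every UMI sequence is equally likely.
    """
    space = umi_space(length, alphabet_size)
    return space * (1.0 - (1.0 - 1.0 / space) ** n_molecules)


print(umi_space(8))                       # 65536 possible 8-mers
print(min_umi_length(100_000))            # length meeting the 10-fold rule
print(expected_distinct_umis(100_000, 8)) # well below 100,000
```

For the example in the text, a cluster of 100,000 reads needs 4^L >= 1,000,000, which first holds at L = 10 (4^10 = 1,048,576). Under the uniform assumption, collisions cause some under-counting well before the UMI space is fully exhausted.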
I've found that you cannot simply input a .csv with 'barcode,umi' (e.g. AACCTTGG,ACTG) for collapsing UMIs. I do not use extractor; do you have any advice on clustering with duplicate UMI removal when inputting raw sequence?
--
You received this message because you are subscribed to the Google Groups "Bartender" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bartenderRandomBa...@googlegroups.com.
To post to this group, send email to bartenderRa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bartenderRandomBarcode/a317c237-4a1b-4566-973e-c23ec745c93d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Look for a file named “output_pcr_cluster.csv” for counts with repeated UMIs removed. This file should have appeared in your large run. For your small test run, Bartender needs to detect at least one exactly duplicated barcode-UMI pair before it outputs this file. Change the second line of your input file to “AAAAAAAAAA,TTTT” (exactly like the first line) and you should see the output_pcr_cluster.csv file being generated with the correct counts following UMI removal.

On Tue, Apr 4, 2017 at 1:13 AM <876...@gmail.com> wrote:
I've found that you cannot simply input a .csv with 'barcode,umi' (e.g. AACCTTGG,ACTG) for collapsing UMIs. I do not use extractor; do you have any advice on clustering with duplicate UMI removal when inputting raw sequence?
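A small test input of the kind suggested in the reply can be prepared and checked with a short Python script. This is only a sketch: the file name umi_test_input.csv, the third row, and the helper names are made up for illustration, and the script does not run Bartender itself; it only confirms that the file contains at least one exactly duplicated barcode-UMI pair.

```python
import csv
import os
import tempfile
from collections import Counter

# Assumption: toy rows; the first two lines are exact repeats, as in the reply.
ROWS = [
    ("AAAAAAAAAA", "TTTT"),
    ("AAAAAAAAAA", "TTTT"),
    ("CCCCCCCCCC", "ACGT"),
]


def write_input(path, rows):
    """Write one 'barcode,UMI' pair per line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)


def duplicated_pairs(path):
    """Return {(barcode, umi): count} for pairs that occur more than once."""
    with open(path, newline="") as handle:
        counts = Counter(tuple(row) for row in csv.reader(handle) if row)
    return {pair: n for pair, n in counts.items() if n > 1}


path = os.path.join(tempfile.mkdtemp(), "umi_test_input.csv")
write_input(path, ROWS)
dups = duplicated_pairs(path)
print(dups)
```

If duplicated_pairs returns an empty dictionary for a test file, the reply above suggests that output_pcr_cluster.csv would not be produced for that run.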
--
Sasha F Levy, PhD
Marsha Laufer Endowed Assistant Professor
Laufer Center for Physical and Quantitative Biology, Rm115B
Department of Biochemistry and Cell Biology
5252 Stony Brook University