Relevance of frequency

Tomas Grudny

Sep 8, 2022, 10:17:34 PM

to Bartender

Hello,

I am extracting barcodes from my reads independently of bartender. I then proceed to create a csv where the first column contains only unique barcode sequences and the second contains their frequency (number of reads containing each one). Finally, I run this list through bartender_single_com. This seemed to work fine, but downstream analyses are suggesting something went wrong with this clustering step.

As such, I have two questions regarding the csv file that goes into bartender_single_com:

1) Does frequency of each barcode matter to how bartender infers clusters? If so, is the best path to leave the raw list of barcodes as they were extracted from the reads (as opposed to creating a frequency table with each row representing a unique barcode)?

2) Does the content of the second column actually matter for the basic use of bartender_single_com? Or would including something like the row index be fine?

Thanks a lot for your help.

Best,

Tomas

赵路

Sep 8, 2022, 11:39:20 PM

to Bartender

On Thursday, September 8, 2022 at 10:17:34 PM UTC-4 tgr...@ethz.ch wrote:

1) Does frequency of each barcode matter to how bartender infers clusters? If so, is the best path to leave the raw list of barcodes as they were extracted from the reads (as opposed to creating a frequency table with each row representing a unique barcode)?

Yes. The second column of the input file for the clustering algorithm is supposed to be the UMI for the extracted barcode. The clustering algorithm counts unique barcodes in the initialization step.
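
For illustration, here is a minimal Python sketch of the layout this answer seems to describe: one row per read, with the barcode in the first column and that read's UMI in the second, rather than a collapsed frequency table. The function name, the sample sequences, and the choice to write no header row are assumptions made for the example and are not taken from the Bartender documentation.

```python
import csv
import io


def write_barcode_umi_rows(pairs, handle):
    """Write one 'barcode,umi' row per read, keeping repeated barcodes.

    `pairs` is any iterable of (barcode, umi) tuples produced by your own
    extraction step. Returns the number of rows written.
    """
    writer = csv.writer(handle, lineterminator="\n")
    n_rows = 0
    for barcode, umi in pairs:
        writer.writerow([barcode, umi])
        n_rows += 1
    return n_rows


# Hypothetical reads: the same barcode appears twice with different UMIs,
# so it stays on two separate rows instead of becoming "ACGTACGT,2".
reads = [
    ("ACGTACGT", "TTAGC"),
    ("ACGTACGT", "GGATC"),
    ("ACGTACGA", "CCATG"),
]

buf = io.StringIO()
n_rows = write_barcode_umi_rows(reads, buf)
print(buf.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.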

2) Does the content of the second column actually matter for the basic use of bartender_single_com? Or would including something like the row index be fine?