Cell aggregation for co-accessibility score calculation

Isabelle Lander

Aug 12, 2021, 9:20:32 AM
to cicero-users
Hi Hannah,

I have a question about your cell aggregation approach prior to the co-accessibility score calculation. I understand why it is necessary given the sparsity of the data. However, I am a bit worried about the "duplication" of cells across different aggregates/groups.

In your publication you say: "Note that with these parameter settings in a typical experiment, a cell will be part of more than one group and therefore the groups will sometimes contain some of the same cells, which could in principle inflate co-accessibility scores across cells. However, in practice in our analyses of both GM12878 and HSMM, the median number of cells shared between pairs of groups is zero."

What did you do to keep the median number of shared cells at zero? I guess the 90% cutoff for overlap with existing groups/aggregates is not sufficient on its own. Did you simply reduce the number of groups/aggregates sampled? If so, what is the minimum number of aggregates you would recommend?
Additionally, why did you initially choose 50 cells per group/aggregate?

In other methods, people aggregate cells without regard to the "duplication rate" of cells across aggregates (the same cell can appear in 10-20 aggregates). What is your take on that? Wouldn't correlation coefficients computed on these aggregates be highly inflated?

Thank you very much in advance for your answer.

Best,
Isabelle

hpl...@gmail.com

Aug 24, 2021, 1:15:48 PM
to cicero-users
Hi Isabelle,

We found in practice that the median number of shared cells was zero even with the default cutoff of 90% (it is printed during the run if you'd like to see the number for your data). This seems to happen because cells are clumpy, so most anchor cells either have nearest neighbors very similar to those of an existing aggregate (and are discarded) or nearest neighbors very different from those of all other aggregates (because they sit in another clump).
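The overlap filter and the shared-cell statistic described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not Cicero's R implementation: it assumes cells are already embedded in a low-dimensional space, picks anchor cells at random, takes each anchor's k nearest neighbors as a candidate group, discards a candidate if it shares more than 90% of its cells with any group already kept, and reports the median number of cells shared between pairs of kept groups. The function names, the parameters `max_overlap` and `n_attempts`, and the simulated clumpy data are all assumptions for illustration.

```python
import numpy as np


def sample_aggregates(coords, k=50, max_overlap=0.9, n_attempts=500, seed=0):
    """Hypothetical k-NN aggregation with an overlap filter.

    coords: (n_cells, n_dims) array of reduced-dimension coordinates.
    Returns a list of sets of cell indices, one set per accepted group.
    """
    rng = np.random.default_rng(seed)
    n_cells = coords.shape[0]
    groups = []
    for anchor in rng.choice(n_cells, size=min(n_attempts, n_cells), replace=False):
        # Brute-force Euclidean distance from the anchor to every cell.
        dists = np.linalg.norm(coords - coords[anchor], axis=1)
        # The k cells closest to the anchor form the candidate group.
        members = set(np.argsort(dists)[:k].tolist())
        # Reject the candidate if it shares more than max_overlap * k cells
        # with any group accepted so far.
        if not any(len(members & g) > max_overlap * k for g in groups):
            groups.append(members)
    return groups


def median_shared_cells(groups):
    """Median number of cells shared between all pairs of groups."""
    shared = [
        len(groups[i] & groups[j])
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    ]
    return float(np.median(shared)) if shared else 0.0


# Hypothetical clumpy data: 10 well-separated clumps of 200 cells in 2-D.
rng = np.random.default_rng(1)
centers = rng.normal(scale=20, size=(10, 2))
coords = np.vstack([c + rng.normal(size=(200, 2)) for c in centers])

groups = sample_aggregates(coords, k=50)
print(len(groups), median_shared_cells(groups))
```

With separated clumps, most pairs of kept groups come from different clumps and therefore share no cells, which is the effect described above; on real data, the number printed during the run is the one to check.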

As for the 50, as I recall, we tried a few values and 50 seemed to thread the needle between being low enough to have enough aggregates and high enough to move sufficiently far from binary values.
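The "move sufficiently far from binary values" point can be illustrated with a toy example. This is a hypothetical sketch on simulated sparse data (the matrix size, the 5% accessibility rate and the `aggregate_counts` helper are assumptions): summing a binary cells x peaks matrix over single cells can only give 0 or 1, while summing over groups of 50 cells gives counts that can range from 0 to 50.

```python
import numpy as np


def aggregate_counts(binary, groups):
    """Sum a binary cells x peaks matrix over each group of cell indices.

    Returns an (n_groups, n_peaks) matrix of per-group counts.
    """
    return np.vstack([binary[sorted(g)].sum(axis=0) for g in groups])


rng = np.random.default_rng(0)
# Hypothetical sparse data: 1000 cells x 5 peaks, each entry open with p = 0.05.
binary = (rng.random((1000, 5)) < 0.05).astype(int)

# "Aggregates" of a single cell each: values stay binary.
singletons = aggregate_counts(binary, [{i} for i in range(20)])
# 20 disjoint groups of 50 consecutive cells.
groups_of_50 = [set(range(s, s + 50)) for s in range(0, 1000, 50)]
aggregated = aggregate_counts(binary, groups_of_50)

print(np.unique(singletons))  # only 0 and 1 can appear
print(np.unique(aggregated))  # counts between 0 and 50
```

Larger groups give more distinct values per peak but, for a fixed number of cells, fewer aggregates to correlate over, which is the trade-off described above.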

On your last question, my intuition is that too much overlap would inflate the correlations, but in a sufficiently diverse and clumpy population you might still have enough diversity to compare relative values. It would take some testing to find out.
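One way to run that kind of test is a toy null simulation. In this hypothetical sketch (all names and parameters are assumptions), two peaks are given independent binary accessibility, so any correlation between them is spurious. Cells are summed into aggregates of k = 50, either disjoint or sliding so that neighboring aggregates share 45 cells, and plain Pearson correlation (a stand-in for Cicero's regularized co-accessibility scores) is computed across aggregates. The function returns the spread of these null correlations divided by 1/sqrt(n - 1), the standard deviation expected if the n aggregates were independent samples.

```python
import numpy as np


def null_r_inflation(n_cells=2000, k=50, step=50, n_sims=200, p=0.05, seed=0):
    """Toy null model: spread of Pearson r between two independent peaks,
    relative to the 1/sqrt(n - 1) expected for n independent aggregates.

    Aggregates are windows of k consecutive cells whose starts are `step`
    apart: step == k gives disjoint aggregates, step < k makes neighboring
    aggregates share k - step cells. A ratio near 1 means the aggregates
    behave like independent samples; larger values mean chance correlations
    are more extreme than the number of aggregates suggests.
    """
    rng = np.random.default_rng(seed)
    starts = np.arange(0, n_cells - k + 1, step)
    rs = []
    for _ in range(n_sims):
        # Two peaks whose binary accessibility is independent in every cell.
        a = (rng.random(n_cells) < p).astype(float)
        b = (rng.random(n_cells) < p).astype(float)
        # Sum each peak over every aggregate.
        agg_a = np.array([a[s:s + k].sum() for s in starts])
        agg_b = np.array([b[s:s + k].sum() for s in starts])
        rs.append(np.corrcoef(agg_a, agg_b)[0, 1])
    expected_sd = 1.0 / np.sqrt(len(starts) - 1)
    return float(np.nanstd(rs) / expected_sd)


disjoint = null_r_inflation(step=50)    # no shared cells
overlapping = null_r_inflation(step=5)  # neighbors share 45 of 50 cells
print(disjoint, overlapping)
```

In this toy setup the disjoint design should stay near 1 while the overlapping one comes out well above 1. Whether that still matters when comparing relative scores in a real, clumpy population is the open question above and would need the same comparison on real aggregates.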

Best,
Hannah