How to determine “distance” parameters in aggregate_nearby_peaks


chao lu

Feb 9, 2022, 3:51:43 AM
to cicero-users
Hi Hannah,
I am currently using cicero for monocle2 to analyze my single-cell ATAC data. One of the sections is single-cell accessibility trajectories, where the first step is aggregation to address the sparsity issue. My question is how to set an appropriate value for the "distance" parameter in aggregate_nearby_peaks. I noticed that your tutorial says "Depending on the density of your data, you may want to try different distance parameters. In published work we have used 1000 and 10,000." I am not sure what data density means here or how to evaluate the results of different distance parameters. I tried changing "distance" from 1000 to 10000 and obtained different downstream results. I would appreciate suggestions on how to set the "distance" parameter for better downstream analysis.
My other question: I find that cicero offers only limited advanced analyses for monocle2, whereas monocle2 alone has a richer set, such as BEAM analysis. I am wondering whether it is viable to use these monocle2 functions or analysis modules to explore my single-cell ATAC data.
Best,
Chao

hpl...@gmail.com

Mar 6, 2022, 6:10:21 AM
to cicero-users
Hi Chao,

Apologies for the delay. The distance parameter is definitely a bit subjective: you want a distance large enough that you are smoothing the data somewhat to move out of the binary regime, but small enough that you are hopefully keeping somewhat relevant 'units' of accessibility. When making the choice, I've used a combination of what I know about the species (in fly, for example, the genome is smaller and genes are very close together, so a smaller aggregation made sense) and the quality of the data/depth of sequencing (sparser data needs a larger aggregation window). For evaluating the outcomes, I would try to take into account what you know about the data (if it's a time series, for example, or if you have some expectations about genes that should be present early or late in the trajectory).
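
To make the trade-off above concrete, here is a toy sketch in Python. It is not cicero's implementation: it assumes a simplified scheme where peaks are grouped into fixed-width windows of size `distance` and binary counts are summed per cell. The function names (`aggregate_by_distance`, `fraction_above_one`) and the simulated data are hypothetical, chosen only to show how a larger window yields fewer features and moves more entries out of the 0/1 regime.

```python
import random

def aggregate_by_distance(peak_starts, counts, distance):
    """Sum each cell's counts over peaks that fall in the same fixed-width window.

    peak_starts: start coordinate of each peak (single chromosome, toy setup)
    counts:      counts[cell][peak], binary accessibility calls
    distance:    window width in bp (simplified stand-in for the real parameter)
    """
    window_ids = [start // distance for start in peak_starts]
    windows = sorted(set(window_ids))
    column = {w: i for i, w in enumerate(windows)}
    aggregated = [[0] * len(windows) for _ in counts]
    for cell, row in enumerate(counts):
        for peak, value in enumerate(row):
            aggregated[cell][column[window_ids[peak]]] += value
    return aggregated

def fraction_above_one(matrix):
    """Share of entries greater than 1, i.e. no longer binary."""
    values = [v for row in matrix for v in row]
    return sum(v > 1 for v in values) / len(values)

# Simulated data (assumption): 300 peaks in a 1 Mb region, 50 cells,
# each peak open in a given cell with probability 0.05.
random.seed(0)
peak_starts = sorted(random.randrange(1_000_000) for _ in range(300))
counts = [[int(random.random() < 0.05) for _ in peak_starts] for _ in range(50)]

# Print: window size, number of aggregated features, share of entries > 1.
for distance in (1_000, 10_000, 50_000):
    agg = aggregate_by_distance(peak_starts, counts, distance)
    print(distance, len(agg[0]), round(fraction_above_one(agg), 3))
```

On real data, the same kind of summary (number of aggregated features and how many entries exceed 1) can be compared across candidate distances, alongside the biological checks described above.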

As for monocle2, I certainly recommend moving to monocle3 if possible. Downstream analyses like graph_test would replace BEAM for identifying patterns in different trajectories.

Hope this helps,
Hannah 