Filtering out lowly connected peaks from CCANs

Daniel Gingerich

Sep 7, 2021, 2:46:32 PM

to cicero-users

Hello,

Another question on integrating WGCNA techniques for use with cicero:

I was wondering what your thoughts are on using peak connectivity (kME) within each CCAN to filter out weakly connected peaks. Many of my CCANs are very large, with >50 peaks, which seems too large for a chromatin hub. So I am doing the same thing as WGCNA: filtering out peaks with low kME (eigengene-based connectivity, i.e. each member's correlation with the module eigengene). For each CCAN, I subset the cicero object's count matrix to that CCAN's peaks and run SVD on it. I then compute the correlation of each peak with the first principal component from the SVD, which I have termed the 'eigenpeak' by analogy with WGCNA's eigengene. That correlation is the peak's kME, and peaks with low kME can be filtered out. With a threshold of kME > 0.5, this greatly reduces the size of my CCANs, leaving about 3-4 peaks in most of them, which is in closer alignment with previous research on chromatin hubs.
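For anyone who wants to try this, here is a minimal sketch of the kME calculation for a single CCAN. It is written in Python with NumPy purely for illustration (Cicero itself is an R package). The function name `kme_filter`, the peaks-by-cells input layout, the per-peak standardization, and the sign-orientation step are all assumptions, not details from the post.

```python
import numpy as np

def kme_filter(counts, threshold=0.5):
    """Hypothetical helper: kME of each peak in one CCAN.

    counts: array of shape (n_peaks, n_cells) holding the count
            matrix subset for a single CCAN (assumed layout).
    Returns (kme, keep), where keep marks peaks with kME > threshold.
    """
    x = np.asarray(counts, dtype=float)

    # Standardize each peak across cells (assumption: mean 0, sd 1).
    x = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant peaks stay all-zero instead of dividing by 0
    z = x / sd

    # First right singular vector: one value per cell, the 'eigenpeak'.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigenpeak = vt[0] - vt[0].mean()

    # The sign of a singular vector is arbitrary. Orient it along the
    # average peak profile so that kME > 0 means "follows the CCAN".
    if z.mean(axis=0) @ eigenpeak < 0:
        eigenpeak = -eigenpeak

    # Pearson correlation of each (already centered) peak with the eigenpeak.
    norms = np.linalg.norm(z, axis=1) * np.linalg.norm(eigenpeak)
    kme = np.divide(z @ eigenpeak, norms, out=np.zeros(len(z)), where=norms > 0)

    return kme, kme > threshold
```

Calling `kme, keep = kme_filter(ccan_counts)` on the matrix for one CCAN gives the per-peak kME and a boolean mask of the peaks that pass the threshold. Constant peaks get a kME of 0 in this sketch and are therefore dropped.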

Best,

Dan

hpl...@gmail.com

Sep 21, 2021, 1:58:54 PM

to cicero-users

Hi Dan,

This seems like a reasonable strategy, but I would recommend finding an experimental dataset to compare with, maybe some ChIA-PET? Otherwise it may be hard to justify a threshold (I think the jury is still out on how big chromatin hubs should be). I have played in the past with raising and lowering co-access score thresholds to find a more core set of peaks in a CCAN, but that method certainly leaves a lot to be desired. We also considered other methods of community detection at the time, but as Louvain worked pretty well we left it at that. I'd be curious to know what you find out!
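As a rough illustration of the threshold idea, the sketch below keeps only connections above a co-accessibility cutoff and groups the remaining peaks. Note the swap: it uses plain connected components (union-find) as a simple stand-in, not the Louvain community detection mentioned above. The function name `core_groups` and the `(peak1, peak2, score)` input format are assumptions made for the example.

```python
def core_groups(connections, cutoff):
    """Hypothetical helper: group peaks linked by scores above cutoff.

    connections: iterable of (peak1, peak2, coaccess_score) tuples.
    Returns a sorted list of sorted peak lists, one per connected group.
    """
    parent = {}

    def find(peak):
        # Union-find lookup with path halving.
        parent.setdefault(peak, peak)
        while parent[peak] != peak:
            parent[peak] = parent[parent[peak]]
            peak = parent[peak]
        return peak

    # Merge the two groups joined by every connection above the cutoff.
    for peak1, peak2, score in connections:
        if score > cutoff:
            parent[find(peak1)] = find(peak2)

    # Collect peaks by their group representative.
    groups = {}
    for peak in list(parent):
        groups.setdefault(find(peak), []).append(peak)
    return sorted(sorted(members) for members in groups.values())
```

Raising `cutoff` splits a large group into smaller, more tightly connected ones, which is the behaviour being tuned by hand here. Peaks with no connection above the cutoff do not appear in the output.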

Best,

Hannah

Daniel Gingerich

Oct 4, 2021, 12:11:12 PM

to cicero-users

Thanks Hannah! ChIA-PET is really exciting data. I found one paper about ChIA-Drop, which can reveal entire DNA-protein complexes at single-molecule resolution by isolating each complex in a GEM and ligating barcodes to each fragment (https://www.nature.com/articles/s41586-019-0949-1). I am having trouble finding comparable ChIA-PET and scATAC datasets. It's hard to find datasets from both assays for the same cells, tissue, species, etc. Any recommendations?

hpl...@gmail.com

Oct 14, 2021, 1:54:24 PM

to cicero-users

Hi Dan,

Yes, that's definitely the issue (finding comparable ChIA-PET and ATAC). The only pair I know about is the one we used in the original cicero paper. They're from a cell line (GM12878), so not ideal, but better than nothing. The ATAC is here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2970932. It's mixed HL60 and GM12878, but there's a count matrix of just the GM12878 cells as well. The Pol2 ChIA-PET is here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1872887

I haven't done much of a search in a while, so there may be others.

Best,

Hannah

Daniel Gingerich

Oct 15, 2021, 10:04:52 AM

to cicero-users

Hannah,

Thanks so much! I will definitely check this out in the near future. I ended up finding some usable mouse brain data from the Cusanovich mouse atlas (scATAC) and Zhu et al. (https://www.nature.com/articles/s41592-021-01060-3?proof=t#data-availability, scChIP-seq). I was upset to find that the kME values were not predictive of cCRE presence. Nevertheless, it was a good learning experience in validating new computational methods, and I learned about Fisher's exact test in the process. I am very busy at the moment, but will post some updates about the results I obtained soon.

Best,

Dan

Daniel Gingerich

Oct 15, 2021, 10:08:51 AM

to cicero-users

Fisher's exact test did show enrichment of enhancers and promoters in the overall unfiltered CCANs, which is really cool! Worth mentioning that I did not match peaks for GC content; I just compared CCAN peaks to the rest of my peaks. I should probably do GC matching in the future.
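For reference, this kind of comparison is a 2x2 table: peaks inside vs. outside CCANs, crossed with peaks that do vs. do not overlap an annotated element. Below is a minimal sketch of a one-sided Fisher's exact test using only the Python standard library. The function name, the table layout, and the choice of a one-sided (enrichment) alternative are assumptions, and the numbers in the usage line are a textbook toy table, not results from this analysis.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """Hypothetical helper: one-sided Fisher's exact test for enrichment.

    Assumed 2x2 layout:
        a = CCAN peaks overlapping the annotation
        b = CCAN peaks not overlapping
        c = other peaks overlapping the annotation
        d = other peaks not overlapping
    Returns (odds_ratio, p_value), where p_value = P(X >= a) under the
    hypergeometric null of no association.
    """
    row1 = a + b          # total CCAN peaks
    col1 = a + c          # total overlapping peaks
    n = a + b + c + d     # all peaks
    total = comb(n, row1)

    # Sum the hypergeometric probabilities of tables at least as extreme as a.
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))

    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, tail / total

# Toy table [[3, 1], [1, 3]]: odds ratio 9.0, p = 17/70 (about 0.243)
odds, p = fisher_exact_greater(3, 1, 1, 3)
```

This sketch does nothing about GC content, so any GC matching would have to be done when choosing the background peaks that go into `c` and `d`.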
