Hi Zhang lab,
Before our lab shut down for coronavirus, I was preparing my eCLIP samples for sequencing. Unfortunately, I was unable to finish before the shutdown, but in the meantime I have been using your CTK toolset to analyze existing eCLIP data in preparation for my own. Thankfully, data relevant to my cell type are available from Van Nostrand.
I have followed and completed your guide to using CTK for eCLIP. I have also started trying different motif enrichment tools and have had some success with MEME-ChIP.
Which contains the following numbers:
Crosslink sites:
707 (CIMS, deletion);
1050 (CIMS, insertion);
14461 (CIMS, substitution);
12770 (CITS)
These differ from my results, though in most cases not drastically (CITS being the clear exception). I took my numbers from the corresponding s30 files (e.g. *tag.uniq.del.CIMS.s30.bed) via wc -l file:
Crosslink sites: 381 (CIMS, deletion); 607 (CIMS, insertion); 10720 (CIMS, substitution); 228 (CITS)
For CITS, the count was taken from tag.uniq.clean.CITS.s30.bed, again with wc -l.
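In case it helps, here is a minimal Python sketch of how I tallied and compared the counts. The function names (count_sites, ratios) are my own and not part of CTK, and the two dictionaries simply restate the numbers quoted above. Note that count_sites skips blank lines and any track or comment header, so it can report slightly fewer lines than wc -l if a file has a header.

```python
# Reference counts quoted above and my own counts, keyed by site type.
# These values are copied from this message, not produced by the code.
REFERENCE = {"CIMS_del": 707, "CIMS_ins": 1050, "CIMS_sub": 14461, "CITS": 12770}
MINE = {"CIMS_del": 381, "CIMS_ins": 607, "CIMS_sub": 10720, "CITS": 228}


def count_sites(bed_path):
    """Count data lines in a BED file, skipping blanks and track/comment headers."""
    with open(bed_path) as handle:
        return sum(
            1
            for line in handle
            if line.strip() and not line.startswith(("track", "#"))
        )


def ratios(mine, reference):
    """Return my count divided by the reference count for each site type."""
    return {key: mine[key] / reference[key] for key in reference}


if __name__ == "__main__":
    for site_type, ratio in ratios(MINE, REFERENCE).items():
        print(f"{site_type}: {MINE[site_type]} vs {REFERENCE[site_type]} "
              f"(ratio {ratio:.3f})")
```

Calling count_sites on each of my s30 files should give the same numbers I listed above, since those files have no header lines as far as I can tell. The ratio for CITS is far lower than for any of the CIMS types, which is the part I cannot explain.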
How can I go about troubleshooting this discrepancy?
Moving forward, I would like to identify, at high resolution, the binding sites within specific genes that are of interest to me and highly bound in this dataset.
To do this properly, I imagine accurate CITS calling is essential, but to be honest, I am not sure how to approach this question even if I had confidence in my CITS calls.
Kind regards,
Steve