Dear Zhang Lab members,
I am currently working on a miCLIP dataset using your tools.
I ran the CITS.pl utility and then tried to generate an overview of the base composition at the truncation sites. That is when I noticed that several truncation sites were reported as longer than 1 nt (see below). Initially I used the parameter "--gap 25", so I thought this might be caused by neighboring truncation sites being clustered into stretches of multiple nucleotides.
I then turned off clustering (or so I thought) by setting "--gap -1", but the results got even worse: more multi-nucleotide sites, including one that is 5 nt long.
Is there an explanation or workaround for this? Would it be viable to simply split each n-mer into n separate sites, or do the detection algorithm, p-value, etc. depend on the n-mer record?
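To make the question concrete, here is a minimal sketch of the split I have in mind. It assumes the CITS output is a tab-separated BED file with 0-based, half-open coordinates (chrom, start, end, name, score, strand); the function name is hypothetical and not part of your tools. Every resulting record simply inherits the name and score of the original n-mer, which is exactly the part I am unsure is valid.

```python
def split_bed_record(line):
    """Split one BED record spanning n nt into n single-nucleotide records.

    Assumes tab-separated BED with 0-based, half-open coordinates.
    All columns after 'end' (name, score, strand, ...) are copied unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    rest = fields[3:]
    records = []
    for pos in range(start, end):
        # One 1-nt interval per position covered by the original record.
        records.append("\t".join([chrom, str(pos), str(pos + 1)] + rest))
    return records


if __name__ == "__main__":
    # Hypothetical 3-nt record, for illustration only.
    for rec in split_bed_record("chr1\t100\t103\tsite1\t5\t+"):
        print(rec)
```

Note that the split records would share the same name, so they would need to be renamed if unique identifiers are required downstream.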
Truncation sites with --gap 25 (count, sequence):
6922 A
85 AA
1 AAA
5 AC
1 ACT
6 AG
27 AT
326 C
1 CA
1 CC
2 CT
998 G
55 GA
2 GAT
1 GG
5 GT
4834 T
26 TA
1 TAT
4 TC
1 TGT
59 TT
2 TTT
Truncation sites with --gap -1 (count, sequence):
23741 A
383 AA
2 AAA
2 AAC
9 AAT
49 AC
1 ACG
2 ACT
17 AG
239 AT
2 ATA
1 ATC
7 ATT
3910 C
55 CA
4 CAT
8 CC
5 CG
67 CT
2 CTA
5 CTT
6170 G
238 GA
5 GAT
5 GC
9 GG
41 GT
2 GTA
1 GTT
27099 T
205 TA
4 TAA
1 TACTT
6 TAT
47 TC
1 TCA
1 TCT
24 TG
1 TGA
1 TGT
474 TT
1 TTA
2 TTC
2 TTG
14 TTT
1 TTTC
1 TTTG
1 TTTT
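In case it helps with comparing the two runs, tallies like the ones above can be collapsed by site length with a short script. This is only a sketch; it assumes each data line has the form "count sequence" separated by whitespace, and the function name is hypothetical.

```python
from collections import Counter


def sites_by_length(lines):
    """Sum site counts per sequence length from 'count sequence' lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        # Skip headers and anything that is not exactly 'count sequence'.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, seq = int(parts[0]), parts[1]
        totals[len(seq)] += count
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python sites_by_length.py < tally.txt
    for length, total in sorted(sites_by_length(sys.stdin).items()):
        print(f"{length} nt\t{total}")
```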
I would appreciate any kind of help on this.
Best regards,
Andreas