Hi Tomas,
I apologize for the delay in reply. I needed to get some clarifying information about the blog post from the ones who wrote it.
The calculation mentioned in the blog post, the figure is fig4a bottom panel of the majiq-L paper. For calculating an approximate PSI for long-read data, voila uses a generalized Jeffery prior which is 1/J and (J-1)//J where J is the
number of junctions in the LSV as we do in MAJIQ, then compute the expected PSI. We do not claim this is an ideal solution for all long read data, but it is comparable for the purposes of the majiq-L voila view.
Here is an example of the function used in voila view to determine the long reads adjusted psi value and bins of the violin plot to show. all_junc_reads is a simple list of reads for all junctions of the lsv, and "np" in this code stands for numpy.
def beta_prior(all_junc_reads):
nj = len(all_junc_reads)
adj_psi = []
junc_bins = []
for junc_i, junc_reads in enumerate(all_junc_reads):
a = junc_reads + (1/nj)
b = sum(all_junc_reads[:junc_i] + all_junc_reads[junc_i+1:]) + ((nj-1)/nj) * (nj-1)
adjusted_psi = a / (a+b)
x = np.linspace(0,1, 40)
bins = beta.pdf(x, a, b)
max_non_inf_value = max(value for value in bins if value != np.inf)
bins[bins == np.inf] = max_non_inf_value
bins[bins < 1.0e-20] = 1.0e-20
first_non_zero = next((x for x in bins if x != 0), None)
for i in range(len(bins)):
if bins[i] == 0:
bins[i] = first_non_zero
else:
break
last_non_zero = next((x for x in reversed(bins) if x != 0), None)
for i in range(len(bins)-1, -1, -1):
if bins[i] == 0:
bins[i] = last_non_zero
else:
break
bins = bins/np.sum(bins)
adj_psi.append(adjusted_psi)
junc_bins.append(bins.tolist())
return adj_psi, junc_bins
Please let me know if you have any more questions about it.
Thanks,
-San