I have questions regarding some of the majiq build and majiq psi parameters. I ran majiq build without specifying min-experiments (so the default of >=50% of samples applies) on 24 samples (2 experimental groups of 12 replicates each, defined in the config file). I then ran majiq psi and voila psi on each sample independently. This results in 215512 LSVs being quantified in at least one of the 24 samples. However, many of these LSVs are quantified in only a few samples; for example, 8029 LSVs are quantified in only 1/24 samples.
My interpretation is that the build step defines 8029 LSVs that pass the minreads, minpos, etc. filters in only one sample. I would have expected every defined LSV to pass these filters in at least 6 samples (50% of the 12 experiments in one of the defined experimental groups). Am I missing or misunderstanding something?
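For illustration, the tally described above (how many samples each LSV is quantified in, and which LSVs fall below the expected 6-sample threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-sample LSV ID lists are assumed to have already been extracted from the per-sample voila output, and the sample names, LSV IDs, and function names are hypothetical placeholders, not part of majiq or voila.

```python
from collections import Counter

def samples_per_lsv(lsvs_by_sample):
    """Map each LSV ID to the number of samples in which it was quantified."""
    counts = Counter()
    for lsv_ids in lsvs_by_sample.values():
        # set() ensures an LSV is counted at most once per sample
        counts.update(set(lsv_ids))
    return counts

def below_threshold(counts, min_samples):
    """Return the sorted LSV IDs quantified in fewer than min_samples samples."""
    return sorted(lsv for lsv, n in counts.items() if n < min_samples)

# Toy input with hypothetical sample names and LSV IDs (not real data).
toy = {
    "sample01": ["gene1:s:100-200", "gene2:t:300-400"],
    "sample02": ["gene1:s:100-200"],
    "sample03": ["gene1:s:100-200", "gene3:s:500-600"],
}

counts = samples_per_lsv(toy)
# histogram[k] = number of LSVs quantified in exactly k samples
histogram = Counter(counts.values())
# With the real data, min_samples would be 6 (50% of 12 replicates).
rare = below_threshold(counts, min_samples=2)
```

With the real 24-sample data, `histogram[1]` would correspond to the 8029 LSVs mentioned above.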
My goal is a matrix of LSVs with one PSI value per junction per sample, for use with other tools downstream. Because running majiq psi and voila on multiple samples together gives an average PSI per LSV junction (not one per sample), I currently run these commands separately for each sample and then merge the resulting data for the 24 samples by LSV ID. I was surprised that the resulting LSV-intersection matrix contains only 149041 of the 215512 total LSVs, and I am worried that meaningful biology/LSVs are being lost.
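A minimal sketch of this kind of merge, assuming each sample's PSI values have already been parsed into a dictionary keyed by (LSV ID, junction index). The parsing step, the key layout, and all names here are hypothetical and do not reflect voila's actual output format. The sketch contrasts the intersection (only LSV junctions present in every sample) with the union (all LSV junctions, with missing values left as `None`):

```python
def build_psi_matrix(psi_by_sample, how="union"):
    """Merge per-sample PSI dicts into one row per (lsv_id, junction) key.

    psi_by_sample: {sample_name: {(lsv_id, junction_index): psi}}
    how: "union" keeps keys seen in any sample (missing values become None);
         "intersection" keeps only keys present in every sample.
    Returns (sample_names, {key: [psi per sample, in sample_names order]}).
    """
    samples = sorted(psi_by_sample)
    key_sets = [set(psi_by_sample[s]) for s in samples]
    if how == "union":
        keys = set().union(*key_sets)
    elif how == "intersection":
        keys = set.intersection(*key_sets) if key_sets else set()
    else:
        raise ValueError(f"unknown merge mode: {how!r}")
    matrix = {
        key: [psi_by_sample[s].get(key) for s in samples]
        for key in sorted(keys)
    }
    return samples, matrix

# Toy input with hypothetical sample names, LSV IDs, and PSI values.
toy_psi = {
    "sample01": {("gene1:s:100-200", 0): 0.80, ("gene1:s:100-200", 1): 0.20,
                 ("gene2:t:300-400", 0): 0.55},
    "sample02": {("gene1:s:100-200", 0): 0.75, ("gene1:s:100-200", 1): 0.25},
}

samples, union_matrix = build_psi_matrix(toy_psi, how="union")
_, inter_matrix = build_psi_matrix(toy_psi, how="intersection")
```

In this toy case the union keeps three rows and the intersection keeps two, which mirrors the gap between 215512 and 149041 LSVs described above; whether downstream tools can handle the missing values in the union is a separate question.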
I have further questions regarding minpos and minreads:
What is a "start position" with regard to minpos?
Is minreads a simple threshold, where all reads mapping to any location within an LSV are summed and the total must be >= minreads?
Hopefully my questions are clear, and thank you in advance for your time!