Hi there,
Thank you for the great updates in the latest version of MAJIQ. I am trying to get to grips with the latest version updates and have a few questions:
heterogen vs deltapsi
When should we use
heterogen and when
deltapsi? It seems
deltapsi should be used for replicates from a single experiment and
heterogen for large heterogenous populations? The
heterogen example seems to be on ENCODE replicates from a single experiment?
heterogen voila tsv output
I am not clear what a few columns in the Voila TSV output refer to, specifically:
- {grp1,grp2}_num_quantified
- tnom_score
Have the test statistics been adjusted for repeat testing?
deltapsi voila tsv output
I saw in a
forum post that
probability_changing does not represent p-values and instead posterior probabilities. I know we can set dPSI thresholds e.g. 10% or 20% but what significance threshold is recommended on probability_changing?
Although the Voila TSV outputs have been tidied up (thank you!), there are still a few issues for those of us that use R:
"-" in the column names
";" within observations
"True", "False" not recognised as logical - need to be in CAPS
To address these issues I made an R function built around tidyverse that might be helpful for others to import and tidy the voila.tsv output:
read_voila_het <- function(voila.tsv, grp1 = NULL, grp2 = NULL){
if(file.exists(voila.tsv) == FALSE){ stop("MAJIQ voila.tsv file not found") }
read_tsv({{voila.tsv}}, skip = 15) %>% clean_names() %>%
separate_rows(c("tnom", "ttest","wilcoxon","tnom_quantile","ttest_quantile","wilcoxon_quantile","tnom_score" ,"changing","nonchanging", paste0({{grp1}},"_median_psi"), paste0({{grp2}},"_median_psi"), paste0({{grp1}},"_percentile25_psi"), paste0({{grp2}},"_percentile25_psi"), paste0({{grp1}},"_percentile75_psi"), paste0({{grp2}},"_percentile75_psi"), "de_novo_junctions","junctions_coords"), sep = ";", convert = TRUE) %>% #,"exons_coords"
mutate(gene_id = gsub("gene:","",gene_id), lsv_id = gsub("gene:","",lsv_id),
tnom_padj = p.adjust(tnom), ttest_padj = p.adjust(ttest), wilcoxon_padj = p.adjust(wilcoxon),
deltapsi = !!sym(paste0({{grp1}},"_median_psi")) - !!sym(paste0({{grp2}},"_median_psi")), changing_dpsi0.2 = as.logical(changing), changing_dpsi0.1 = case_when(tnom < 0.05 & abs(deltapsi) > 0.1 ~ TRUE, TRUE ~ FALSE)) %>%
arrange(-abs(deltapsi)) %>% select(gene_name, gene_id,lsv_id, deltapsi, tnom, ttest, wilcoxon, ends_with("_padj"), ends_with("_median_psi"), changing_dpsi0.2, changing_dpsi0.1, everything())
}
voila view
Why do some groups not display violin plots for LSVs when there are reads visible in the splicegraph for that group?
In the splicegraphs, what do the numbers above the splicing junctions represent? Presumably, they are some form of a normalised splice junction read count?
Finally, I wondered if MAJIQ could be used to quantify splicing in total RNA libraries ± Ribo-Zero RNAseq experiments e.g. with
TruSeq Stranded total RNA? i.e. not undergone oligoDT enrichment for polyA mRNA?
Many thanks!
Oliver