In general, based on examples such as MET exon 14 skipping, tumor-specific splicing variants are considered extremely rare events, likely occurring in less than 1% of cases in certain tumor types. I read the MAJIQlopedia paper, which summarizes cancer RNA splicing isoforms using TCGA data, but I noticed that the Methods section does not describe the scripts that were used in detail.
Could you please share your insights on the best approach to analyze and identify extremely rare tumor-specific splicing variants using MAJIQ?
Regards,
Sota
Dear Sota,
Thank you for your question. Splicing variations in the MAJIQlopedia paper were defined and quantified using MAJIQ v2.2. To begin using MAJIQ, please follow the instructions at https://majiq.biociphers.org/ . Briefly, MAJIQ takes as input 1) aligned and indexed reads from RNA-seq experiments as BAM files, and 2) a transcriptome annotation such as Ensembl GRCh38. First, run the MAJIQ builder to define the splicegraph (collection of splicing variations) and calculate read coverage per sample and splicing variation. Next, you may optionally run MOCCASIN to adjust for confounding factors such as batch effects. Finally, run the MAJIQ quantifier, which has options to quantify PSI and/or PSI differences between two groups of samples. The MAJIQ documentation may be found here: https://biociphers.bitbucket.io/majiq-docs/ .
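As a rough illustration of the build, PSI and HET steps described above, here is a minimal Python sketch that only assembles the command lines without running them. The subcommands and flags (`majiq build`, `majiq psi`, `majiq heterogen`, `-c`, `-n`, `-o`, `-j`, `-grp1`, `-grp2`, `--min-experiments`) reflect my understanding of the MAJIQ v2 command line and should be checked against the documentation linked above. The function name, the group names "cancer" and "normal", and the output directory layout are hypothetical. The optional MOCCASIN step is left out.

```python
from typing import List, Optional, Sequence


def majiq_commands(
    annotation_gff3: str,
    build_config: str,
    cancer_majiq: Sequence[str],
    normal_majiq: Sequence[str],
    out_dir: str,
    threads: int = 4,
    min_experiments: Optional[float] = None,
) -> List[List[str]]:
    """Assemble (but do not run) build, psi and heterogen command lines.

    All flags are assumptions to verify against the MAJIQ docs.
    cancer_majiq / normal_majiq are the per-sample .majiq files that the
    build step is expected to write.
    """
    # Only passed to build and psi, the two steps where a low
    # min-experiments value is suggested for rare events.
    rare = [] if min_experiments is None else ["--min-experiments", str(min_experiments)]
    common = ["-j", str(threads)]

    build = ["majiq", "build", annotation_gff3, "-c", build_config,
             "-o", f"{out_dir}/build", *common, *rare]
    psi = ["majiq", "psi", *cancer_majiq, "-n", "cancer",
           "-o", f"{out_dir}/psi", *common, *rare]
    het = ["majiq", "heterogen", "-grp1", *cancer_majiq, "-grp2", *normal_majiq,
           "-n", "cancer", "normal", "-o", f"{out_dir}/het", *common]
    return [build, psi, het]
```

Each returned list can be printed with `shlex.join` for inspection, or passed to `subprocess.run` once the flags have been confirmed for your installed version.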
Since you are looking for rare events, in both the build and quantify (psi) steps you will probably want to set the min-experiments parameter to a relatively small fraction or number of samples, perhaps even just one or two samples if appropriate; this ensures that splicing events which are rare among your samples are not filtered out. Note, however, that lowering these thresholds will include more splice junctions, producing much larger output files and arguably more splicing “noise”. Additionally, to look specifically for rare events, you might use the “[group-name]_num_quantified” column in the het.voila.tsv files generated by MAJIQ-HET.
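One possible way to use the “[group-name]_num_quantified” column is sketched below: keep the rows of a het.voila.tsv file that were quantified in only a handful of samples of a given group. This assumes a tab-delimited file with a single header row, possibly preceded by comment lines starting with `#`. The function name, the default cutoffs and the `lsv_id` column in the usage note are illustrative rather than taken from MAJIQ.

```python
import csv
from typing import Dict, Iterable, List


def rows_quantified_in_few(
    lines: Iterable[str],
    group_name: str,
    max_quantified: int = 2,
    min_quantified: int = 1,
) -> List[Dict[str, str]]:
    """Keep rows quantified in only a few samples of `group_name`.

    `lines` is any iterable of text lines, e.g. an open het.voila.tsv file.
    """
    column = f"{group_name}_num_quantified"
    # Assumption: metadata/comment lines, if present, start with '#'.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    kept = []
    for row in reader:
        n = int(row[column])
        if min_quantified <= n <= max_quantified:
            kept.append(row)
    return kept
```

For example, `with open("het.voila.tsv") as fh: rare = rows_quantified_in_few(fh, "cancer")` would return the rows quantified in one or two samples of a group named "cancer". What counts as "few" depends on cohort size and is a judgment call.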
We also plan to soon release MAJIQ V3 (preprint here) together with a corresponding pipeline that runs a 1-vs-many comparison: splicing in an individual RNA-seq sample (i.e. from one cancer case) is compared against a panel of controls (e.g. non-cancer). Once released, this could be used to find experiment-specific splicing aberrations, that is, splicing which is unique or significantly different in each patient. You could then look for aberrations which are shared by only one or a few cancer cases, yet absent from the non-cancer controls.
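The last filtering step, selecting aberrations seen in only one or a few cancer cases and never in controls, can be sketched independently of any particular tool. The Python below is not the MAJIQ V3 interface. It assumes you already have, from some 1-vs-many analysis, a set of aberrant event identifiers per cancer case and a set of events observed in controls. All names and the default cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rare_cancer_only_events(
    aberrant_by_case: Dict[str, Set[str]],
    control_events: Set[str],
    max_cases: int = 3,
) -> Dict[str, List[str]]:
    """Map each event seen in 1..max_cases cancer cases, and in no control,
    to the sorted list of cases in which it was called aberrant."""
    cases_by_event: Dict[str, Set[str]] = defaultdict(set)
    for case, events in aberrant_by_case.items():
        for event in events:
            cases_by_event[event].add(case)

    return {
        event: sorted(cases)
        for event, cases in cases_by_event.items()
        if event not in control_events and len(cases) <= max_cases
    }
```

Setting `max_cases` relative to cohort size (for example, about 1% of cases) would match the "extremely rare" framing of the original question.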
The MAJIQlopedia paper used MAJIQ to identify nearly 80,000 variable splice junctions found in cancers and not in normal tissues. Thus, it’s possible you will also find a large number of cancer-specific splicing variations. Your analysis with MAJIQ might attempt to identify a smaller, higher-confidence set of cancer-specific variations in a couple of ways. One would be to quantify junction and intron inclusion using MAJIQ PSI and then filter the cancer-specific result set by thresholding on Var[PSI]: a lower variance indicates higher confidence in E[PSI], reflecting consistent inclusion across samples and/or relatively high RNA-seq read coverage. Another approach would be to use MAJIQ HET to compare junction and intron inclusion between two groups of samples (e.g., a particular cancer and non-cancer) and threshold on the test p-values and on the dPSI between the group medians, to obtain a set of splicing variations which differ in cancer with relatively high confidence. However, since you are looking for rare events, it’s possible the events of interest will not have particularly high coverage or consistency across samples in the cancer group.
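The two thresholding strategies above can be sketched as simple filters over per-junction records that have already been parsed into dictionaries. The key names (`var_psi`, `pvalue`, `median_psi_cancer`, `median_psi_normal`) and the cutoffs are placeholders chosen for illustration. They are not MAJIQ or VOILA column names, so they would need to be mapped onto the columns of your actual output.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, float]


def low_variance_psi(
    records: Iterable[Record],
    max_var: float = 0.01,
    var_key: str = "var_psi",
) -> List[Record]:
    """Keep junctions whose Var[PSI] is at most `max_var`."""
    return [r for r in records if float(r[var_key]) <= max_var]


def het_hits(
    records: Iterable[Record],
    max_pvalue: float = 0.05,
    min_abs_dpsi: float = 0.2,
    pvalue_key: str = "pvalue",
    median_keys: Tuple[str, str] = ("median_psi_cancer", "median_psi_normal"),
) -> List[Record]:
    """Keep junctions with a small test p-value and a large absolute
    difference between the two group medians (dPSI)."""
    cancer_key, normal_key = median_keys
    hits = []
    for r in records:
        dpsi = float(r[cancer_key]) - float(r[normal_key])
        if float(r[pvalue_key]) <= max_pvalue and abs(dpsi) >= min_abs_dpsi:
            hits.append(r)
    return hits
```

For rare events, strict cutoffs on either filter may remove exactly the junctions of interest, so it may be worth inspecting what falls just outside the thresholds.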
Please let us know if you have additional questions.
Barry