How to identify rare tumor-specific AS events with MAJIQ?

sotaro kanematsu

Dec 6, 2024, 1:25:03 AM

to Biociphers

Hi developers. I didn't get a response, so I am starting a new thread.

Is there anyone who can answer the following question?

> I am currently using MAJIQ to identify tumor-specific splicing variants. My analysis includes approximately 1,000 tumor samples and 800 non-tumor samples.

In general, based on examples like MET exon 14 skipping, tumor-specific splicing variants are considered extremely rare events, likely occurring in less than 1% of certain tumor types. I read the MAJIQlopedia paper summarizing cancer RNA splicing isoforms using TCGA data, but I noticed that the Methods section did not provide detailed explanations regarding the scripts used.

Could you please share your insights on the best approach to analyze and identify extremely rare tumor-specific splicing variants using MAJIQ?

Regards,

Sota

bsl...@seas.upenn.edu

Dec 6, 2024, 5:40:05 PM

to Biociphers

Dear Sota,

Thank you for your question. Splicing variations in the MAJIQlopedia paper were defined and quantified using MAJIQ v2.2. To begin using MAJIQ, please follow the instructions at https://majiq.biociphers.org/ . Briefly, MAJIQ takes as input 1) aligned and indexed reads from RNA-seq experiments as BAM files, and 2) a transcriptome annotation such as Ensembl GRCh38. First, run the MAJIQ builder to define the splicegraph (collection of splicing variations) and calculate read coverage per sample and splicing variation. Next, you may optionally run MOCCASIN to adjust for confounding factors such as batch effects. Finally, run the MAJIQ quantifier, which has options to quantify PSI and/or PSI differences between two groups of samples. The MAJIQ documentation may be found here: https://biociphers.bitbucket.io/majiq-docs/ .

Since you are looking for rare events, in both the build and quantify (psi) steps you will probably want to set the min-experiments parameter to a relatively small fraction or number of samples, perhaps even one or two samples if appropriate; this ensures that splicing events which are rare among your samples are not filtered out. Note, however, that lowering these thresholds includes more splice junctions, which means much larger output files and arguably more splicing “noise”. Additionally, to look specifically for rare events, you might use the “[group-name]_num_quantified” column in the het.voila.tsv files generated by MAJIQ-HET.
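As a rough illustration of the “[group-name]_num_quantified” idea, here is a minimal Python sketch that filters a het.voila.tsv-style table for events quantified in only a few tumor samples and in no normal samples. Only the column name pattern comes from the message above; the "lsv_id" column, the example rows, and the cutoff of 3 samples are assumptions made for illustration, and a real file may carry extra header or comment lines that need to be skipped first.

```python
import csv
import io

# Hypothetical excerpt of a het.voila.tsv-style table (tab-separated).
# Only the "<group>_num_quantified" column pattern comes from the thread;
# the "lsv_id" column and every value below are made up for illustration.
EXAMPLE_TSV = (
    "lsv_id\ttumor_num_quantified\tnormal_num_quantified\n"
    "geneA:s:100-200\t120\t98\n"
    "geneB:t:300-400\t2\t0\n"
    "geneC:s:500-600\t1\t40\n"
)


def rare_in_group(tsv_text, group, other, max_quantified=3):
    """Return IDs of events quantified in 1..max_quantified samples of
    `group` and in no sample of `other` (cutoff is an assumption)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        n_group = int(row[f"{group}_num_quantified"])
        n_other = int(row[f"{other}_num_quantified"])
        if 1 <= n_group <= max_quantified and n_other == 0:
            hits.append(row["lsv_id"])
    return hits


print(rare_in_group(EXAMPLE_TSV, "tumor", "normal"))
```

The group names "tumor" and "normal" would need to match whatever names were passed to the HET run.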

We also have an upcoming MAJIQ release which will allow for comparing splicing between individual RNA-seq samples (i.e. from one cancer case) against a panel of controls (e.g. non-cancer). This can be used to find splicing which is unique or significantly different in each patient. We plan to soon release MAJIQ V3 (preprint here) and a corresponding pipeline which runs the 1-vs-many comparison. Once released, this could be used to find experiment-specific splicing aberrations. You could then look for aberrations which are common to one or only a few cancer cases, yet absent from the non-cancer controls.

The MAJIQlopedia paper used MAJIQ to identify nearly 80,000 variable splice junctions found in cancers and not in normal tissues. Thus, it’s possible you will also find a large number of cancer-specific splicing variations. Your analysis with MAJIQ might attempt to identify a smaller, higher-confidence set of cancer-specific variations in a couple of ways. One would be to quantify junction and intron inclusion using MAJIQ PSI and then filter down the cancer-specific results set by thresholding on Var[PSI], i.e. a lower variance indicates higher confidence in E[PSI] based on consistent inclusion across samples and/or relatively high RNA-seq read coverage. Another approach would be to use MAJIQ HET to compare junction and intron inclusion between two groups of samples (e.g., particular cancer and non-cancer) and threshold on the test p-values and dPSI between medians in each group to obtain a set of splicing variations which are different in cancer with relatively high confidence. However, since you are looking for rare events, it’s possible the events of interest might not have particularly high coverage or consistency across samples in the cancer group.
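The two filtering ideas above could be sketched in Python roughly as follows. This is not MAJIQ code: the record fields ("junction", "var_psi", "pvalue", "median_dpsi") and the cutoffs (0.01, 0.05, 0.2) are hypothetical placeholders, and the values would have to be read from the actual PSI and HET outputs.

```python
def confident_psi(rows, max_var=0.01):
    """Keep junctions whose Var[PSI] is at most max_var (assumed cutoff)."""
    return [r["junction"] for r in rows if r["var_psi"] <= max_var]


def differing_in_het(rows, max_pvalue=0.05, min_abs_dpsi=0.2):
    """Keep junctions passing both the p-value threshold and the threshold
    on |dPSI| between group medians (both cutoffs are assumptions)."""
    return [
        r["junction"]
        for r in rows
        if r["pvalue"] <= max_pvalue and abs(r["median_dpsi"]) >= min_abs_dpsi
    ]


# Made-up records standing in for values parsed from MAJIQ output.
psi_rows = [
    {"junction": "j1", "e_psi": 0.95, "var_psi": 0.002},
    {"junction": "j2", "e_psi": 0.60, "var_psi": 0.050},
]
het_rows = [
    {"junction": "j1", "pvalue": 0.001, "median_dpsi": 0.35},
    {"junction": "j2", "pvalue": 0.200, "median_dpsi": 0.40},
    {"junction": "j3", "pvalue": 0.010, "median_dpsi": -0.05},
]

print(confident_psi(psi_rows))     # junctions with low posterior variance
print(differing_in_het(het_rows))  # junctions differing between groups
```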

Please let us know if you have additional questions.

Barry

sotaro kanematsu

Dec 18, 2024, 12:12:52 AM

to Biociphers

On Wednesday, December 11, 2024 at 2:15:55 AM UTC+9 bsl...@seas.upenn.edu wrote:

MET_e13_16_output.tsv

METexon14.pdf

sotaro kanematsu

Dec 19, 2024, 7:02:18 PM

to Biociphers

Thank you for providing the information. Following your suggestion, we conducted validation using the provided script on BAM files from 124 tumor samples (including those known to have MET exon 14 skipping) and 100 non-cancerous samples. Unfortunately, we were unable to detect MET exon 14 skipping. What do you think could be the cause of this issue?

I have attached the settings_file.ini file and IGV screenshots of the MET exon 14 skipping samples for reference. Below is the script we executed:

1. build step

majiq build -j8 --min-experiments 1 -c settings_file.ini /run/media/skanematsu/kanematsu/RNASeq/arriba/gencode.v19.annotation.modified.gff3 -o /run/media/skanematsu/kanematsu/RNASeq/splicing/majiq_build_result

2. heterogen step

majiq heterogen --min-experiments 1 -o /run/media/skanematsu/kanematsu/RNASeq/splicing/majiq_heterogen_result -n tumor normal -grp1 ./SCPO*T*majiq -grp2 ./SCPO*N*majiq

On Wednesday, December 11, 2024 at 2:15:55 AM UTC+9 bsl...@seas.upenn.edu wrote:

Dear Sota,

settings_file.ini

bsl...@seas.upenn.edu

Jan 8, 2025, 2:40:08 PM

to Biociphers

Dear Sota,

Thank you for your email. Kindly attach and send the IGV screenshots you mentioned.

Thank you,

Barry

bsl...@seas.upenn.edu

Jan 11, 2025, 9:45:11 PM

to Biociphers

Dear Sota,

If MET exon 14 skipping is one of the extremely rare events you mentioned, i.e. occurs in only a small proportion of samples in the tumor group, then I would not necessarily expect MAJIQ-HET or MAJIQ-deltaPSI to flag those splicing events because the LSVs for this exon could have mostly similar PSI quantifications across the two groups (if it's an extremely rare event). Indeed, the HET tests you executed find no significant difference at the group level (tumor vs non-tumor).
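To make the point concrete, here is a small Python illustration with made-up PSI values. The group sizes (124 tumor, 100 normal) match the run described earlier in the thread; the assumption that exactly 2 tumor samples carry the skipping event, and every PSI value, is invented for the example.

```python
from statistics import median

# Hypothetical PSI values for the exon-skipping junction.
# 122 tumor samples look like normal tissue; 2 carry the rare event.
tumor_psi = [0.02] * 122 + [0.90, 0.95]
normal_psi = [0.02] * 100

# Group-level comparison on medians, as in a tumor vs non-tumor test.
dpsi_between_medians = median(tumor_psi) - median(normal_psi)

# Per-sample view: how many tumor samples clearly use the junction?
n_outliers = sum(psi > 0.5 for psi in tumor_psi)

print(dpsi_between_medians, n_outliers)
```

The two outlier samples leave the tumor median untouched, so the difference between medians is zero even though the event is obvious in those two samples individually.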

We plan to soon release a new MAJIQ version which finds splicing aberrations specific to one sample (i.e., each tumor sample) compared to a panel of controls (i.e., all the normal samples). Once we release it, you could use this to identify the splicing aberrations specific to each tumor sample.

Today, you could execute MAJIQ PSI for all samples, with each sample as its own build group, so that even a junction found in a single sample is included in the analysis. Then you could manually compare E[PSI] values in the set of, say, controls (non-tumor) against a specific tumor sample of interest to detect splicing variations of potential interest. Note that this procedure would use only E[PSI] values, which do not fully account for the estimated variance in PSI (factoring in coverage and read-position variation), whereas the statistical procedure in the coming release (mentioned above) does account for this.
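A minimal Python sketch of such a manual comparison, assuming the per-sample E[PSI] values have already been collected into dictionaries: the junction names, the values, and the 0.2 gap threshold are all hypothetical. Treating a junction missing from the controls as E[PSI] = 0 is also an assumption, since a missing value may simply reflect low coverage. As noted above, this uses point estimates only and ignores the variance in PSI.

```python
def tumor_specific_junctions(tumor_psi, control_psi, min_gap=0.2):
    """Flag junctions whose E[PSI] in one tumor sample exceeds the highest
    control E[PSI] by at least min_gap.

    tumor_psi:   {junction: E[PSI]} for a single tumor sample
    control_psi: {junction: [E[PSI] for each control sample]}
    """
    flagged = {}
    for junction, psi in tumor_psi.items():
        controls = control_psi.get(junction, [])
        # Assumption: a junction never quantified in controls counts as 0.
        highest_control = max(controls, default=0.0)
        gap = psi - highest_control
        if gap >= min_gap:
            flagged[junction] = round(gap, 3)
    return flagged


# Made-up values for one tumor sample and three controls.
tumor = {"MET_e13_e15": 0.85, "MET_e13_e14": 0.15, "other": 0.50}
controls = {
    "MET_e13_e15": [0.01, 0.02, 0.00],
    "MET_e13_e14": [0.99, 0.98, 1.00],
    "other": [0.45, 0.55, 0.50],
}

print(tumor_specific_junctions(tumor, controls))
```

This version only reports junctions that gain inclusion in the tumor sample; a symmetric check against the lowest control value would be needed to catch losses.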

Please let us know if you have additional questions.

Best Regards,

Barry
