Question on sg-coverage Input After Batch Correction in MAJIQ v3

62 views
Skip to first unread message

cxy

unread,
Jul 28, 2025, 11:43:52 AMJul 28
to Biociphers

Dear MAJIQ development team,

I’m working on MAJIQ v3, and I would appreciate your guidance regarding the correct input for `sg-coverage` in the context of batch correction.

My current workflow:

I have 6 samples (3 CONTROL and 3 DISEASE) from 4 different batches. I followed the pipeline as below:

1. Extract splice junctions  
   majiq-v3 sj CON1 ./6.Majiq/annotations/sg.zarr ./6.Majiq/sj/CON1.sj  
   ... # similarly for CON2, CON3, DISEASE1, DISEASE2, DISEASE3

2. Build splicegraph  
   majiq-v3 build -j 32 ./6.Majiq/annotations/sg.zarr ./6.Majiq/build/sg.zarr \  
     --sjs ./6.Majiq/sj/CON1.sj ... DISEASE3.sj

3. Estimate PSI per sample  
   majiq-v3 psi-coverage ./6.Majiq/build/sg.zarr ./6.Majiq/psi/CON1.psicov ./6.Majiq/sj/CON1.sj   ... DISEASE3.sj

4. Batch correction  
   majiq moccasin -j 32 ./6.Majiq/batch_corrected \  
     ./6.Majiq/psi/CON1.psicov ... DISEASE3.psicov \  
     --factors-tsv ./6.Majiq/factors.tsv \  
     --confounding batch_1 batch_2 batch_3 batch_4

5. Differential splicing  
   majiq-v3 deltapsi \  
     --splicegraph ./6.Majiq/build/sg.zarr \  
     --output-tsv ./6.Majiq/deltapsi/CONTROL-AD.deltapsi.tsv \  
     --output-voila ./6.Majiq/deltapsi/CONTROL-AD.deltapsi.voila \  
     -psi1 ./6.Majiq/batch_corrected/corrected.psicov \  
     -psi2 ./6.Majiq/batch_corrected/corrected.psicov \  
     --select-grp1-prefixes CON1.out CON2.out CON3.out \  
     --select-grp2-prefixes DISEASE1.out DISEASE2.out DISEASE3.out \  
     -n CONTROL DISEASE

Now I want to generate .sgc files for visualization with voila view:

majiq-v3 sg-coverage ./6.Majiq/build/sg.zarr ./6.Majiq/build/CONTROL.sgc CON1.sj CON2.sj CON3.sj

My questions are:

Is the above pipeline correct for handling multi-batch data with batch correction via moccasin?

For generating .sgc files (e.g., CONTROL.sgc and DISEASE.sgc), should I use the original .sj files or batch-corrected .sj  files   ?

If .sj files should reflect batch-corrected data, how can I extract or generate them from corrected.psicov?


Your clarification on how to correctly generate `.sgc` after batch correction would be greatly appreciated.

Thank you very much for your help!

Best regards,  

Shirley

bsl...@seas.upenn.edu

unread,
Jul 30, 2025, 1:16:06 PMJul 30
to Biociphers

Dear Shirly,

The flow of commands looks right, some cautionary remarks:

  • “majiq-v3 sj CON1 …” I think CON1 should be the path to a bam, e.g. CON1.bam

  • “majiq moccasin” For moccasin, please note that the factors matrix must be full-rank or moccasin will throw an error. You mentioned there are 4 batches, and you are listing all 4 as confounding factors. If you *also* include an intercept column, this won’t work because the model matrix will not be full-rank. Please see e.g. this conversation for more explanation.

  • “majiq-v3 deltapsi” the arguments to –psi1 and –psi2 should be paths to the corrected psicov files for each respective group.


Today, majiq can output only moccasin-adjusted psicov files, not adjusted sj files. Thus, VOILA displays the original (non-adjusted) read data. Quantifications (PSI, delta-PSI, HET) may use moccasin-adjusted psi coverage. We would consider adding the functionality to output adjusted sg coverage for VOILA if there is strong user interest.


Best Regards,

Barry

bsl...@seas.upenn.edu

unread,
Jul 30, 2025, 1:17:26 PMJul 30
to Biociphers
Dear Shirley* (apologies)

Damien Plassard

unread,
Aug 13, 2025, 4:17:15 AMAug 13
to Biociphers

Dear Barry,

I'm just responding to this post because I'm also interested in this kind of workflow. I just have one doubt about the sentence you wrote :

  • “majiq-v3 deltapsi” the arguments to –psi1 and –psi2 should be paths to the corrected psicov files for each respective group.


Does it mean we have to run majiq moccasin for CONTROL and for DISEASE in order to have a corrected psicov file for each of these groups ? Because I am in the same situation, for now I run this command using the psicov files for all conditions, so I got a single corrected psicov file as output.

Thank you for your help,
Best regards,
Damien

bsl...@seas.upenn.edu

unread,
Aug 19, 2025, 3:16:38 PMAug 19
to Biociphers
Sorry I was unclear about this before— if using moccasin output, the arguments to -psi1 and -psi2 will be the same corrected psicov file (since it contains corrected psi coverage for both groups), with experiment prefixes for each group specified using  
--select-grp1-prefixes and 
--select-grp2-prefixes . So, run moccasin once, you don’t need two different psicov output files.
Barry
Reply all
Reply to author
Forward
0 new messages