Dear MAJIQ development team,
I’m working on MAJIQ v3, and I would appreciate your guidance regarding the correct input for `sg-coverage` in the context of batch correction.
My current workflow:
I have 6 samples (3 CONTROL and 3 DISEASE) from 4 different batches. I followed the pipeline as below:
1. Extract splice junctions
majiq-v3 sj CON1 ./6.Majiq/annotations/sg.zarr ./6.Majiq/sj/CON1.sj
... # similarly for CON2, CON3, DISEASE1,
DISEASE2,
DISEASE3
2. Build splicegraph
majiq-v3 build -j 32 ./6.Majiq/annotations/sg.zarr ./6.Majiq/build/sg.zarr \
--sjs ./6.Majiq/sj/CON1.sj ... DISEASE3.sj
3. Estimate PSI per sample
majiq-v3 psi-coverage ./6.Majiq/build/sg.zarr ./6.Majiq/psi/CON1.psicov ./6.Majiq/sj/CON1.sj
... DISEASE3.sj
4. Batch correction
majiq moccasin -j 32 ./6.Majiq/batch_corrected \
./6.Majiq/psi/CON1.psicov ... DISEASE3.psicov \
--factors-tsv ./6.Majiq/factors.tsv \
--confounding batch_1 batch_2 batch_3 batch_4
5. Differential splicing
majiq-v3 deltapsi \
--splicegraph ./6.Majiq/build/sg.zarr \
--output-tsv ./6.Majiq/deltapsi/CONTROL-AD.deltapsi.tsv \
--output-voila ./6.Majiq/deltapsi/CONTROL-AD.deltapsi.voila \
-psi1 ./6.Majiq/batch_corrected/corrected.psicov \
-psi2 ./6.Majiq/batch_corrected/corrected.psicov \
--select-grp1-prefixes CON1.out CON2.out CON3.out \
--select-grp2-prefixes DISEASE1.out DISEASE2.out DISEASE3.out \
-n CONTROL DISEASE
Now I want to generate .sgc files for visualization with voila view:
majiq-v3 sg-coverage ./6.Majiq/build/sg.zarr ./6.Majiq/build/CONTROL.sgc CON1.sj CON2.sj CON3.sj
My questions are:
Is the above pipeline correct for handling multi-batch data with batch correction via moccasin?
For generating .sgc files (e.g., CONTROL.sgc and DISEASE.sgc), should I use the original .sj files or batch-corrected .sj files ?
If .sj files should reflect batch-corrected data, how can I extract or generate them from corrected.psicov?
Your clarification on how to correctly generate `.sgc` after batch correction would be greatly appreciated.
Thank you very much for your help!
Best regards,
Shirley
Dear Shirly,
The flow of commands looks right, some cautionary remarks:
“majiq-v3 sj CON1 …” I think CON1 should be the path to a bam, e.g. CON1.bam
“majiq moccasin” For moccasin, please note that the factors matrix must be full-rank or moccasin will throw an error. You mentioned there are 4 batches, and you are listing all 4 as confounding factors. If you *also* include an intercept column, this won’t work because the model matrix will not be full-rank. Please see e.g. this conversation for more explanation.
“majiq-v3 deltapsi” the arguments to –psi1 and –psi2 should be paths to the corrected psicov files for each respective group.
Today, majiq can output only moccasin-adjusted psicov files, not adjusted sj files. Thus, VOILA displays the original (non-adjusted) read data. Quantifications (PSI, delta-PSI, HET) may use moccasin-adjusted psi coverage. We would consider adding the functionality to output adjusted sg coverage for VOILA if there is strong user interest.
Best Regards,
Barry
“majiq-v3 deltapsi” the arguments to –psi1 and –psi2 should be paths to the corrected psicov files for each respective group.