Hi there,
Thank you very much for the tool.
I seem to be having some issues with running HET mode.
I have a cancer dataset (16 normal vs tumour tissues), with which I know there are significant splicing changes in, and indeed, even Majiq dPSI mode could pick up some of the strongest changes, even though with cancer data I believe in theory HET should be the most appropriate/fruitful mode for this dataset?
I have tried running the data through multiple runs of processing, and while the het.tsv appears to have a large number of changing junctions at a "changing between dPSI of 0.1" ( and over 76K juncs with Mann-Whitney p < 0.05) , for the modulized output, every value for event non_changing, changing and junction changing is false. I presume something must be wrong to a reasonable degree here (with my setup?) because this seems fairly catastrophic.
I have tried to follow the walkthrough for HET faithfully, and my setup thus far for HET is....
#build annotated SG
majiq-v3 gff3 --license ../majiq_license_academic_official.lic reference/Homo_sapiens.GRCh38.115.gff output/annotations/sg.zarr
#build individual .sj files for each sample
majiq-build sj \
--nthreads 6 \
--license $lic \
"$bam" "$splicegraph" /majiq/majiq_new/output/sj/${base}.sj"
#update the annotation sg with sample juncs
majiq-v3 build --groups-tsv groups.tsv --license ../majiq_license_academic_official.lic --min-experiments 3 output/annotations/sg.zarr output/build/sg.zarr
#extract coverage from .sj files using the updated sg
majiq psi-coverage \
--license "$lic" \
--nthreads 6 \
"$basedir/output/build/sg.zarr" "$outdir" "$sj"
#het with group 1 as normal and 2 as tumour
majiq-v3 heterogen \
--license "$lic" \
-n normal tumour \
--splicegraph output/build/sg.zarr \
-psi1 output/psi_cov/SRR195199{99,66,96,90,87,85}.psicov \
output/psi_cov/SRR195200{52,50,45,42,37,35,28,23,08,04}.psicov \
-psi2 output/psi_cov/SRR195200{10,63,51,46,44,38,36,29,24,09,05}.psicov \
output/psi_cov/SRR195199{77,95,91,89,86}.psicov \
--output-voila output/het/normal_vs_tumour.hetcov \
--output-tsv output/het/normal_vs_tumour.tsv
#sg-coverage
for sj in output/sj/*.sj; do
base=$(basename "$sj" .sj)
majiq-v3 sg-coverage \
--license "$lic" \
"$basedir/output/build/sg.zarr" \
"$basedir/output/sgc/${base}.sgc" \
"$sj"
done
#voila tsv
srun -w calculon voila tsv \
-j 10 --show-read-counts --show-per-sample-psi --changing-between-group-dpsi 0.1 \
--show-all output/build/sg.zarr output/het/*.hetcov output/sgc/*.sgc -f output/voila_tsv/het.tsv
#het modulize out
srun -w calculon voila modulize \
-j 30 --show-all --show-read-counts --show-per-sample-psi --changing-between-group-dpsi 0.1 \
output/build/sg.zarr output/het/*.hetcov output/sgc/*.sgc \
-d output/voila_modulized
###############################
I would really appreciate any helpers. As I said, I know there are changes in this dataset, and the fact that everything is False would lead me to believe I'm doing something wrong.
Apologies for the essay, thought best to include more rather than less, and thank you.
"voila_version": "3.0.19.dev1+gde48918a5"