Hello,
I'm trying to analyze ~2300 samples with MAJIQ on a cluster.
I've got the .sj files for all my samples because I first ran the build with the "sj only" parameter.
MAJIQ got through about 2100 samples and then hit the cluster wall time without me realizing, so I have the .majiq files for most of the samples.
When the build stopped, the "splicegraph.sql" file was at about 150 GB.
I took the samples that seemed to have finished out of the config file and reran the build like this:
"majiq build \
/SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v38.annotation.gff3 \
-c test.tsv \
-j 8 -o /SAN/vyplab/NYGC_ALSFTD/analysis_nygc/majiq_incremental/builder/ \
--min-experiments 5 --incremental "
After that run, the size of the "splicegraph.sql" file dropped to about 400 MB, which makes me suspect it overwrote the previous file instead of updating the table.
Basic question: how do you run the build incrementally? Should I use the same output destination, or should I put the output in a second folder? Can I reuse the .majiq files produced by the first run?
As an example, suppose I had a config like this:
[info]
bamdirs=mybams/
sjdirs=my_sj/
[experiments]
Experiment1=A,B,C,D
Experiment2=E,F,G,H
Experiment3=I,J,K,L
And the builder got through Experiment1, output the .majiq files for E and F, and then failed.
Could I use --incremental to start again with G and H and then add Experiment3? Would my new config be the same, or should I make it look like this:
[info]
bamdirs=mybams/
sjdirs=my_sj/
[experiments]
Experiment2=G,H
Experiment3=I,J,K,L
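
To make the trimmed config above concrete, here is a minimal Python sketch of how the leftover samples could be worked out from the original config. It assumes that every finished sample leaves a file named <sample>.majiq directly in the build output directory; that naming is my assumption, not something I have confirmed. The function name remaining_samples is hypothetical and not part of MAJIQ.

```python
import configparser
from pathlib import Path


def remaining_samples(config_text, output_dir):
    """Return {group: [samples with no <sample>.majiq file in output_dir]}.

    Assumption: a finished sample is marked by "<sample>.majiq" sitting
    directly inside output_dir.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of group names like "Experiment1"
    parser.read_string(config_text)

    out = Path(output_dir)
    remaining = {}
    for group, samples in parser["experiments"].items():
        names = [s.strip() for s in samples.split(",") if s.strip()]
        missing = [n for n in names if not (out / f"{n}.majiq").exists()]
        if missing:  # drop groups whose samples have all finished
            remaining[group] = missing
    return remaining


def format_experiments(remaining):
    """Render the leftover groups as an [experiments] config section."""
    lines = ["[experiments]"]
    lines += [f"{group}={','.join(names)}" for group, names in remaining.items()]
    return "\n".join(lines)
```

With A through F already built, this produces the Experiment2=G,H and Experiment3=I,J,K,L lines of the second config. Whether that trimmed config is what --incremental expects is exactly what I am asking.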