How to use the incremental option on MAJIQ build

brown.a...@gmail.com

Nov 26, 2021, 9:13:11 AM

to majiq_voila

Hello,

I'm trying to analyze ~ 2300 samples with MAJIQ on a cluster.

I've got the .sj files for all my samples as I first ran with the "sj only" parameter.

MAJIQ got through about 2100 samples and then hit the cluster wall-time limit without my realizing, so I have the .majiq files for most of the samples.

When the build stopped, the "splicegraph.sql" file was at about 150 GB.

I removed the samples that appeared to have finished from the config file and reran the build like this:

"majiq build \
/SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v38.annotation.gff3 \
-c test.tsv \
-j 8 -o /SAN/vyplab/NYGC_ALSFTD/analysis_nygc/majiq_incremental/builder/ \
--min-experiments 5 --incremental "

After that, the size of the "splicegraph.sql" file dropped to about 400 MB, which makes me suspect the run overwrote the previous file rather than updating the table.

Basic question: how do you run incrementally? Should I use the same output destination, or should I put the output in a second folder? Can I reuse the .majiq files produced by the first run?

As an example, suppose I had a config like this:

[info]
bamdirs=mybams/
sjdirs=my_sj/
[experiments]
Experiment1=A,B,C,D
Experiment2=E,F,G,H
Experiment3=I,J,K,L

Suppose the builder got through Experiment1, output the .majiq files for E and F, and then failed.

Could I use incremental to start again with G and H and then add Experiment3? Would my new config stay the same, or should I make it look like this:

[info]
bamdirs=mybams/
sjdirs=my_sj/
[experiments]
Experiment2=G,H
Experiment3=I,J,K,L

Paul Jewell

Dec 1, 2021, 9:30:38 AM

to majiq_voila

Hello,

I am slightly confused by the exact sequence of events that took place.

You may use incremental to avoid processing the BAM files in your config experiment/group list more than once. However, the splicegraph is an output file and is generated once per build, so I believe the behavior you are seeing is expected.
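As a rough illustration of that point, here is a minimal Python sketch that reads a config in the format shown above and reports which samples already have a condensed .sj file on disk, i.e. the ones whose BAM parsing would presumably not need repeating. The helper name `sj_status` is hypothetical, and two things are assumptions rather than documented MAJIQ behavior: that each sample's file is named `<sample>.sj`, and that `sjdirs` may hold a comma-separated list of directories.

```python
import configparser
from pathlib import Path


def sj_status(config_text, sj_suffix=".sj"):
    """Map each sample listed under [experiments] to True if its .sj file exists.

    Assumption: files are named <sample><sj_suffix> inside one of the sjdirs.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group names such as "Experiment1" as written
    parser.read_string(config_text)

    # Assumption: sjdirs may list several directories separated by commas.
    sj_dirs = [Path(d.strip()) for d in parser["info"]["sjdirs"].split(",") if d.strip()]

    status = {}
    for _group, samples in parser["experiments"].items():
        for sample in samples.split(","):
            sample = sample.strip()
            if not sample:
                continue
            status[sample] = any((d / f"{sample}{sj_suffix}").is_file() for d in sj_dirs)
    return status


if __name__ == "__main__":
    example = """
[info]
bamdirs=mybams/
sjdirs=my_sj/
[experiments]
Experiment1=A,B,C,D
Experiment2=E,F,G,H
Experiment3=I,J,K,L
"""
    for sample, has_sj in sj_status(example).items():
        print(sample, "has .sj" if has_sj else "no .sj yet")
```

The sketch only inspects files. It does not tell you whether an interrupted run left a given file complete.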

Also, in general I would avoid trusting runs that were interrupted midway, though I imagine you have assumed this already.

Let me know if this makes sense, and whether you are seeing any functional differences from what you expect.

Thanks!

brown.a...@gmail.com

Dec 14, 2021, 6:32:10 AM

to majiq_voila

What about the "splicegraph.sql" file that is generated during the build process? My concern is that the file seemed to be wiped when I restarted the build: when the job timed out it was at 156 GB, and then it dropped down. Does this matter? (I have a feeling the answer is no, since I've been able to perform analysis with these data.)

Paul Jewell

Dec 21, 2021, 2:41:05 PM

to majiq_voila

Hello,

I'm still slightly confused about exactly what happened. If a majiq build run was interrupted midway, then depending on the exact signal it was sent, it may or may not have had time to close the SQL database properly. There may be some edge case in which you can read and process parts of the splice graph for analysis even though the build run did not complete. However, I would strongly recommend against trusting anything from this, as it is basically undefined behavior.

When using --incremental, the processing time that is saved is the relatively computationally intensive parsing of the experiment BAM files and saving them to a condensed representation in .sj format. The 'splicegraph.sql' file, which is used later in the pipeline by voila, is _not_ preserved by --incremental: it is an output file. (For example, if you configure majiq with two experiments in the config file, a small splicegraph will be written; if you configure it with 1000 experiments, a larger splicegraph will be written. There is _not_ any merging.) Please make sure you do not write output from build to the same location if you intend to reuse the old splicegraph.sql file from the previous build; it will be truncated, as you observed.
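One way to guard against that truncation is to pick the output directory programmatically before launching the build. The following is a minimal Python sketch, not part of MAJIQ: the helper name `safe_output_dir` and the `_1`, `_2`, ... suffix scheme are hypothetical choices for illustration. It only checks for an existing splicegraph.sql and returns a sibling directory if one is found; the returned path would then be passed to the -o option of majiq build.

```python
from pathlib import Path


def safe_output_dir(requested, splicegraph_name="splicegraph.sql"):
    """Return a directory that does not already contain a splicegraph file.

    If `requested` already holds one, try requested_1, requested_2, ...
    (hypothetical naming scheme) so the old file is left untouched.
    """
    base = Path(requested).resolve()
    candidate = base
    n = 0
    while (candidate / splicegraph_name).exists():
        n += 1
        candidate = base.parent / f"{base.name}_{n}"
    # Create the chosen directory so the build has somewhere to write.
    candidate.mkdir(parents=True, exist_ok=True)
    return candidate


if __name__ == "__main__":
    out = safe_output_dir("builder")
    print(f"Use this as the build output directory: {out}")
```

Keeping each build in its own directory also makes it easier to tell which splicegraph.sql belongs to which config.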

Thanks and please let me know if you have any more questions about the pipeline.
