Paired test taking unusally long time

239 views
Skip to first unread message

anbar...@gmail.com

unread,
Oct 14, 2021, 4:36:35 AM10/14/21
to rMATS User Group
Hi,

I am running rMATS with 37 pre/post-treatment pairs (64 BAM files). It created all the  .rmats files, raw input files for JC and JCEC and fromGTF files. However, it stuck at the statistical test for the first AS event, SE (that too at JC). The paradise_status file shows 17% completed after 2 days, and it's still running. Seems, it is a bit unusual.

I am using the following command:

$pdir/python $rdir/rmats.py --b1 $bdir/aa.txt --b2 $bdir/bb.txt --gtf $dgtf/Homo_sapiens.GRCh38.104.gtf -t paired --readLen 101 --novelSS --paired-stats --nthread 10 --od $out/all --tmp $out/all/tmpout

Could you please help me fix this issue?

I am running it on a cluster environment with a high memory node.

Many thanks,
Anbarasu

kutsc...@gmail.com

unread,
Oct 14, 2021, 9:19:11 AM10/14/21
to rMATS User Group
The paired model is known to be slow: https://github.com/Xinglab/rmats-turbo/issues/5

There were some changes to make sure that PAIRADISE uses the number of threads specified on the command line:
https://github.com/Xinglab/rmats-turbo/pull/32
https://github.com/Xinglab/rmats-turbo/commit/5de46978a45654e05541920457520c794672a93f
https://github.com/Xinglab/PAIRADISE/pull/3

If you are using rMATS v4.1.1 and the latest PAIRADISE from github then you should have the fixes. You can check how many threads PAIRADISE is using in the --tmp directory in JC_SE/rMATS_result_paired.txt. There should be a line like:
Preparing 10 clusters for parallel processing....

Eric

anbar...@gmail.com

unread,
Oct 18, 2021, 8:08:49 AM10/18/21
to rMATS User Group
Thanks a lot, Eric.

I am using the rMATS turbo v4.1.1.  Yes, there is a line indicating 10 clusters for parallel processing.

When I look at the paradise_status.txt, it is still at 'ExonID 125737.[48% completed]' for the JC_SE. This is after 6 days of running the analysis. Is this normal?  Increasing the nthread to a large number (say 50) will help?

Thanks again,
Anbarasu

kutsc...@gmail.com

unread,
Oct 18, 2021, 9:56:42 AM10/18/21
to rMATS User Group
Other users have reported similar slowness when using the paired model so the behavior you are seeing is expected

If you have available CPUs, then yes I would expect increasing nthread to 50 would speed things up to about 5x compared to nthread 10

The paired model processes all of the detected events. If you used --novelSS then it might be that a very large number of novel events were detected which is contributing to the long processing time. You can check the number of lines in fromGTF.novelSpliceSite.SE.txt to see how many events were detected because of --novelSS. You could try running without --novelSS to reduce the runtime if you are ok with leaving out those novel splice site events. You could also try running without --paired-stats so that the default statistical model is used which is much faster

Eric

anbar...@gmail.com

unread,
Nov 16, 2021, 7:48:56 AM11/16/21
to rMATS User Group
Hi Eric,

I have been trying to run the paired stat model for 37-pairs. Now, I am able to get some results for A5SS and MXE.  For the SE event, the paradise_staus shows 100% completed, but for some reason, it stopped creating the rMATS_result_FDR.txt. When I looked at the error file, it seems that it loads up all r packages, creates the cluster and starts the analysis. Then,

Error in { : task 24857 failed - "missing value where TRUE/FALSE needed"

Calls: <Anonymous> -> %dopar% -> <Anonymous>

Execution halted


I get the same error message for the RI event. Could you please help us to fix this issue?

I am using the following command:

$pdir/python $rdir/rmats.py --b1 $bdir/aa.txt --b2 $bdir/bb.txt --gtf $dgtf/Homo_sapiens.GRCh38.104.gtf -t paired --readLen 101 --paired-stats --nthread 10 --od $out/all --tmp $out/all/tmpout

Many thanks,
Anbarasu

kutsc...@gmail.com

unread,
Nov 18, 2021, 1:43:39 PM11/18/21
to rMATS User Group
The paired model does not give very good error messages, but my guess is that it's related to this line and commit:
https://github.com/Xinglab/PAIRADISE/blob/c3334ffa593dbf26bc9428d46fd8406d190adcf9/pairadise/src/pairadise_model/R/pairadise.R#L338
https://github.com/Xinglab/PAIRADISE/commit/74072346c4ab8fde00d0f5eac0727317333ab324

I'm not sure of the details, but I think that somehow the calculation can end up with an Infinite number that leads to NA. You could try running with the change in that commit which should output a p-value of 1 in cases where it gets that error

You could also try running with the code modified to get more information. The message indicates that there was an unexpected NA value somewhere in the %dopar% block: https://github.com/Xinglab/PAIRADISE/blob/c3334ffa593dbf26bc9428d46fd8406d190adcf9/pairadise/src/pairadise_model/R/pairadise.R#L129

It doesn't say what line, but I think that task 24857 means the error happened for iExon=24857. If you print out the ExonID then you can find the row in the input JC.raw.input.SE.txt file with the same ID. Hopefully the error can be reproduced with just that one row

The code can also be updated to be single threaded by using a regular for loop instead of %dopar%. In that case it should give a more detailed error message. I made a branch on the PAIRADISE github that has that change: https://github.com/Xinglab/PAIRADISE/compare/kutscherae-debug-error

If you download that branch, you can update your PAIRADISE install and hopefully get more information about the error:

git clone https://github.com/Xinglab/PAIRADISE.git
cd PAIRADISE
git checkout kutscherae-debug-error
R
> install.packages("./pairadise/src/pairadise_model/", repos=NULL)

Then run with the original JC.raw.input.SE.txt file to get the ExonID that goes with task 24857:
Rscript /path/to/rmats/rMATS_R/paired_model.R /path/to/original/JC.raw.input.SE.txt 1 /path/to/write/debug/output/rMATS_result_FDR.txt

That should print the ExonID at the beginning and you can CTRL-C it after that. Then create a new JC.raw.input.SE.txt file that is just the header and the line from the original with the matching ExonID and run:
Rscript /path/to/rmats/rMATS_R/paired_model.R /path/to/modified/JC.raw.input.SE.txt 1 /path/to/write/debug/output/rMATS_result_FDR.txt

Hopefully that should give a more detailed error message

Eric
Reply all
Reply to author
Forward
0 new messages