No significant SE events

293 views
Skip to first unread message

Jan Mauer

unread,
Jan 7, 2021, 10:26:30 AM1/7/21
to rMATS User Group
Dear all,

I'm using rMATS to detect alternative splicing upon drug treatment.
I want to compare splicing changes in vehicle-treated vs increasing drug doses (75, 250 and 500 nM drug).
For this, I'm performing three rMATS runs where I compare each of the drug doses to vehicle.

I'm running rMATS 4.1.0 with the following parameters:
python rmats.py --b1 b1.txt --b2 b2.txt  --gtf gencode.v31lift37.annotation.gtf --od out --tmp tmp -t paired --readLength 150 --cstat 0.0001 --libType fr-secondstrand --nthread 30 --novelSS

The number of significantly different splicing events in the output files are:
75 nM vs vehicle:
A3SS.......... 160,803 events
A5SS.......... 111,933 events
SE.......... 348,737 events
RI.......... 43,521 events
MXE.......... 170,759 events
250 nM vs vehicle:
A3SS.......... 177,807 events
A5SS.......... 120,127 events
SE.......... 394,799 events
RI.......... 47,434 events
MXE.......... 212,819 events
500 nM vs vehicle:
A3SS.......... 252,998 events
A5SS.......... 164,492 events
SE.......... 0 events
RI.......... 60,115 events
MXE.......... 488,539 events

Note that the number of splice events in each category increases with increasing drug concentration, except for SE events, where rMATS does not detect any events for the 500 nM drug treatment.
To make sure that the lack of SE events is not dependent on the cutoff, I have also changed cstat to 0.01 and 0.1, but still no SE events detected.
I have also rMATS-compared 500 nM vs. 75 and 250 nM, still with no SE events detected.
Running rMATS directly on original fastq files leads to the same result.
Other tools such as salmon or miso give good results with this data set.
Thus I don't believe that there is an issue with the raw data.

However, during the "processing of the count files" step, I get the following error 
==========
Done processing each gene from dictionary to compile AS events
Found 700244 exon skipping events
Found 578900 exon MX events
Found 466469 alt SS events
There are 281872 alt 3 SS events and 184597 alt 5 SS events.
Found 64447 RI events
==========
ase: 69.2721428871
count: 583.429327965
Processing count files.
Traceback (most recent call last):
  File "/home/mauer/software/rmats-turbo/rMATS_P/FDR.py", line 45, in <module>
    ifile=open(sys.argv[1]);title=ifile.readline();
IOError: [Errno 2] No such file or directory: 'tmp/JC_SE/rMATS_result_P-V.txt'
paste: tmp_500.bam/JC_SE/rMATS_result_FDR.txt: No such file or directory
Traceback (most recent call last):
  File "/home/mauer/software/rmats-turbo/rMATS_P/FDR.py", line 45, in <module>
    ifile=open(sys.argv[1]);title=ifile.readline();
IOError: [Errno 2] No such file or directory: 'tmp/JCEC_SE/rMATS_result_P-V.txt'
paste: tmp_500.bam/JCEC_SE/rMATS_result_FDR.txt: No such file or directory

The run finishes without any additional errors.

So rMATS does not generate the rMATS_result_P-V.txt file for SE events and I was wondering what the reason for this could be. I find it highly unlikely that there are really no SE events in the highest drug dose as in all my previous rMATS run, I always got numerous SE events, even when comparing much more similar conditions than this one.

Any feedback or speculation on why this happens would be very much appreciated.

Thanks, Jan

kutsc...@gmail.com

unread,
Jan 7, 2021, 11:45:13 AM1/7/21
to rMATS User Group
The rMATS_result_P-V.txt file from the error should have been created by this line: https://github.com/Xinglab/rmats-turbo/blob/bc29dc5202f9e18aed99fe7dfe6f95ad41579223/rmats.py#L357 which is roughly this command:

rmats-turbo/rMATS_C/rMATSexe -i ./out/JC.raw.input.SE.txt -t 1 -o ./tmp/JC_SE/rMATS_result_P-V.txt -c 0.0001

There may have been an error in that command, but the output is sent to /dev/null so it doesn't show up in the output you posted. You could try running that command yourself to see if an error message is output. You could also try editing that line of rmats.py and removing ", stdout=FNULL". Then you could re-run rmats and hopefully see more details about the error in the output. I think that rMATSexe prints quite a lot of output if it works correctly which is why the output is sent to /dev/null. You may want to redirect the output to a file to make it easier to search the output

Eric

jan....@gmail.com

unread,
Jan 8, 2021, 9:04:46 AM1/8/21
to rMATS User Group
Hi Eric,

thanks for the reply.
When I remove stdout=FNULL from line 357 I get only one additional line before the error:
gtf: 29.49269104
There are 62292 distinct gene ID in the gtf file
There are 228521 distinct transcript ID in the gtf file
There are 38371 one-transcript genes in the gtf file
There are 1367390 exons in the gtf file
There are 26667 one-exon transcripts in the gtf file
There are 23972 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.668545
Average number of exons per transcript is 5.983651
Average number of exons per transcript excluding one-exon tx is 6.642043
Average number of gene per geneGroup is 8.442842
statistic: 0.0365138053894
novel: 2419.12513304
The splicing graph and candidate read have been saved into tmp_500.bam/2021-01-08-14:22:54_650599.rmats
save: 484.030689955
loadsg: 3.75320911407

==========
Done processing each gene from dictionary to compile AS events
Found 700244 exon skipping events
Found 578900 exon MX events
Found 466469 alt SS events
There are 281872 alt 3 SS events and 184597 alt 5 SS events.
Found 64447 RI events
==========

ase: 69.8922250271
count: 579.637374878
Processing count files.
number of thread=1; input file=out_500.bam/JC.raw.input.SE.txt; output folder=tmp_500.bam/JC_SE/rMATS_result_P-V.txt; cutoff=0.0001;
Traceback (most recent call last):
  File "/home/mauer/software/rmats-turbo/rMATS_P/FDR.py", line 45, in <module>
    ifile=open(sys.argv[1]);title=ifile.readline();
IOError: [Errno 2] No such file or directory: 'tmp_500.bam/JC_SE/rMATS_result_P-V.txt'
paste: tmp_500.bam/JC_SE/rMATS_result_FDR.txt: No such file or directory
number of thread=1; input file=out_500.bam/JCEC.raw.input.SE.txt; output folder=tmp_500.bam/JCEC_SE/rMATS_result_P-V.txt; cutoff=0.0001;
Traceback (most recent call last):
  File "/home/mauer/software/rmats-turbo/rMATS_P/FDR.py", line 45, in <module>
    ifile=open(sys.argv[1]);title=ifile.readline();
IOError: [Errno 2] No such file or directory: 'tmp_500.bam/JCEC_SE/rMATS_result_P-V.txt'
paste: tmp_500.bam/JCEC_SE/rMATS_result_FDR.txt: No such file or directory
number of thread=1; input file=out_500.bam/JC.raw.input.MXE.txt; output folder=tmp_500.bam/JC_MXE/rMATS_result_P-V.txt; cutoff=0.0001;
Testing 1
3 return from: 1
Testing 2
3 return from: 2
Testing 3
3 return from: 3
Testing 4
3 return from: 4
Testing 5
3 return from: 5
Testing 6
3 return from: 6
etc.

I guess the processing continues with the MXE events.
Wen I direct the stdout to file with stdout=open('log1.txt', 'a') there is no information about the SE event error in there.

Maybe I'm doing something wrong?

Thanks, Jan

jan....@gmail.com

unread,
Jan 8, 2021, 9:15:04 AM1/8/21
to rMATS User Group
As far as I can see, there is also nothing wrong with the raw.input files.
Number of lines of the raw files (wc -l):
281873 fromGTF.A3SS.txt
   184598 fromGTF.A5SS.txt
   578901 fromGTF.MXE.txt
    10098 fromGTF.novelJunction.A3SS.txt
     6965 fromGTF.novelJunction.A5SS.txt
    52953 fromGTF.novelJunction.MXE.txt
     2386 fromGTF.novelJunction.RI.txt
   199974 fromGTF.novelJunction.SE.txt
   266556 fromGTF.novelSpliceSite.A3SS.txt
   174041 fromGTF.novelSpliceSite.A5SS.txt
   522677 fromGTF.novelSpliceSite.MXE.txt
    57022 fromGTF.novelSpliceSite.RI.txt
   470607 fromGTF.novelSpliceSite.SE.txt
    64448 fromGTF.RI.txt
   700245 fromGTF.SE.txt
   281873 JCEC.raw.input.A3SS.txt
   184598 JCEC.raw.input.A5SS.txt
   578901 JCEC.raw.input.MXE.txt
    64448 JCEC.raw.input.RI.txt
   633689 JCEC.raw.input.SE.txt
   281873 JC.raw.input.A3SS.txt
   184598 JC.raw.input.A5SS.txt
   501743 JC.raw.input.MXE.txt
    64448 JC.raw.input.RI.txt
   628778 JC.raw.input.SE.txt

head JC.raw.input.SE.txt
ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen
1 156,76,74 2,0,2 66,53,76 4,1,0 247 149
2 222,123,99 54,44,36 321,201,285 17,17,13 298 149
3 10,4,6 630,460,471 7,1,4 415,317,347 298 149
4 15,9,15 98,42,31 12,1,6 22,13,5 298 149
5 18,6,17 630,460,471 7,2,8 415,317,347 298 149
6 0,0,0 105,95,72 6,0,3 62,30,40 196 149
7 82,69,44 0,2,0 33,14,13 4,6,10 208 149
8 19,12,19 29,10,4 15,10,7 44,33,27 208 149
9 39,30,15 72,71,33 45,28,49 32,10,24 174 149

tail JC.raw.input.SE.txt
700232 3911,2320,2015 0,0,0 2026,1540,1689 0,2,0 278 149
700233 3912,2320,2015 0,0,0 2024,1540,1689 2,0,0 278 149
700234 144,147,89 84,85,43 20,10,9 36,24,31 298 149
700235 2486,1144,994 213,137,166 1421,840,1150 610,455,469 298 149
700236 27,19,16 0,10,0 28,19,19 0,3,0 298 149
700238 9,0,0 213,137,166 0,0,0 610,455,469 262 149
700239 3752,2235,1968 213,137,166 1963,1486,1630 610,455,469 271 149
700240 0,0,0 213,137,166 0,2,0 610,455,469 298 149
700242 2,0,0 8,0,1 0,0,0 0,0,3 191 149
700243 12,6,3 8,0,1 1,0,0 0,0,3 270 149

kutsc...@gmail.com

unread,
Jan 8, 2021, 11:19:10 AM1/8/21
to rMATS User Group
I ran a debug version in gdb against a large input file with test data, and it looks like the issue is that rMATSexe is allocating some data on the stack for each row and exceeding the stack size limit

https://github.com/Xinglab/rmats-turbo/blob/v4.1.0/rMATS_C/src/main.c#L73
It creates two pointer arrays with row_num elements each (odiff *datum[row_num] and double *prob[row_num]). The pointers are 8 bytes each and there are 700243 rows giving a total of 700243*8*2 = 11203888 bytes or about 10942 KB

"ulimit -s" shows the stack size limit which for me was initially set to 8192 KB. I changed it to 12000 KB with "ulimit -s 12000" and rMATSexe is running now without an error (although it's taking a long time since the input is large)

For now, you may be able to work around this issue by setting the stack limit with "ulimit -s {value}". rMATSexe should not be allocating that data on the stack and I'll work on fixing that for a future release

Eric

jan....@gmail.com

unread,
Jan 9, 2021, 3:55:00 AM1/9/21
to rMATS User Group
Dear Eric,

increasing the stack size limit fixed the problem.
It indeed runs very long now but the data output is there.

Thanks a bunch for investing your time and helping me out with this issue.

Best, Jan

Reply all
Reply to author
Forward
0 new messages