Segmentation fault rMATS 4.0.1 with BAM files

Martin Stražar

Jan 16, 2018, 3:03:07 PM

to rMATS User Group

Dear authors of rMATS,

I am having trouble running the software with the new turbo version (>4). Previous versions worked fine for me.

I get a segfault with no additional clues for debugging. Could somebody point me to possible causes and to the correct setup?

I am attaching the input data and other details below.

Side question: In paired-end sequencing, does readLength refer to the sum of the lengths of both read mates (200) or to a single read (100)?

Many thanks,

Martin Stražar

# Python 2 environment with numpy (and required system libraries BLAS, etc.)
python rMATS.4.0.1/rMATS-turbo-Linux-UCS4/rmats.py --b1 b1.txt --b2 b2.txt --gtf Homo_sapiens.GRCh37.75.HepG2_mean_1000.gtf --od output/ -t paired --nthread 20 --readLength 200

Here is the output:

There are 1000 distinct gene ID in the gtf file
There are 10979 distinct transcript ID in the gtf file
There are 52 one-transcript genes in the gtf file
There are 73962 exons in the gtf file
There are 111 one-exon transcripts in the gtf file
There are 28 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 10.979000
Average number of exons per transcript is 6.736679
Average number of exons per transcript excluding one-exon tx is 6.795271
Average number of gene per geneGroup is 1.116042
Segmentation fault (core dumped)

The server has 24 cores:

Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

and 512 GB of RAM

and Ubuntu 14.04.4 LTS

Input Data

Please note that the files are not all in the same directory on our system.

GTF:

https://drive.google.com/file/d/14I5VMHdrkqcQnnzA-Fsj5ONLI7fVLJNB/view?usp=sharing

The BAM files are available here and the configuration files are found below.

https://www.encodeproject.org/files/ENCFF438LVA/@@download/ENCFF438LVA.bam

https://www.encodeproject.org/files/ENCFF260JZM/@@download/ENCFF260JZM.bam

https://www.encodeproject.org/files/ENCFF645UHW/@@download/ENCFF645UHW.bam

https://www.encodeproject.org/files/ENCFF295SFA/@@download/ENCFF295SFA.bam

b1.txt:

ENCFF438LVA.bam,ENCFF260JZM.bam

b2.txt:

ENCFF645UHW.bam,ENCFF295SFA.bam

Adam Cornwell

Feb 6, 2018, 12:11:17 PM

to rMATS User Group

Hi, I wanted to mention that I'm seeing similar behavior on OS X (10.13.3) when attempting to run on the BAM files of the small example dataset:

python rmats.py --b1 b1a.txt --b2 b2a.txt -t paired --readLength 101 --gtf testData/test.gtf --nthread 2 --libType fr-unstranded --od ./example_output/ --cstat 0.001
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
Segmentation fault: 11

I also got a pop-up with the following error:

"Python quit unexpectedly while using the libstdc++.6.dylib plug-in."

I also attempted to run 4.0.1 on our compute cluster (Red Hat Enterprise Linux) and got it to run on our own data, seemingly successfully: no errors were reported and the output files were created. However, I suspected that the statistical analysis component wasn't running correctly, as no significant results were written even at very high stat cutoffs. I then went back and ran the small example dataset provided, and got similar behavior, where it appears to die somewhere in the statistical analysis. Here's the stderr output:

*** Error in `python': double free or corruption (!prev): 0x00000000018bc570 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c619)[0x2b46e569e619]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(_ZNSt10_HashtableIiSt4pairIKiSt3setISsSt4lessISsESaISsEEESaIS7_ESt10_Select1stIS7_ESt8equal_toIiESt4hashIiENSt8__detail18_Mod_range_hashingENSF_20_Default_ranged_hashENSF_20_Prime_rehash_policyELb0ELb0ELb1EE16_M_insert_bucketERKS7_mm+0x43f)[0x2b46f02d3d5f]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(_ZNSt8__detail9_Map_baseIiSt4pairIKiSt3setISsSt4lessISsESaISsEEESt10_Select1stIS8_ELb1ESt10_HashtableIiS8_SaIS8_ESA_St8equal_toIiESt4hashIiENS_18_Mod_range_hashingENS_20_Default_ranged_hashENS_20_Prime_rehash_policyELb0ELb0ELb1EEEixERS2_+0xfd)[0x2b46f02d3fbd]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(+0x6be74)[0x2b46f02aae74]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(+0x6d6dc)[0x2b46f02ac6dc]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(+0x6f06f)[0x2b46f02ae06f]
/gpfs/fs1/home/acornwe2/rMATS/rMATS.4.0.1/rmatspipeline.so(+0x73a76)[0x2b46f02b2a76]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x658e)[0x2b46e49e988e]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6282)[0x2b46e49e9582]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x2b46e49ec80d]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x2b46e49ec942]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x92)[0x2b46e4a15562]
/software/python/2.7.12/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd9)[0x2b46e4a168f9]
/software/python/2.7.12/lib/libpython2.7.so.1.0(Py_Main+0xc4d)[0x2b46e4a2c55d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b46e5643c05]
python[0x400721]

On the Linux cluster I had some trouble building the shared library dependencies (libgsl etc.), but eventually got all the requirements satisfied as far as I know. This assumes that rMATS is not extremely version-sensitive: for example, I think I built a newer version of libgsl.so and then symlinked it to libgsl.so.0.

Anyway, thanks for the help. I really want to use the software, so I hope we can resolve these issues.

Marcelo Pereira

Feb 2, 2019, 6:40:54 PM

to rMATS User Group

Hi,

I get a segmentation fault when I run rMATS on my FASTQ files using a Drosophila melanogaster genome index. I tested my rMATS installation with the test dataset and it ran fine.

Here is the log:

mapping the first sample

mapping sample_0, TBPH-RNAi_on_R1.fastq TBPH-RNAi_on_R2.fastq is done with status 0

Feb 02 22:58:53 ..... started STAR run

Feb 02 22:58:53 ..... loading genome

Feb 02 22:58:55 ..... processing annotations GTF

Feb 02 22:58:56 ..... inserting junctions into the genome indices

Feb 02 22:59:04 ..... started mapping

Feb 02 23:04:27 ..... started sorting BAM

Feb 02 23:05:04 ..... finished successfully

mapping sample_0, TBPH-RNAi_on_2_R1.fastq TBPH-RNAi_on_2_R2.fastq is done with status 0

Feb 02 23:05:06 ..... started STAR run

Feb 02 23:05:06 ..... loading genome

Feb 02 23:05:07 ..... processing annotations GTF

Feb 02 23:05:09 ..... inserting junctions into the genome indices

Feb 02 23:05:16 ..... started mapping

Feb 02 23:09:31 ..... started sorting BAM

Feb 02 23:09:56 ..... finished successfully

mapping sample_0, TBPH-RNAi_on_1_R1.fastq TBPH-RNAi_on_1_R2.fastq is done with status 0

Feb 02 23:09:58 ..... started STAR run

Feb 02 23:09:58 ..... loading genome

Feb 02 23:10:00 ..... processing annotations GTF

Feb 02 23:10:01 ..... inserting junctions into the genome indices

Feb 02 23:10:09 ..... started mapping

Feb 02 23:14:14 ..... started sorting BAM

Feb 02 23:14:49 ..... finished successfully

mapping the first sample

mapping sample_1, /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_R1.fastq /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_R2.fastq is done with status 0

Feb 02 23:14:51 ..... started STAR run

Feb 02 23:14:51 ..... loading genome

Feb 02 23:14:53 ..... processing annotations GTF

Feb 02 23:14:54 ..... inserting junctions into the genome indices

Feb 02 23:15:02 ..... started mapping

Feb 02 23:20:10 ..... started sorting BAM

Feb 02 23:20:47 ..... finished successfully

mapping sample_1, /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_2_R1.fastq /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_2_R2.fastq is done with status 0

Feb 02 23:20:48 ..... started STAR run

Feb 02 23:20:48 ..... loading genome

Feb 02 23:20:50 ..... processing annotations GTF

Feb 02 23:20:51 ..... inserting junctions into the genome indices

Feb 02 23:20:59 ..... started mapping

Feb 02 23:25:53 ..... started sorting BAM

Feb 02 23:26:33 ..... finished successfully

mapping sample_1, /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_1_R1.fastq /data/working_directory/marcelo/KDs_all_samples/66bp_only_fastqs/e-RNAi_on_1_R2.fastq is done with status 0

Feb 02 23:26:35 ..... started STAR run

Feb 02 23:26:35 ..... loading genome

Feb 02 23:26:37 ..... processing annotations GTF

Feb 02 23:26:38 ..... inserting junctions into the genome indices

Feb 02 23:26:46 ..... started mapping

Feb 02 23:32:01 ..... started sorting BAM

Feb 02 23:32:39 ..... finished successfully

There are 17494 distinct gene ID in the gtf file

There are 34542 distinct transcript ID in the gtf file

There are 10105 one-transcript genes in the gtf file

There are 187251 exons in the gtf file

There are 5487 one-exon transcripts in the gtf file

There are 4246 one-transcript genes with only one exon in the transcript

Average number of transcripts per gene is 1.974506

Average number of exons per transcript is 5.420966

Average number of exons per transcript excluding one-exon tx is 6.255860

Average number of gene per geneGroup is 3.921088

Segmentation fault

Brian Joseph

Jun 17, 2019, 6:53:00 PM

to rMATS User Group

Hi Martin,

I hope you were able to find a solution to your problem. If not, maybe my experience will help. I was working on a cluster and rmats.py was crashing with a segmentation fault. The only change that worked consistently was reducing the number of threads to 1 (--nthread 1). Apart from that, increasing the memory helped only sporadically.
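The single-thread workaround can be automated. The following is only a sketch, assuming that a segfault in rmats.py surfaces as a negative return code on POSIX systems (as `subprocess` reports for a process killed by a signal). The function name `run_with_thread_fallback` and the `runner` parameter are hypothetical and exist only for this illustration.

```python
import signal
import subprocess


def run_with_thread_fallback(base_cmd, nthread, runner=subprocess.call):
    """Run base_cmd with --nthread; retry once with one thread after a segfault.

    base_cmd is the rmats.py command line as a list, without --nthread.
    runner is any callable that takes the command list and returns its status.
    """
    status = runner(list(base_cmd) + ["--nthread", str(nthread)])
    # On POSIX, a child killed by SIGSEGV is reported as -SIGSEGV (-11).
    if status == -signal.SIGSEGV and nthread > 1:
        status = runner(list(base_cmd) + ["--nthread", "1"])
    return status
```

For example, `run_with_thread_fallback(["python", "rmats.py", "--b1", "b1.txt", "--b2", "b2.txt", "--gtf", "testData/test.gtf", "--od", "output/", "-t", "paired", "--readLength", "101"], 20)` would first try 20 threads and fall back to one thread only if the first run was killed by a segmentation fault.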


Marina Yurieva

Jul 16, 2019, 8:46:02 PM

to rMATS User Group

I noticed that rMATS will segfault if your GTF file is badly formatted.
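A quick way to look for formatting problems is to scan the GTF before running rMATS. The sketch below is not part of rMATS, and which formatting errors actually trigger the segfault is not established here. The checks (nine tab-separated fields, integer coordinates with start not greater than end, a valid strand, `gene_id` present, `transcript_id` present on exon lines) are assumptions based on the general GTF layout, and the function names are hypothetical.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000000003"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none)."""
    if not line.strip() or line.startswith("#"):
        return []  # blank lines and comments are skipped
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated fields, found %d" % len(fields)]
    problems = []
    start, end, strand = fields[3], fields[4], fields[6]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in ("+", "-", "."):
        problems.append("unexpected strand")
    attrs = dict(ATTR_RE.findall(fields[8]))
    if "gene_id" not in attrs:
        problems.append("missing gene_id")
    if fields[2] == "exon" and "transcript_id" not in attrs:
        problems.append("missing transcript_id")
    return problems


def check_gtf(path):
    """Yield (line_number, problem) pairs for every problem in the file."""
    with open(path) as handle:
        for number, line in enumerate(handle, 1):
            for problem in check_gtf_line(line):
                yield number, problem
```

Running `for n, p in check_gtf("annotation.gtf"): print(n, p)` (with `annotation.gtf` standing in for your own file) lists the offending lines. An empty result only means these particular checks passed.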


David D

Dec 9, 2019, 5:27:52 AM

to rMATS User Group

Thanks! This was the simple solution for fixing my segfault!
