Issues with new installation and testing of rMATS 4.3.0


Shreya Nair

May 1, 2024, 3:02:00 PM
to rMATS User Group
Hello, 

I've been able to download and install rMATS on Ubuntu 22.04 LTS, but I run into this error when running the ./test_rmats script:
Screenshot 2024-05-01 at 1.54.07 PM.png
So I skipped that and decided to try the provided testData instead; however, there were a couple of issues here as well:
1. The testData folder does not contain the .gtf files required for running python rmats.py. To work around this, I pulled the test data from the rMATS 3.2.5 release.
2. The STAR binary indices needed for running the test data are not provided and no longer seem to be downloadable. Where could I get them? I have checked the links in the rMATS v4.3.0 and rMATS v3.2.5 documentation.
Screenshot 2024-05-01 at 1.50.54 PM.png

Appreciate any advice

- new rMATS user 

kutsc...@gmail.com

May 2, 2024, 12:55:51 PM
to rMATS User Group
I'm not able to see the screenshots so I'm not sure about the error messages

The gtf for the test data is available at: https://sourceforge.net/projects/rnaseq-mats/files/MATS/Homo_sapiens.GRCh37.75.tgz/download

It does look like the STAR index download link is broken. Instead you can create your own index. Here are commands that I ran to install and test rMATS:

# install and test
git clone https://github.com/Xinglab/rmats-turbo.git
cd rmats-turbo
./build_rmats --conda
./test_rmats

# download and run testData with --b1 --b2
curl -L 'https://sourceforge.net/projects/rnaseq-mats/files/MATS/testData.tgz/download' -o testData.tgz
tar -xvf testData.tgz

curl -L 'https://sourceforge.net/projects/rnaseq-mats/files/MATS/Homo_sapiens.GRCh37.75.tgz/download' -o Homo_sapiens.GRCh37.75.tgz
tar -xvf Homo_sapiens.GRCh37.75.tgz
mv Homo_sapiens.GRCh37.75/Homo_sapiens.GRCh37.75.gtf ./

python ./rmats.py --b1 ./b1.txt --b2 ./b2.txt --gtf ./Homo_sapiens.GRCh37.75.gtf --readLength 50 --od ./test_data_bam_out --tmp ./test_data_bam_tmp

# create STAR index and run testData with --s1 --s2 (requires about 40GB of memory)
curl -L 'https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.primary_assembly.annotation.gtf.gz' -O
gunzip gencode.v45.primary_assembly.annotation.gtf.gz

curl -L 'https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/GRCh38.primary_assembly.genome.fa.gz' -O
gunzip GRCh38.primary_assembly.genome.fa.gz

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir gencode_45_star_index --genomeFastaFiles ./GRCh38.primary_assembly.genome.fa --sjdbGTFfile ./gencode.v45.primary_assembly.annotation.gtf

python ./rmats.py --s1 ./s1.txt --s2 ./s2.txt --bi ./gencode_45_star_index  --gtf ./gencode.v45.primary_assembly.annotation.gtf --readLength 50 --od ./test_data_fastq_out --tmp ./test_data_fastq_tmp

Eric

Christopher Jun

Jun 7, 2024, 11:38:20 AM
to rMATS User Group

Hi Eric,

I am a new user of rMATS and I am also having trouble building and testing rmats-turbo. I followed the instructions you posted here and cloned the GitHub repository on Ubuntu, but I get the following error message when running ./build_rmats --conda, ./test_rmats, and ./run_rmats:


CondaError: Run 'conda init' before 'conda activate'

I am able to activate the conda environment that was created by ./build_rmats --conda, but I keep receiving this conda error.

Help with this would be greatly appreciated. I am quite new to Linux/Ubuntu.

Thank you,
Christopher

kutsc...@gmail.com

Jun 10, 2024, 9:16:16 AM
to rMATS User Group
build_rmats, test_rmats, and run_rmats all source setup_environment.sh which attempts to get conda working by sourcing your ~/.bashrc: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/setup_environment.sh#L20

The error (CondaError: Run 'conda init' before 'conda activate') seems to be saying that the setup code for conda wasn't found. Since you are able to activate the conda environment then you have conda working in at least some situations. You could try running:

conda init bash

which should write the setup code for conda to your ~/.bashrc like the rmats scripts expect

There could also be some other code in your ~/.bashrc which causes things to work when you run conda commands interactively in your terminal, but not when the scripts source ~/.bashrc. You could try taking the lines that conda wrote to your ~/.bashrc and copying them to the setup_environment.sh file that the rmats scripts use
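One way to check whether conda init has already written its setup code is to look for the marker comment that conda places around that block in ~/.bashrc. This is a minimal sketch, assuming the default marker text (# >>> conda initialize >>>); the function name has_conda_init is made up for illustration and is not part of conda or rMATS.

```shell
#!/usr/bin/env bash
# Report whether a shell startup file contains the block written by `conda init`.
# has_conda_init is a hypothetical helper, not part of conda or rMATS.
has_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"
    # conda init wraps its setup code in marker comments; look for the opening one.
    [ -f "$rc_file" ] && grep -q '>>> conda initialize >>>' "$rc_file"
}

if has_conda_init; then
    echo "conda setup block found in ~/.bashrc"
else
    echo "no conda setup block found; try: conda init bash"
fi
```

If the block is present but the scripts still fail, the cause is more likely something else in ~/.bashrc, such as an early return for non-interactive shells.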

Eric

Christopher Jun

Jun 10, 2024, 2:08:58 PM
to rMATS User Group
Thank you for your reply Eric.

test_rmats now runs, but all of the automated tests are failing.

I have tried replacing the line in the setup_environment.sh file with the lines in my ~/.bashrc that were written during the conda installation (see below).

__conda_setup="$('/home/chrisjun/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/chrisjun/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/chrisjun/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/chrisjun/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup

I have also tried just replacing line 21 in setup_environment.sh with source /home/chrisjun/anaconda3/etc/profile.d/conda.sh

Both are still resulting in failing the automated tests.

I see the following error messages (screenshot below).

Screenshot 2024-06-10 130813.png

If this adds context, I am running an Ubuntu terminal on a Windows computer via Windows Subsystem for Linux.

Best,
Christopher

Christopher Jun

Jun 10, 2024, 3:18:47 PM
to rMATS User Group
I am also receiving this error when building rmats.

Screenshot 2024-06-10 141730.png


Thanks,
Christopher

kutsc...@gmail.com

Jun 11, 2024, 8:50:17 AM
to rMATS User Group
The error (No module named 'rmatspipeline') happens when the code can't find the compiled rmats module. The build should create a file like rmatspipeline*.so which will be used when rmats is run. That error makes sense since you're getting an error when building rmats and that build error would prevent the build from creating the rmatspipeline.so file
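To confirm whether the build produced the compiled module, the source tree can be searched for it. This is a minimal sketch, assuming it is run from the top of the rmats-turbo checkout; find_rmatspipeline is a hypothetical helper name, not something rMATS provides.

```shell
#!/usr/bin/env bash
# List compiled rmatspipeline modules under a directory (default: current directory).
# find_rmatspipeline is a hypothetical helper, not part of rMATS.
find_rmatspipeline() {
    local root="${1:-.}"
    find "$root" -type f -name 'rmatspipeline*.so'
}

# Print the paths if any exist, otherwise explain what their absence suggests.
if find_rmatspipeline . | grep -q .; then
    find_rmatspipeline .
else
    echo "no rmatspipeline*.so found under $(pwd); the build may not have completed"
fi
```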

The error is (/home/chrisjun/anaconda3/envs/rmatstest/bin/gcc is not a full path to an existing compiler tool)

I think the build would try that gcc path because of environment variables. You might have set the CC environment variable which you could check with the command: echo $CC

build_rmats will set the CC environment variable if it's not already set: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/build_rmats#L76

If you run ./build_rmats --conda then it should install the necessary compilers to conda_envs/rmats based on: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/python_conda_requirements.txt

From the screenshot it looks like it's using a different environment (anaconda3/envs/rmatstest). You could try deactivating any conda environments and running the build from a new copy of the source code. Or if you have a working gcc you could try running (export CC=/path/to/your/gcc) before running the build
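The compiler checks described above can be collected into a short script. This is a minimal sketch, assuming bash; check_compiler is a hypothetical helper name, and the script only inspects the CC variable without changing it.

```shell
#!/usr/bin/env bash
# Check whether a compiler path (such as the value of $CC) points to an
# existing executable file. check_compiler is a hypothetical helper, not part of rMATS.
# A bare name such as "gcc" is reported as "missing" because it is not a full path.
check_compiler() {
    local cc_path="$1"
    if [ -z "$cc_path" ]; then
        echo "unset"
    elif [ -x "$cc_path" ] && [ ! -d "$cc_path" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

echo "CC=${CC:-} -> $(check_compiler "${CC:-}")"
# Show which gcc, if any, is first on PATH.
command -v gcc || echo "no gcc found on PATH"
```

A result of "missing" for a path inside another conda environment would be consistent with the build error quoted above.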

Eric

Changhe Ji

Sep 25, 2024, 2:58:18 PM
to rMATS User Group
Same problem:
python3 ./rmats.py --s1 ./s1.txt --s2 ./s2.txt --bi ./gencode_45_star_index  --gtf ./gencode.v45.primary_assembly.annotation.gtf --readLength 50 --od ./test_data_fastq_out --tmp ./test_data_fastq_tmp
Traceback (most recent call last):
  File "./rmats.py", line 19, in <module>
    from rmatspipeline import run_pipe
ModuleNotFoundError: No module named 'rmatspipeline'

Changhe Ji

Sep 25, 2024, 6:13:27 PM
to rMATS User Group
I ran into a problem when running the test data with the commands you listed earlier in this thread (the install and test steps, the --b1/--b2 run, and the STAR index creation followed by the --s1/--s2 run).

Screenshot 2024-09-25 181126.png

kutsc...@gmail.com

Sep 27, 2024, 8:41:24 AM
to rMATS User Group
From the screenshot it looks like the main error was:
/usr/bin/STAR: line 7: 4082 Killed "${cmd}" "$@"

That could happen if the command exceeded available memory
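Before generating the index, it may help to compare the machine's memory against the roughly 40GB mentioned earlier in this thread. This is a minimal sketch, assuming a Linux system where /proc/meminfo reports MemTotal in kB; mem_total_gb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print total memory in whole GB (rounded down), read from a meminfo-style file.
# mem_total_gb is a hypothetical helper, not part of STAR or rMATS.
mem_total_gb() {
    local meminfo="${1:-/proc/meminfo}"
    # The MemTotal line looks like: "MemTotal:       16384000 kB"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

required_gb=40   # approximate figure quoted in this thread for the human genome index
if [ -r /proc/meminfo ]; then
    total_gb="$(mem_total_gb)"
    if [ "$total_gb" -lt "$required_gb" ]; then
        echo "only ${total_gb} GB of memory; STAR genomeGenerate may be killed"
    else
        echo "${total_gb} GB of memory; meets the ~${required_gb} GB estimate"
    fi
fi
```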

Eric

Changhe Ji

Sep 27, 2024, 1:19:49 PM
to kutsc...@gmail.com, rMATS User Group
Thank you. Do you have the full code from the test data for the sashimi plot analysis?

--
You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rmats-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/00bdef33-7e02-48cb-bb4c-9aa9bc85e232n%40googlegroups.com.

Changhe Ji

Sep 27, 2024, 4:10:25 PM
to kutsc...@gmail.com, rMATS User Group
Screenshot 2024-09-27 160801.png

Hi, if the output ends with:
Sep 27 12:42:47 ... writing SAindex to disk
Sep 27 12:43:26 ..... finished successfully
what is the next step I should take?


kutsc...@gmail.com

Oct 1, 2024, 1:23:14 PM
to rMATS User Group
The command that finished was STAR --runMode genomeGenerate

The next command is running rMATS using the genome files output from STAR. That rMATS command starts with: python ./rmats.py ...

The error from your screenshot is: Command 'python' not found

rMATS requires python to be installed in order to run. Based on the error message it looks like you may have python installed as python3. You could try replacing python with python3 in your command
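To find out which interpreter name is available, the candidates can be tried in order. This is a minimal sketch, assuming bash; pick_python is a hypothetical helper name, not part of rMATS.

```shell
#!/usr/bin/env bash
# Print the first command name from the arguments that exists on PATH.
# pick_python is a hypothetical helper, not part of rMATS.
pick_python() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" > /dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if py="$(pick_python python python3)"; then
    echo "run rMATS with: $py ./rmats.py ..."
else
    echo "neither python nor python3 was found on PATH"
fi
```

Finding an interpreter this way only addresses the "Command 'python' not found" error. The interpreter may also need to be the one rMATS was built with, for example the one in the conda environment created by ./build_rmats --conda.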

Eric

Changhe Ji

Oct 1, 2024, 2:00:35 PM
to kutsc...@gmail.com, rMATS User Group
 python3 ./rmats.py --s1 ./s1.txt --s2 ./s2.txt --bi ./gencode_45_star_index  --gtf ./gencode.v45.primary_assembly.annotation.gtf --readLength 50 --od ./test_data_fastq_out --tmp ./test_data_fastq_tmp
Traceback (most recent call last):
  File "./rmats.py", line 19, in <module>
    from rmatspipeline import run_pipe
ModuleNotFoundError: No module named 'rmatspipeline'

Changhe Ji

Dec 21, 2024, 1:52:25 PM
to rMATS User Group
The test data BAM files and FASTQ files give different splicing results. Is that expected?
BAM files:

231ESRP.25K.rep-1.bam 
231ESRP.25K.rep-2.bam       

231EV.25K.rep-1.bam 
231EV.25K.rep-2.bam


EventType       EventTypeDescription    TotalEventsJC   TotalEventsJCEC SignificantEventsJC     SigEventsJCSample1HigherInclusion       SigEventsJCSample2HigherInclusion       SignificantEventsJCEC   SigEventsJCECSample1HigherInclusion     SigEventsJCECSample2HigherInclusion
SE      skipped exon    3       4       0       0       0       1       1       0
A5SS    alternative 5' splice sites     0       0       0       0       0       0       0       0
A3SS    alternative 3' splice sites     1       1       0       0       0       0       0       0
MXE     mutually exclusive exons        0       1       0       0       0       0       0       0
RI      retained intron 1       2       0       0       0       0       0       0


and FASTQ files:

231ESRP.25K.rep-1.R1.fastq             
231ESRP.25K.rep-1.R2.fastq            
231ESRP.25K.rep-2.R1.fastq  
231ESRP.25K.rep-2.R2.fastq
  
231EV.25K.rep-1.R1.fastq    
231EV.25K.rep-1.R2.fastq   
231EV.25K.rep-2.R1.fastq
231EV.25K.rep-2.R2.fastq

EventType       EventTypeDescription    TotalEventsJC   TotalEventsJCEC SignificantEventsJC     SigEventsJCSample1HigherInclusion       SigEventsJCSample2HigherInclusion       SignificantEventsJCEC   SigEventsJCECSample1HigherInclusion     SigEventsJCECSample2HigherInclusion
SE      skipped exon    94      139     3       1       2       5       2       3
A5SS    alternative 5' splice sites     32      38      2       1       1       2       1       1
A3SS    alternative 3' splice sites     33      45      1       0       1       1       0       1
MXE     mutually exclusive exons        37      72      1       1       0       1       1       0
RI      retained intron 53      118     0       0       0       2       1       1

kutsc...@gmail.com

Dec 23, 2024, 3:15:55 PM
to rMATS User Group
Yes, that is the expected output for the summary files when running the test data as described in this post. The main reason for the different results is the reference files. The test data BAM files were aligned using an older Ensembl reference (Homo_sapiens.GRCh37.75.gtf). The instructions for running with the test FASTQ files use a GENCODE reference (gencode.v45.primary_assembly.annotation.gtf).
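To see the difference between two runs side by side, each summary table can be reduced to a single column. This is a minimal sketch, assuming each output directory contains a tab-separated summary.txt with the same header as the tables posted in this thread; summary_column is a hypothetical helper name, and the directory names are the --od values from the commands earlier in the thread.

```shell
#!/usr/bin/env bash
# Print "EventType<TAB>value" for one named column of a tab-separated summary file.
# summary_column is a hypothetical helper, not part of rMATS.
summary_column() {
    local file="$1" column="$2"
    awk -F '\t' -v col="$column" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { print $1 "\t" $idx }
    ' "$file"
}

# Show TotalEventsJC for the BAM run and the FASTQ run, if the files exist.
for dir in ./test_data_bam_out ./test_data_fastq_out; do
    if [ -f "$dir/summary.txt" ]; then
        echo "== $dir"
        summary_column "$dir/summary.txt" TotalEventsJC
    fi
done
```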

Eric

Changhe Ji

Dec 23, 2024, 11:03:22 PM
to kutsc...@gmail.com, rMATS User Group
Running the test BAM files on Ubuntu (preview) gives:
EventType       EventTypeDescription    TotalEventsJC   TotalEventsJCEC SignificantEventsJC     SigEventsJCSample1HigherInclusion       SigEventsJCSample2HigherInclusion       SignificantEventsJCEC   SigEventsJCECSample1HigherInclusion     SigEventsJCECSample2HigherInclusion
SE      skipped exon    3       4       0       0       0       1       1       0
A5SS    alternative 5' splice sites     0       0       0       0       0       0       0       0
A3SS    alternative 3' splice sites     1       1       0       0       0       0       0       0
MXE     mutually exclusive exons        0       1       0       0       0       0       0       0
RI      retained intron 1       2       0       0       0       0       0       0


Running the test BAM files on Ubuntu (22.04.5 LTS) gives:
EventType       EventTypeDescription    TotalEventsJC   TotalEventsJCEC SignificantEventsJC     SigEventsJCSample1HigherInclusion       SigEventsJCSample2HigherInclusion       SignificantEventsJCEC   SigEventsJCECSample1HigherInclusion     SigEventsJCECSample2HigherInclusion
SE      skipped exon    0       0       0       0       0       0       0       0
A5SS    alternative 5' splice sites     0       0       0       0       0       0       0       0
A3SS    alternative 3' splice sites     0       0       0       0       0       0       0       0
MXE     mutually exclusive exons        0       0       0       0       0       0       0       0
RI      retained intron 0       0       0       0       0       0       0       0

I do not know why.


