rmats output - not using reads from .bam files

206 views
Skip to first unread message

Marian Priebe

unread,
Nov 18, 2022, 11:50:34 AM11/18/22
to rMATS User Group
Hi there,
I have been working my way through using rmats on some of my data, I have gotten to the point that I think it is finding my various files correctly but seems to not be using any of the reads in the .bam files. My output looks as follows:

rmats.py --nthread 20 --b1 F_samples.txt --b2 M_samples.txt --gtf input/GCF_000188075.2_Si_gnH_genomic.gtf -t single --readLength 50 --od results --tmp tmp

gtf: 5.0943567752838135

There are 16334 distinct gene ID in the gtf file

There are 27497 distinct transcript ID in the gtf file

There are 11631 one-transcript genes in the gtf file

There are 205437 exons in the gtf file

There are 1711 one-exon transcripts in the gtf file

There are 1707 one-transcript genes with only one exon in the transcript

Average number of transcripts per gene is 1.683421

Average number of exons per transcript is 7.471251

Average number of exons per transcript excluding one-exon tx is 7.900644

Average number of gene per geneGroup is 37.408396

statistic: 0.0025742053985595703


read outcome totals across all BAMs

USED: 0

NOT_PAIRED: 0

NOT_NH_1: 12450438

NOT_EXPECTED_CIGAR: 679775

NOT_EXPECTED_READ_LENGTH: 62853210

NOT_EXPECTED_STRAND: 0

EXON_NOT_MATCHED_TO_ANNOTATION: 0

JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0

CLIPPED: 17449587

total: 93433010

outcomes by BAM written to: tmp/2022-11-18-16_39_27_270626_read_outcomes_by_bam.txt


novel: 207.24952268600464

The splicing graph and candidate read have been saved into tmp/2022-11-18-16_39_27_270626_*.rmats

save: 0.0031414031982421875

Traceback (most recent call last):

  File "/data/home/btx452/.conda/envs/rmats/bin/rmats.py", line 595, in <module>

    main()

  File "/data/home/btx452/.conda/envs/rmats/bin/rmats.py", line 563, in main

    run_pipe(args)

  File "rmatspipeline/rmatspipeline.pyx", line 3860, in rmats.rmatspipeline.run_pipe

  File "rmatspipeline/rmatspipeline.pyx", line 3723, in rmats.rmatspipeline.split_sg_files_by_bam

  File "rmatspipeline/rmatspipeline.pyx", line 3731, in rmats.rmatspipeline.split_sg_files_by_bam

ValueError: invalid literal for int() with base 10: 'input/alignments/BRA-19FAligned.sortedByCoord.out.bam'


Is there any way soemone can help me figure out what is going on here?

Best wishes

Marian

kutsc...@gmail.com

unread,
Nov 18, 2022, 1:18:29 PM11/18/22
to rMATS User Group
From the read outcomes it looks like read length is the main reason reads are not being used:
NOT_NH_1: 12450438 (13.3%)
NOT_EXPECTED_CIGAR: 679775 (0.7%)
NOT_EXPECTED_READ_LENGTH: 62853210 (67.3%)
CLIPPED: 17449587 (18.7%)
total: 93433010 (100%)

You may want to change the value of --readLength or use --variable-read-length

The traceback is pointing to this line: https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3731

It's expecting to find the read length on the second line of each .rmats file, but it found: input/alignments/BRA-19FAligned.sortedByCoord.out.bam
I'm not sure why that would happen. Maybe looking at the first few lines of the .rmats files in your --tmp directory will be helpful. You could also try running with a new --tmp directory in case the issue came from an error in a previous run

Eric
Reply all
Reply to author
Forward
0 new messages