Fail to open .bam files

Jessica Elman

Apr 17, 2018, 9:39:37 PM
to rMATS User Group

Hello,

I am running rMATS.4.0.1 on my university's cluster, and it looks like the program runs until I get the following message:

There are 26485 distinct gene ID in the gtf file

There are 58259 distinct transcript ID in the gtf file

There are 15188 one-transcript genes in the gtf file

There are 535393 exons in the gtf file

There are 5729 one-exon transcripts in the gtf file

There are 3772 one-transcript genes with only one exon in the transcript

Average number of transcripts per gene is 2.199698

Average number of exons per transcript is 9.189876

Average number of exons per transcript excluding one-exon tx is 10.083076

Average number of gene per geneGroup is 261.649536

Fail to open /cluster/tufts/dmcb_kuperwasser02/Jess/STAR_output/4181_mix-NTC19_S1_Aligned.toTranscriptome.out.bam

Fail to open /cluster/tufts/dmcb_kuperwasser02/Jess/STAR_output/4185_mix-NTC19_S5_Aligned.toTranscriptome.out.bam

Fail to open /cluster/tufts/dmcb_kuperwasser02/Jess/STAR_output/4182_mix-FUBP1Polyclonal_S2_Aligned.toTranscriptome.out.bam

Fail to open /cluster/tufts/dmcb_kuperwasser02/Jess/STAR_output/4186_mix-FUBP1Polyclonal_S6_Aligned.toTranscriptome.out.bam


==========

Done processing each gene from dictionary to compile AS events

Found 11466 exon skipping events

Found 1106 exon MX events

Found 7152 alt SS events

There are 4065 alt 3 SS events and 3087 alt 5 SS events.

Found 1139 RI events

==========


Running the statistical part.

The statistical part is done.


The paths are correct, and I don't know why the .bam files won't open. Any advice?

Thanks,

Jess 

Elizabeth

Jan 18, 2019, 3:53:30 AM
to rMATS User Group
Hi,

I've been running into the same problem. Were you able to figure out why the error was occurring?

Best,
Elizabeth

Shihao Shen

Jan 20, 2019, 10:47:04 PM
to rMATS User Group
Hi Elizabeth,

rMATS-turbo uses bamtools to read bam files. The error occurs when bamtools cannot process your bam files with its 'open' function. 

Could you please send me the first 2,000 lines of your bam files using commands such as the one below?

samtools view -h big.bam | head -n 2000 | samtools view -bS - > little.bam
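As a hedged addition: before sharing files, a few local checks can rule out the simplest reasons an open call fails. These are a missing path, something that is not a regular file, a file your user cannot read, an empty file, or a file that is not BGZF-compressed. The helper below, `preflight_bam`, is hypothetical and not part of rMATS, bamtools or samtools. It is a minimal sketch that assumes a POSIX shell with `dd` and `od`. It only inspects the first four bytes, which for a BAM file are the BGZF header bytes 1f 8b 08 04, so passing it does not prove the file is valid. If samtools is available, `samtools quickcheck -v your.bam` is a more thorough check.

```shell
# preflight_bam: hypothetical helper (not part of rMATS, bamtools or samtools).
# Prints "OK <path>" or "<REASON> <path>" and returns 1 on the first problem found.
preflight_bam() {
    f=$1
    if [ ! -e "$f" ]; then echo "MISSING $f"; return 1; fi
    if [ ! -f "$f" ]; then echo "NOT_A_FILE $f"; return 1; fi
    if [ ! -r "$f" ]; then echo "UNREADABLE $f"; return 1; fi
    if [ ! -s "$f" ]; then echo "EMPTY $f"; return 1; fi
    # BAM is BGZF-compressed: gzip magic 1f 8b, method 08 (deflate), flags 04 (FEXTRA).
    magic=$(dd if="$f" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "1f8b0804" ]; then echo "NOT_BGZF $f"; return 1; fi
    echo "OK $f"
}
```

For example, run `for f in S1.bam S2.bam S6.bam S7.bam; do preflight_bam "$f"; done` from the same directory you start rMATS in. Relative names are checked against the current directory.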

Thanks,

Shihao

Aishwarya Griselda Jacob

Mar 12, 2019, 6:55:19 PM
to rMATS User Group
Dear Shihao
I have a similar problem to Elizabeth's. I am only running the test data that I downloaded from the rMATS 4.0.2 website to see whether I can run rMATS at all. I am running the following command as instructed:
~/Documents/rMATS.4.0.2$ python rMATS-turbo-Linux-UCS4/rmats.py --b1 testData/b1.txt --b2 testData/b2.txt --gtf Homo_sapiens.GRCh38.83.gtf --od bam_test -t paired --readLength 50 --cstat 0.0001 --libType fr-unstranded
My working directory is rMATS.4.0.2. I installed all the other required tools, such as samtools and bamtools, as well as the additional packages NumPy, GSL, etc., from within this directory.
Please do let me know if there is a solution. I apologize if this is redundant. I was unable to find the solution in the forum.
Aishwarya

Shihao Shen

Apr 1, 2019, 12:56:09 PM
to rMATS User Group
Hi Aishwarya,

Usually "Fail to open" errors for BAM files are caused by incorrect input paths in b1.txt and b2.txt.

If the test bam files are in the testData folder, please make sure the paths in the b1.txt and b2.txt also include the folder.

Also, when creating a b1.txt for BAM inputs, please use commas (not tabs or newlines) to separate samples.
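You can follow both points at once by generating the list instead of typing it. The sketch below uses a hypothetical helper, `write_bam_list`, which is not part of rMATS, and assumes a POSIX shell. It converts each path to an absolute one and joins them with commas on a single line. It fails if a file's directory does not exist, but it does not check that the BAM files themselves exist.

```shell
# write_bam_list: hypothetical helper (not part of rMATS).
# Usage: write_bam_list OUT_FILE BAM [BAM ...]
# Writes the BAMs to OUT_FILE as one comma-separated line of absolute paths.
write_bam_list() {
    out=$1
    shift
    line=""
    for bam in "$@"; do
        # Resolve the directory part (CDPATH cleared so cd prints nothing);
        # this fails if that directory does not exist.
        dir=$(CDPATH= cd "$(dirname "$bam")" && pwd) || return 1
        abs="$dir/$(basename "$bam")"
        if [ -z "$line" ]; then line=$abs; else line="$line,$abs"; fi
    done
    printf '%s\n' "$line" > "$out"
}
```

For example, `write_bam_list testData/b1.txt testData/231ESRP.25K.rep-1.bam testData/231ESRP.25K.rep-2.bam` writes absolute paths. The resulting list then no longer depends on which directory rmats.py is started from.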

Hope this helps,

Shihao Shen

rMATS Developer
Scientist II, Children's Hospital of Philadelphia

Aishwarya Griselda Jacob

Jul 24, 2019, 8:23:46 AM
to rMATS User Group
Dear Shihao
Thank you for your reply, and apologies for the delayed response; I had not had time to return to this until now. I have now double-checked the folder paths in b1.txt and b2.txt, but I still encounter the same problem (see the output below). I checked whether samtools could view the .bam files, which it could, so the files are not corrupted. I am using samtools v1.9. Could that be a problem?
Thank you again for your input.
Aishwarya

:~/Documents/rMATS.4.0.2$ python rMATS-turbo-Linux-UCS4/rmats.py --b1 ~/Documents/rMATS.4.0.2/testData/b1.txt --b2 ~/Documents/rMATS.4.0.2/testData/b2.txt --gtf ~/Documents/rMATS.4.0.2/Homo_sapiens.GRCh38.83.gtf --od bam_test -t paired --readLength 50 --cstat 0.0001 --libType fr-unstranded
There are 60675 distinct gene ID in the gtf file
There are 199184 distinct transcript ID in the gtf file
There are 38917 one-transcript genes in the gtf file
There are 1176808 exons in the gtf file
There are 27376 one-exon transcripts in the gtf file
There are 25218 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.282802
Average number of exons per transcript is 5.908145
Average number of exons per transcript excluding one-exon tx is 6.690212
Average number of gene per geneGroup is 7.629187
Fail to open rMATS.4.0.2/testData/231ESRP.25K.rep-1.bam
Fail to open rMATS.4.0.2/testData/231ESRP.25K.rep-2.bam
Fail to open ~/Documents/rMATS.4.0.2/testData/231EV.25K.rep-1.bam
Fail to open ~/Documents/rMATS.4.0.2/testData231EV.25K.rep-2.bam


==========
Done processing each gene from dictionary to compile AS events
Found 38723 exon skipping events
Found 2461 exon MX events
Found 12847 alt SS events
There are 7881 alt 3 SS events and 4966 alt 5 SS events.
Found 5833 RI events

==========

Running the statistical part.

knert

Apr 14, 2020, 6:51:22 AM
to rMATS User Group
Hi, Shihao

I have a similar problem. I am using WSL with Ubuntu 18 and rMATS 4.0.2.

knert@DESKTOP-PIP2EHP:~/rMATS.4.0.2/rMATS-turbo-Linux-UCS4$ python rmats.py --b1 /home/knert/rMATS.4.0.2/b1.txt --b2 /home/knert/rMATS.4.0.2/b2.txt --gtf /home/knert/Dir/Drosophila_melanogaster.BDGP6.28.99.gtf --od /home/knert/rMATS.4.0.2/output -t paired --nthread 8 --readLength 75 --cstat 0.0001 --libType fr-firststrand
There are 17807 distinct gene ID in the gtf file
There are 34920 distinct transcript ID in the gtf file
There are 10414 one-transcript genes in the gtf file
There are 188169 exons in the gtf file
There are 5787 one-exon transcripts in the gtf file
There are 4545 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 1.961027
Average number of exons per transcript is 5.388574
Average number of exons per transcript excluding one-exon tx is 6.260323
Average number of gene per geneGroup is 3.934498
Fail to open S1.bam
Fail to open S2.bam
Fail to open S6.bam
Fail to open S7.bam

==========
Done processing each gene from dictionary to compile AS events
Found 2571 exon skipping events
Found 2171 exon MX events
Found 4884 alt SS events
There are 2392 alt 3 SS events and 2492 alt 5 SS events.
Found 1617 RI events
==========

Running the statistical part.
The statistical part is done.
Done.


And here is the samtools output:
knert@DESKTOP-PIP2EHP:~/samtools-1.9$ ./samtools view /home/knert/rMATS.4.0.2/S1.bam | head -n 1000
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
NB501708:100:H5THNBGX7:1:23102:25584:17438      419     2L      6219    3       10M197593N65M   =       204017  197873 GTTAAATAATGAGAAGTTCTAGTTTTAGAGATTAGATACCTTAATACAGTTCTACAGCATGGCCACCCTGATACA      AA6AAEE6E6EEEEAEE/EAEEE/AEEEEEEEEEEAAE//EEEEEEAEEEE/<EEE/6EEAE/EA6/EEE/AEEE     NH:i:2  HI:i:2  AS:i:146        nM:i:0
NB501708:100:H5THNBGX7:4:11411:8189:14824       99      2L      6468    255     76M     =       6534    141     CTGCAACGAAAATTGTAAATTCCAATTAAAAGGATATTATTGTGCGATTTCACTTTAATTCTTATTTCAAAAAAGT    AAAAAEEEEAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEEEE6EEEEE    NH:i:1  HI:i:1  AS:i:149        nM:i:0
NB501708:100:H5THNBGX7:4:11411:8189:14824       147     2L      6534    255     75M     =       6468    -141    TCAAAAAAGTTAATTATTAGTTGACGGAAATCAGAACGAATTTCACCGCAACGTCTTATGCAGCACAAAATGGCG     EEEEEEEEEEEEEE/EEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA     NH:i:1  HI:i:1  AS:i:149        nM:i:0
NB501708:100:H5THNBGX7:2:12103:18171:9461       99      2L      7461    255     76M     =       7508    123     AGCGCTTAGGAAAAATACATACTTGACGAGTAGAGTGAAATAATTACAAATATTAGACATATCCATTGCTACTCGC    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE    NH:i:1  HI:i:1  AS:i:148        nM:i:1
NB501708:100:H5THNBGX7:2:12103:18171:9461       147     2L      7508    255     76M     =       7461    -123    AAATATTAGACATATCCATTGCTACTCGCATGTAGAGATTTCCACTTACGTTTTCTCTACTTTCAGCAACCGAGAA    EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA    NH:i:1  HI:i:1  AS:i:148        nM:i:1
NB501708:100:H5THNBGX7:3:22608:22192:1355       99      2L      7591    255     76M     =       7695    180     CACGTTTGAACAAGTATCGGCGTGTGGACAACAGCTATCCCCGCTTCATAACGAATGAGGCTGCCGAGGACCTGAT    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEE    NH:i:1  HI:i:1  AS:i:150        nM:i:0
NB501708:100:H5THNBGX7:3:22608:22192:1355       147     2L      7695    255     76M     =       7591    -180    CAGCCACAGAGCTCAGAGCGGATCTCAATATTTAATCCGCCAGTATACACGCAGCACCAGGTGCGCAATGAAGCCC    EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA    NH:i:1  HI:i:1  AS:i:150        nM:i:0
NB501708:100:H5THNBGX7:2:21310:4695:8523        99      2L      8137    255     75M     =       8265    204     CATCCATATCCGAATCAGTGGCAATAATGCAAAATGCTGATTTTATCACCAATTAGTGACGCACCACAGCCGTTA     AAAAAEEEEEEEEEEEEEEAEAEEEEEEEEEEEE/AEEEEAEEEEEE/EEEEEAEAEEEEEEEEEEEEEEEEEEE     NH:i:1  HI:i:1  AS:i:149        nM:i:0
NB501708:100:H5THNBGX7:2:21310:4695:8523        147     2L      8265    255     76M     =       8137    -204    CAGCTGCGTCAACTTAAATGATGACTTCGCCGAGCAATTTAAAGCAAGAGCGGCGGACTGTGAAGAGAAATCCAAA    EEEEEEEEAEEEEEEEAEE<EE//AAEEEEAEEEE<EEEEEEEEEEEA/EEA/EEEEEEAEEEEAEEEEEAAAAAA    NH:i:1  HI:i:1  AS:i:149        nM:i:0
NB501708:100:H5THNBGX7:4:11502:3587:2578        99      2L      8692    255     76M     =       8866    250     CACATAACTACCGAAGACATATGCACGTTTATTAATGGGAAATGGCTTAACGACGAGGTCATTAACTTTTACATGT    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE    NH:i:1  HI:i:1  AS:i:150        nM:i:0
NB501708:100:H5THNBGX7:4:12608:14742:14573      99      2L      8714    255     75M     =       8775    136     GCACGTTTATTAATGGGAAATGGCTTAACGACGAGGTCATTAACTTTTACATGTCCTTGCTGACAGAACGGTCGG     AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE     NH:i:1  HI:i:1  AS:i:148        nM:i:0


I also checked b1.txt and b2.txt:
knert@DESKTOP-PIP2EHP:~/rMATS.4.0.2$ cat b1.txt
S1.bam,S2.bam
knert@DESKTOP-PIP2EHP:~/rMATS.4.0.2$ cat b2.txt
S6.bam,S7.bam
knert@DESKTOP-PIP2EHP:~/rMATS.4.0.2$

Could you give me some suggestion?
Or is it because I only have two replicates per sample? (I think you mentioned this in another post, but I could not find out whether rMATS-turbo can handle it.)

Thank you. 
Best, 
knert



On Monday, January 21, 2019 at 11:47:04 AM UTC+8, Shihao Shen wrote:

Thomas Danhorn

Apr 14, 2020, 6:49:40 PM
to knert, rMATS User Group
I don't know why your BAM files are not found (double check that
everything exists and is spelled correctly and maybe add the full path,
e.g. /home/knert/rMATS.4.0.2/S1.bam, to your b1.txt file, so there is no
confusion), but the warning "[W::bam_hdr_read] EOF marker is absent. The
input is probably truncated" from samtools is concerning and indicates
that your BAM file is corrupted. I am not sure how rMATS would deal with
that, but it seems unrelated to the error message you are getting. In any
case I suggest you regenerate your BAM files and maybe check them with
PicardTools "ValidateSamFile"
(https://broadinstitute.github.io/picard/command-line-overview.html).
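The "EOF marker is absent" warning refers to an empty BGZF block that the SAM/BAM format specification says should be written at the end of a BAM file. A file that was cut off while being written (for example by a full disk or a killed job) usually lacks this block. The sketch below checks for the marker directly, assuming `tail -c` and `od` are available. `has_bgzf_eof` is a hypothetical name. The check says nothing about damage elsewhere in the file, which is what ValidateSamFile or a full `samtools view -c` pass would catch.

```shell
# has_bgzf_eof: hypothetical helper. Returns 0 if the file ends with the
# 28-byte empty BGZF block used as the BAM end-of-file marker, 1 otherwise.
has_bgzf_eof() {
    expected=$(echo "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')
    # Hex dump of the last 28 bytes, with spaces and newlines removed.
    actual=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# Usage: report every BAM in the current directory that lacks the marker.
# for f in *.bam; do has_bgzf_eof "$f" || echo "no EOF marker: $f"; done
```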

Best,

Thomas

knert

Apr 22, 2020, 5:29:20 AM
to rMATS User Group


Hi, Thomas, 

I rechecked the STAR output, and it seems the failure to read the .bam files was not caused by truncated BAM files.
After I updated the paths and switched to 4.0.3 beta, rMATS works.

Thank you!!

Best, 
knert. 


On Wednesday, April 15, 2020 at 6:49:40 AM UTC+8, Thomas Danhorn wrote:

Aishwarya Griselda Jacob

Apr 27, 2020, 1:09:25 PM
to knert, rMATS User Group
Hi
Version 4.0.2 still fails to open .bam files for me; I have reported this problem before. I have tested with both the rMATS testData set and my own .bam files (which I know are fine), and the paths specified are correct.
Is there a 4.0.3 version? If so, where can I get it?
Thanks!
Aishwarya


Thomas Danhorn

Apr 27, 2020, 2:24:46 PM
to Aishwarya Griselda Jacob, knert, rMATS User Group
I doubt your problem is related to the rMATS version, so changing that is
unlikely to help, but if you want, the beta versions are here:
https://sourceforge.net/projects/rnaseq-mats/files/MATS/beta/
Be warned that this one will only work if your Python was compiled with
the UCS-4 Unicode settings (the regular releases have versions for both
UCS-2 and UCS-4).
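To see which build a given Python is, check `sys.maxunicode`. It is 1114111 on a UCS-4 ("wide") build and 65535 on a UCS-2 ("narrow") Python 2 build; Python 3.3 and later always report 1114111. The small sketch below maps that value to the UCS-2/UCS-4 naming used for the rMATS-turbo-Linux-UCS2/UCS4 download folders. `ucs_build` is a hypothetical name. Query the same `python` you use to run rmats.py.

```shell
# ucs_build: hypothetical helper mapping Python's sys.maxunicode to a build name.
ucs_build() {
    case "$1" in
        1114111) echo "UCS-4" ;;
        65535) echo "UCS-2" ;;
        *) echo "unknown" ;;
    esac
}

# Usage, with the interpreter that will run rmats.py:
# ucs_build "$(python -c 'import sys; print(sys.maxunicode)')"
```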

I suspect your issue is with the paths. I recommend using the absolute
path for each of the BAMs (this is not required, but it eliminates one
possibility for mistakes). Here is a way to check your b1.txt without
using rMATS (analogous for b2.txt):

{ tr , \\n < b1.txt && echo; } |
while read b && [ -n "$b" ]; do
ls -l "$b"
done

Make sure you run this in the same directory as you would your rMATS
command, and if you get any error messages, your paths are wrong (I looked
at your previous mail and found potential issues with at least three of
the paths for which you got errors). If all your paths are correct, but
you still have issues, you can check the integrity of your files by
replacing the ls -l "$b" command with something like
samtools view -c "$b"
(this may run a while and will return the number of reads for each BAM;
since it needs to read through the whole BAM to do that, it should throw
an error if it is corrupted).
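The loop above can be extended into a slightly stricter sketch that also flags entries which look right when printed but do not match a real path:

- A leading `~`. Tilde expansion is done by the shell, so a `~` written inside b1.txt stays literal unless the program expands it itself. The "Fail to open ~/Documents/..." line earlier in this thread suggests rMATS 4.0.2 did not.
- Stray spaces around the commas.
- Windows carriage returns, which can appear if the file was saved by a Windows editor, e.g. under WSL.

`check_bam_list` is a hypothetical helper, not part of rMATS. Like the loop above, run it from the directory you launch rMATS from.

```shell
# check_bam_list: hypothetical helper (not part of rMATS).
# Reports each comma-separated entry of a b1.txt/b2.txt file and
# returns non-zero if any entry looks wrong.
check_bam_list() {
    tr ',' '\n' < "$1" | (
        status=0
        cr=$(printf '\r')
        # "|| [ -n ... ]" also handles a last entry without a trailing newline.
        while IFS= read -r entry || [ -n "$entry" ]; do
            case "$entry" in
                "") echo "EMPTY_ENTRY"; status=1; continue ;;
                *"$cr"*) echo "CR $entry"; status=1; continue ;;
                "~"*) echo "TILDE $entry"; status=1; continue ;;
                " "*|*" ") echo "SPACE '$entry'"; status=1; continue ;;
            esac
            if [ ! -e "$entry" ]; then
                echo "MISSING $entry"; status=1
            elif [ ! -r "$entry" ]; then
                echo "UNREADABLE $entry"; status=1
            else
                echo "OK $entry"
            fi
        done
        exit "$status"
    )
}
```

For example, `check_bam_list b1.txt && check_bam_list b2.txt` prints one line per sample and fails if any entry needs attention.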

Hope this helps,

Thomas

Aishwarya Griselda Jacob

Aug 19, 2020, 1:56:19 PM
to Thomas Danhorn, knert, rMATS User Group
Hello
Hope you are all well.
I am running rMATS 4.0.3 beta on a set of .BAM files that have been sorted by coordinate. The .gtf I used is v29 for hg38, since the alignment was done to hg38. However, I am getting only empty files in the output folder; the only files with data in them are those derived from the gtf file. Oddly enough, the MXE.MATS.JCEC.txt file has some events recorded in it, while all the others are empty .txt files containing only the headers. Is there any reason this could happen? I have attached screenshots of my command and of the terminal output. I am unable to locate the log file. Also, I used only the default parameters.
I have previously run 4.0.3 beta successfully on other sets of .bams with Homo_sapiens.GRCh38.83.gtf and I have had a proper output.


Screenshot from 2020-08-19 18-54-49.png
Screenshot from 2020-08-19 18-54-31.png
Screenshot from 2020-08-19 18-55-23.png

Thomas Danhorn

Aug 19, 2020, 6:25:25 PM
to Aishwarya Griselda Jacob, rMATS User Group
If it were not for the results in the MXE.MATS.JCEC.txt file, I would say
there is an incompatibility between the GTF and your BAMs (usually
different chromosome naming conventions). You can see the chromosome
names used in the BAM with

samtools view -H your.BAM | grep ^@SQ | cut -f 2 | cut -d : -f 2

and the ones in the GTF with

cut -f 1 Homo_sapiens.GRCh38.83.gtf | sort -u | grep -v '^#'
.

Are the results you get for MXE on different chromosomes, including
canonical (regular) ones? (There might be some chromosomes where the
names match, but others where they don't.)
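Rather than comparing the two lists by eye, you can use `comm` to print the names present in only one of them. The minimal sketch below assumes each list has been saved to a file with the commands above. The file names bam_names.txt and gtf_names.txt and the helper `compare_seqnames` are hypothetical. A name reported on only one side is not necessarily a problem, since a GTF rarely covers every contig. However, if the main chromosomes show up on both sides (for example chr1 versus 1), the naming conventions differ.

```shell
# compare_seqnames: hypothetical helper.
# Usage: compare_seqnames BAM_NAMES GTF_NAMES  (one name per line in each file)
# Prints "only_in_BAM <name>" and "only_in_GTF <name>" lines.
compare_seqnames() {
    d=$(mktemp -d) || return 1
    # comm needs both inputs sorted with the same collation.
    sort -u "$1" > "$d/bam"
    sort -u "$2" > "$d/gtf"
    comm -23 "$d/bam" "$d/gtf" | sed 's/^/only_in_BAM /'
    comm -13 "$d/bam" "$d/gtf" | sed 's/^/only_in_GTF /'
    rm -rf "$d"
}

# Producing the two inputs with the commands above:
# samtools view -H your.BAM | grep ^@SQ | cut -f 2 | cut -d : -f 2 > bam_names.txt
# cut -f 1 your.gtf | sort -u | grep -v '^#' > gtf_names.txt
# compare_seqnames bam_names.txt gtf_names.txt
```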

Also, you want to be sure that your BAMs have been mapped against GRCh38
(the assembly used in the GTF), and not GRCh37 or even another organism.
(If the coordinates are off, you would get strange results.)

Good luck!

Thomas

Aishwarya Griselda Jacob

Aug 20, 2020, 5:48:05 AM
to Thomas Danhorn, rMATS User Group
Hi Thomas
Thank you so much for your quick response!
I have attached the chromosome names output by the commands you suggested.
I have also attached the MXE output from the rMATS run. The chromosome names appear to be the same.
I checked the alignment notes (a colleague did the alignment); they say:
"hg38 genome (noted here by gencode.v29.annotation.gtf) and merge all exon overlaps into gene_id Counting the reads (the gene_ID field in this gtf file refers to full ensemblIDs)."

Thank you very much!
Aishwarya
MXE.MATS.JCEC.txt
BAM-chr.txt
HG38-V29_GTF-chr.txt
HG38-83_GTF-chr.txt

Thomas Danhorn

Aug 20, 2020, 7:04:59 PM
to Aishwarya Griselda Jacob, rMATS User Group
Which GTF did you use for rMATS? The HG38-83 one has NCBI-style
chromosome names, whereas the HG38-V29 one and the BAM have UCSC-style
names (the V29 GTF is lacking the "random" chromosomes, but unless you are
interested in what is going on in those, this is not a problem per se).

If you used the HG38-V29 one, it should have worked, and if the other one,
you should not get any results at all. I still have no explanation why
your MXE.MATS.JCEC.txt has some results (only 4 have an FDR < 0.05, but
there are 81 detected, and MXE are much rarer than RI or SE), while
MXE.MATS.JC.txt and the SE and RI events don't. Especially
MXE.MATS.JC.txt, which only counts junction reads, should have at least
some similarity to MXE.MATS.JCEC.txt ...
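To reproduce counts like "81 detected, 4 with an FDR < 0.05" for any output table, a short awk sketch can find the FDR column by its header name instead of a fixed position. It assumes a tab-separated file whose first line is a header containing a column named `FDR`, as in the rMATS *.MATS.*.txt tables. `count_fdr_below` is a hypothetical helper. It prints the number of events followed by the number below the cutoff, and exits with status 2 if no FDR column is found.

```shell
# count_fdr_below: hypothetical helper.
# Usage: count_fdr_below TABLE CUTOFF   -> prints "<events> <events with FDR < CUTOFF>"
count_fdr_below() {
    awk -F'\t' -v cutoff="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FDR") col = i; next }
        { rows++ }
        col && ($col + 0) < (cutoff + 0) { sig++ }
        END { if (!col) exit 2; printf "%d %d\n", rows, sig }
    ' "$1"
}

# Usage: count_fdr_below MXE.MATS.JCEC.txt 0.05
```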

According to your screenshots there were no errors, so either something
failed silently, or something is screwed up in general and by some
incredible coincidence you get stuff in MXE.MATS.JCEC.txt, but nowhere
else ...

Can you look at the BAMs in IGV together with your GTF and make sure that
the reads are piled up where the exons (as shown by the GTF) are, and not
in other places (as might happen if the genome assembly is different)?

If that does not give you any clues, I would say try to run rMATS again,
to confirm that this is reproducible and not some fluke.

Good luck!

Thomas

Aishwarya Griselda Jacob

Aug 21, 2020, 2:22:06 AM
to Thomas Danhorn, rMATS User Group
Dear Thomas 
Thank you for your reply! The outputs I sent you were from the GENCODE v29 GTF. I will load the GTF into IGV together with the BAMs and see what that looks like!
Mysterious indeed. I will also try the newer version of rMATS and see if that helps.
Thanks!
Will keep you posted!
Aishwarya 

Ahsan Polash

Mar 24, 2023, 4:51:27 PM
to rMATS User Group
As of March 24, 2023, I encountered the same problem
running v4.1.2 from the Docker image (via Singularity):
FAILED TO READ BAM

When I am the owner of the files, it works.
But if I am not the owner of the files (even though they have chmod 777 permissions), it fails, even when I use the full path.
Seems weird.

As a workaround, I copied all of them to a separate location and ran the program on the copies.
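The copy workaround can be scripted so that the new locations also end up in b1.txt/b2.txt. This sketch only reproduces the workaround; it does not explain why file ownership matters inside the container. `stage_bams`, the destination directory and the list file name are hypothetical. Copying doubles the disk space used by the BAMs.

```shell
# stage_bams: hypothetical helper.
# Usage: stage_bams DEST_DIR LIST_FILE BAM [BAM ...]
# Copies the BAMs into DEST_DIR (a directory you own) and writes their new
# absolute paths to LIST_FILE as one comma-separated line for --b1/--b2.
stage_bams() {
    dest=$1
    list=$2
    shift 2
    mkdir -p "$dest" || return 1
    dest=$(CDPATH= cd "$dest" && pwd) || return 1
    line=""
    for bam in "$@"; do
        cp "$bam" "$dest/" || return 1
        copy="$dest/$(basename "$bam")"
        if [ -z "$line" ]; then line=$copy; else line="$line,$copy"; fi
    done
    printf '%s\n' "$line" > "$list"
}

# Usage (paths are placeholders):
# stage_bams "$HOME/rmats_input" b1.txt /path/to/shared/S1.bam /path/to/shared/S2.bam
```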


