I've been following the tutorial on GitHub
https://github.com/bli25ucb/RSEM_tutorial#single and was able to successfully follow all the steps.
However, when I try to repeat it with a human reference, I get a few errors.
The first problem occurs at the extract-reference-transcripts step:
I get 368 lines of "Warning: Encountered reference sequence with only gaps".
The two input files (.gtf and .fa) are the human equivalents of the ones used in the tutorial, from Ensembl GRCh37 release 74. The command was:
software/RSEM-1.2.25/rsem-prepare-reference --gtf ref/Homo_sapiens.GRCh37.74.gtf --bowtie2 --bowtie2-path software/bowtie2-2.2.6 ref/Homo_sapiens.GRCh37.74.dna.toplevel.fa ref/human_ref
The command does run to completion, though.
Then, when I use the human_ref.transcripts.fa reference from above in the parse-alignments step, I get the following error:
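One way to see which transcripts trigger the warning is to scan the extracted transcript FASTA for records that contain no real bases. The following is only a minimal sketch: it assumes the warning refers to sequences made up entirely of `N` (or `-`) characters, and the function name `all_gap_records` is made up for illustration, not part of RSEM or Bowtie 2.

```python
def all_gap_records(fasta_lines, gap_chars="Nn-"):
    """Return names of FASTA records with no non-gap character.

    Assumption: "only gaps" means every base is N, n, or '-'.
    Records with an empty sequence are also reported.
    """
    names = []
    name, has_base = None, False
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if name is not None and not has_base:
                names.append(name)
            name = (line[1:].split() or [""])[0]
            has_base = False
        elif any(c not in gap_chars for c in line):
            has_base = True
    # Close the final record.
    if name is not None and not has_base:
        names.append(name)
    return names


# Hypothetical usage on the file produced above:
# with open("ref/human_ref.transcripts.fa") as fh:
#     gaps = all_gap_records(fh)
# print(len(gaps))
```

If the count printed for `ref/human_ref.transcripts.fa` matches the number of warning lines, that would suggest the warnings come from transcripts that fall in fully masked regions of the genome FASTA, not from a corrupted download.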
RSEM_tutorial-master/software/bowtie2-2.2.6/bowtie2 -q --phred33 --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -p 8 -k 200 -x ref/human_ref -1 data/output_1.fastq -2 data/output_2.fastq | samtools view -S -b -o exp/test1.temp/test1.bam -
[samopen] SAM header is present: 214802 sequences.
74385280 reads; of these:
74385280 (100.00%) were paired; of these:
48283089 (64.91%) aligned concordantly 0 times
6361958 (8.55%) aligned concordantly exactly 1 time
19740233 (26.54%) aligned concordantly >1 times
35.09% overall alignment rate
rsem-parse-alignments ref/human_ref exp/test1.temp/test1 exp/test1.stat/test1 b exp/test1.temp/test1.bam -t 3 -tag XM
Warning: The SAM/BAM file declares less reference sequences (214802) than RSEM knows (215170)!
Read SN180:329:D1WMFACXX:1:1101:1342:89113 is both unalignable and alignable according to the input SAM/BAM file!
"rsem-parse-alignments ref/human_ref exp/test1.temp/test1 exp/test1.stat/test1 b exp/test1.temp/test1.bam -t 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
Strangely, the difference between the two numbers is 368 (215170 - 214802), the same as the number of warnings in the first step, which makes me think there's an issue with either the .gtf or the .fa I downloaded from Ensembl.
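To check whether the 368 missing sequences are the same transcripts that produced the warnings, the names in the transcript FASTA can be compared with the `@SQ` lines of the BAM header. This is a rough sketch under assumptions: the header text is taken from `samtools view -H exp/test1.temp/test1.bam`, the FASTA is `ref/human_ref.transcripts.fa`, and the helper names (`fasta_names`, `sq_names`, `missing_from_header`) are hypothetical.

```python
def fasta_names(fasta_lines):
    """Return record names (first word after '>') in file order."""
    return [(line[1:].split() or [""])[0]
            for line in fasta_lines if line.startswith(">")]


def sq_names(header_lines):
    """Return the SN: values from the @SQ lines of a SAM header."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_from_header(fasta_lines, header_lines):
    """Names present in the FASTA but not declared in the SAM header."""
    declared = set(sq_names(header_lines))
    return [n for n in fasta_names(fasta_lines) if n not in declared]


# Hypothetical usage, with the header saved to a text file first:
#   samtools view -H exp/test1.temp/test1.bam > header.txt
# with open("ref/human_ref.transcripts.fa") as fa, open("header.txt") as hd:
#     missing = missing_from_header(fa, hd)
# print(len(missing))
```

A result of 368 names here, matching the all-gap transcripts, would point to the aligner index having dropped those sequences rather than to a bad .gtf or .fa download.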