Possible reference issue?


Regan Hayward

Oct 25, 2016, 1:58:14 AM
to RSEM Users
I've been following the tutorial on GitHub (https://github.com/bli25ucb/RSEM_tutorial#single) and was able to complete all the steps successfully.

However, when I try to repeat them with a human reference, I get a few errors.

The first error occurs when running extract-reference-transcripts:

I get 368 lines of "Warning: Encountered reference sequence with only gaps"

The two files (.gtf and .fa) are the human equivalents of the files used in the tutorial, version GRCh37.74.

software/RSEM-1.2.25/rsem-prepare-reference --gtf ref/Homo_sapiens.GRCh37.74.gtf --bowtie2 --bowtie2-path software/bowtie2-2.2.6 ref/Homo_sapiens.GRCh37.74.dna.toplevel.fa ref/human_ref


The command does run to completion, though.
Then, if I use the human_ref.transcripts.fa file from above in the parse-alignments script, I get the following error:

RSEM_tutorial-master/software/bowtie2-2.2.6/bowtie2 -q --phred33 --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -p 8 -k 200 -x ref/human_ref -1 data/output_1.fastq -2 data/output_2.fastq | samtools view -S -b -o exp/test1.temp/test1.bam -
[samopen] SAM header is present: 214802 sequences.
74385280 reads; of these:
  74385280 (100.00%) were paired; of these:
    48283089 (64.91%) aligned concordantly 0 times
    6361958 (8.55%) aligned concordantly exactly 1 time
    19740233 (26.54%) aligned concordantly >1 times
35.09% overall alignment rate

rsem-parse-alignments ref/human_ref exp/test1.temp/test1 exp/test1.stat/test1 b exp/test1.temp/test1.bam -t 3 -tag XM
Warning: The SAM/BAM file declares less reference sequences (214802) than RSEM knows (215170)!
Read SN180:329:D1WMFACXX:1:1101:1342:89113 is both unalignable and alignable according to the input SAM/BAM file!
"rsem-parse-alignments ref/human_ref exp/test1.temp/test1 exp/test1.stat/test1 b exp/test1.temp/test1.bam -t 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline! 


Strangely, the difference between the two numbers is 368 (215170 - 214802), the same as the number of warnings above, which makes me think there's an issue with either the .gtf or the .fa I downloaded from Ensembl.
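One way to test that suspicion is to list exactly which reference names RSEM knows about but the BAM header lacks. The following is a minimal Python sketch, not part of RSEM. It assumes the BAM header has been dumped to text (for example with `samtools view -H`) and that the names in human_ref.transcripts.fa are the ones RSEM counts. The function names are made up for illustration.

```python
def fasta_names(fasta_text):
    """Return the sequence names (first word after '>') found in FASTA text."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.append(header[0])
    return names


def sam_header_names(header_text):
    """Return the SN: values from the @SQ lines of SAM header text."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_from_header(fasta_text, header_text):
    """Names present in the FASTA but absent from the SAM header."""
    in_header = set(sam_header_names(header_text))
    return [name for name in fasta_names(fasta_text) if name not in in_header]


# Toy data standing in for the real files (hypothetical names T1..T3).
demo_fasta = ">T1\nACGT\n>T2\nNNNN\n>T3\nGGCC\n"
demo_header = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:T1\tLN:4\n@SQ\tSN:T3\tLN:4\n"
print(missing_from_header(demo_fasta, demo_header))  # prints ['T2']
```

With the real files, the two text inputs would come from ref/human_ref.transcripts.fa and from the output of `samtools view -H exp/test1.temp/test1.bam`. If the resulting list has 368 entries, the mismatch and the warnings very likely share a cause.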






Bo Li

Nov 3, 2016, 3:27:16 AM
to rsem-...@googlegroups.com
Hi Regan,

Sorry for my late reply.

For the warnings, I guess that some of the extracted transcripts contain
only 'N's. To remove the warning messages, you should remove these
transcripts from the GTF file.
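A minimal sketch of that clean-up in Python, under two assumptions that are not confirmed in this thread: the record names in human_ref.transcripts.fa match the transcript_id values in the GTF, and the GTF writes that attribute as `transcript_id "..."`. The function names are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def all_n_transcripts(fasta_lines):
    """Return names of FASTA records whose sequence contains only N/n.

    Records with no sequence lines at all are also reported.
    """
    gap_only = set()
    name, only_n = None, True
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None and only_n:
                gap_only.add(name)
            header = line[1:].split()
            name, only_n = (header[0] if header else ""), True
        elif line and set(line.upper()) != {"N"}:
            only_n = False
    if name is not None and only_n:
        gap_only.add(name)
    return gap_only


def filter_gtf(gtf_lines, drop_ids):
    """Yield GTF lines whose transcript_id is not in drop_ids.

    Lines without a transcript_id (comments, gene-level rows) pass through.
    """
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in drop_ids:
            continue
        yield line


# Toy data standing in for the real files (hypothetical IDs).
demo_fasta = [">ENST1", "ACGT", ">ENST2", "NNNN", "NNNN", ">ENST3", "NNAN"]
demo_gtf = [
    '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "ENST1";',
    '1\tsrc\texon\t9\t16\t.\t+\t.\tgene_id "G2"; transcript_id "ENST2";',
]
drop = all_n_transcripts(demo_fasta)
kept = list(filter_gtf(demo_gtf, drop))
print(sorted(drop), len(kept))  # prints ['ENST2'] 1
```

On the real data, the inputs would be open file handles for ref/human_ref.transcripts.fa and ref/Homo_sapiens.GRCh37.74.gtf, with the kept lines written to a new GTF that is then passed to rsem-prepare-reference.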

For the second error, can you send me all the alignments that read
SN180:329:D1WMFACXX:1:1101:1342:89113 has? You can use the following
command:

samtools view exp/test1.temp/test1.bam | grep "SN180:329:D1WMFACXX:1:1101:1342:89113"

Hope it helps,
Bo
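The grep above checks a single read. To scan for every read in the same state, a rough Python sketch along these lines may help. It rests on one plausible reading of the error message, not on RSEM's actual check (which is not shown in this thread): a read is flagged when some of its SAM records have the unmapped bit (0x4) set and others do not. The function name is hypothetical.

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped


def reads_with_mixed_status(sam_lines):
    """Return read names that have both mapped and unmapped records."""
    status = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        status[qname].add(bool(flag & UNMAPPED))
    return sorted(name for name, seen in status.items() if len(seen) == 2)


# Toy SAM records (hypothetical reads r1..r3 on a transcript T1).
demo_sam = [
    "r1\t99\tT1\t1\t255\t4M\t=\t5\t8\tACGT\tIIII",
    "r1\t147\tT1\t5\t255\t4M\t=\t1\t-8\tACGT\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t73\tT1\t1\t255\t4M\t=\t1\t0\tACGT\tIIII",
    "r3\t133\tT1\t1\t0\t*\t=\t1\t0\tACGT\tIIII",
]
print(reads_with_mixed_status(demo_sam))  # prints ['r3']
```

For the real data, the lines would come from the output of `samtools view exp/test1.temp/test1.bam`, for example by piping it into a script that passes standard input to this function.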



Regan Hayward

Nov 8, 2016, 1:13:33 AM
to RSEM Users
Thanks for your reply, Bo.

Interesting. I'll take a look at removing those transcripts with lots of N's from the downloaded GTF.

I've attached a txt file with the output of the grep command you asked for.

Thanks
Regan
to_upload.txt