rsem-prepare-reference


papori

Feb 17, 2012, 4:30:46 AM
to RSEM Users
Hi,
I am trying to use RSEM, but I am stuck at the first step...
I used the command:
./rsem-prepare-reference --gtf zvgenome.gtf --transcript-to-gene-map knownIsoforms.txt zvgenome.fa zvgenome

(all the files are in this directory)

I got this output:

./rsem-extract-reference-transcripts zvgenome 0 zvgenome.gtf 1 knownIsoforms.txt zvgenome.fa
Mapping Info is not correct, cannot find NM_131426's gene_id!
"./rsem-extract-reference-transcripts zvgenome 0 zvgenome.gtf 1 knownIsoforms.txt zvgenome.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

My GTF file looks like:
chr1 danRer7_refGene exon 50321634 50322231 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";

My knownIsoforms.txt looks like:
gene_id "NM_131426"; transcript_id "NM_131426";

I don't know what is wrong.
When I tried to run RSEM without the GTF, it worked fine.

When I tried without knownIsoforms.txt (this information is in the GTF), my command was:
./rsem-prepare-reference --gtf zvgenome.gtf zvgenome.fa zvgenome

The output:
./rsem-extract-reference-transcripts zvgenome 0 zvgenome.gtf 0 zvgenome.fa
Parsed 200000 lines
According to the GTF file given, a transcript has exons from different orientations!
"./rsem-extract-reference-transcripts zvgenome 0 zvgenome.gtf 0 zvgenome.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

HELP please!

Best
Pap

Colin Dewey

Feb 17, 2012, 3:57:01 PM
to rsem-...@googlegroups.com
Hi Pap,

Regarding the first error, your knownIsoforms.txt file has the wrong format. Each line of that file should look like:

GENE_ID TRANSCRIPT_ID

for example,

slka NM_001145601
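
A minimal sketch, not part of RSEM, of how a file in this two-column format could be generated from the gene_id and transcript_id attributes already present in a GTF. The function names are hypothetical, and the tab separator between the two columns is an assumption that should be checked against the RSEM documentation. For a UCSC-style GTF like the one above, where gene_id and transcript_id are identical, the result only repeats each ID and adds no real gene grouping.

```python
import re

# Matches GTF attribute pairs such as: gene_id "NM_131426";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the attribute column (column 9) of a GTF line as a dict."""
    return dict(ATTR_RE.findall(field))


def gene_transcript_pairs(gtf_lines):
    """Yield unique (gene_id, transcript_id) pairs in first-seen order."""
    seen = set()
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        pair = (attrs.get("gene_id"), attrs.get("transcript_id"))
        if None in pair or pair in seen:
            continue
        seen.add(pair)
        yield pair


def write_map(gtf_path, out_path):
    """Write one 'GENE_ID<TAB>TRANSCRIPT_ID' line per pair (tab is an assumption)."""
    with open(gtf_path) as gtf, open(out_path, "w") as out:
        for gene_id, transcript_id in gene_transcript_pairs(gtf):
            out.write(f"{gene_id}\t{transcript_id}\n")
```

Calling write_map("zvgenome.gtf", "knownIsoforms.txt") would produce the map for the files named in the original post. The sketch expects a tab-delimited GTF.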

The second error indicates that your GTF has the same transcript mapped to two different strands. This can sometimes happen in the UCSC RefSeq annotation, as they allow for the same RefSeq mRNA to map to multiple locations in the genome. If you want to continue to use that annotation, you'll have to make some modifications to the GTF so that it does not have such issues.
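
Since the error message does not always name the offending transcript, one way to locate it is to scan the GTF directly. The following is a rough sketch under stated assumptions, not RSEM's own check: it assumes a tab-delimited GTF, looks only at exon features, and the function name is hypothetical. It reports every transcript_id whose exons fall on more than one chromosome or strand. It will not catch a transcript placed twice on the same chromosome and strand.

```python
import re
from collections import defaultdict

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def find_inconsistent_transcripts(gtf_lines):
    """Map transcript_id -> sorted (chromosome, strand) placements,
    restricted to transcripts with more than one distinct placement."""
    placements = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only exon records are inspected in this sketch.
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = TRANSCRIPT_RE.search(cols[8])
        if match:
            placements[match.group(1)].add((cols[0], cols[6]))
    return {tid: sorted(locs) for tid, locs in placements.items() if len(locs) > 1}
```

Passing an open file handle, for example find_inconsistent_transcripts(open("zvgenome.gtf")), returns a dictionary whose keys are the transcripts that need attention.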

If you'd like to try a different annotation, I might recommend the Ensembl annotation, which you can download here:

http://useast.ensembl.org/info/data/ftp/index.html

If you use that annotation, be sure to use the genome sequence from Ensembl as well, as the chromosomes are named differently from UCSC.
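
A quick way to confirm that an annotation and a genome sequence use the same chromosome names is to compare the first column of the GTF with the FASTA headers. This is a sketch only: the function names are hypothetical, and it assumes a sequence is identified by the first whitespace-delimited token of its FASTA header line.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names: text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_sequence_names(gtf_lines):
    """Collect the values of the first (seqname) column of a GTF."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return sorted GTF sequence names that have no matching FASTA header."""
    return sorted(gtf_sequence_names(gtf_lines) - fasta_sequence_names(fasta_lines))
```

An empty list means every sequence name used in the GTF also appears in the FASTA file. A name such as chr1 in the result, when the FASTA uses 1, points to a UCSC/Ensembl naming mismatch.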

Colin

Dror Hibsh

Feb 19, 2012, 5:53:12 AM
to rsem-...@googlegroups.com
Thanks Colin!
Now it works!
I just downloaded the genome + GTF from Ensembl.

Best,
Pap

lman...@univ-montp2.fr

Feb 7, 2013, 12:56:24 PM
to rsem-...@googlegroups.com
Hello,

I can't find the knownIsoforms.txt file on the Ensembl FTP site.
Where is it located?

Thank you

Feargal Ryan

Oct 14, 2013, 1:37:45 PM
to rsem-...@googlegroups.com
Hi Colin, 

Sorry to pick up this conversation after it's been dead for so long, but you mention here making modifications to the GTF file. However, RSEM doesn't report which transcript or exons it is having problems with. Is there some way to get it to report this?

Thanks,
Feargal

Jong Cheol Jeong

Sep 9, 2014, 3:57:35 PM
to rsem-...@googlegroups.com
Hi Feargal,

Have you found any solution for transcripts with exons on multiple chromosomes?
I have faced the same problem.

Thanks, 
Jong Cheol 

Bo Li

Sep 9, 2014, 4:05:47 PM
to rsem-...@googlegroups.com
Hi Jong,

Can you explain the problem you faced in more detail?

Thanks,
Bo

Jong Cheol Jeong

Sep 9, 2014, 4:23:22 PM
to rsem-...@googlegroups.com
Hi Bo, 

I have run RSEM using the UCSC genome downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
I also created a GTF file with the UCSC Table Browser.

Then I ran the following:
$ rsem-prepare-reference --gtf ../../hg19_refseq_intersection.gtf --bowtie ../../ucsc_hg19.fa human_19

Finally, I got the errors shown below.
rsem-extract-reference-transcripts human_19 0 ../../RefSeq_hg19.gtf 0 ../../ucsc_hg19.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
Parsed 800000 lines
According to the GTF file given, a transcript has exons on multiple chromosomes!
"rsem-extract-reference-transcripts human_19 0 ../../RefSeq_hg19.gtf 0 ../../ucsc_hg19.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

I found Colin's answer above.
However, I need to use UCSC rather than Ensembl.

I am new to RSEM and the UCSC Genome Browser, so if I did something wrong, please correct me.

Thanks. 
Jong Cheol 

Joe Miyamoto Philips

Apr 10, 2015, 4:54:44 AM
to rsem-...@googlegroups.com
Hi Jong-san.

I faced the same problem as you and decided to discard genes that have transcripts coming from different loci.
So I wrote a short Perl script for that purpose.
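
The Perl script itself is not included in this thread. As an illustration only, and not a reconstruction of that script, the same idea could be sketched in Python as follows. The function name is hypothetical, the GTF is assumed to be tab-delimited, and "different loci" is approximated as exons of one transcript_id appearing on more than one chromosome or strand.

```python
import re
from collections import defaultdict

GENE_RE = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def _field(pattern, attributes):
    """Return the first captured attribute value, or None if absent."""
    match = pattern.search(attributes)
    return match.group(1) if match else None


def drop_ambiguous_genes(gtf_lines):
    """Return the GTF lines with every gene removed that owns a transcript
    whose exons sit on more than one chromosome or strand."""
    lines = list(gtf_lines)
    placements = defaultdict(set)  # transcript_id -> {(chromosome, strand)}
    genes_of = defaultdict(set)    # transcript_id -> {gene_id}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "exon":
            continue
        gene = _field(GENE_RE, cols[8])
        transcript = _field(TRANSCRIPT_RE, cols[8])
        if gene is None or transcript is None:
            continue  # records without both IDs are never flagged
        placements[transcript].add((cols[0], cols[6]))
        genes_of[transcript].add(gene)
    bad_genes = set()
    for transcript, locs in placements.items():
        if len(locs) > 1:
            bad_genes |= genes_of[transcript]
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        is_record = not line.startswith("#") and len(cols) >= 9
        if is_record and _field(GENE_RE, cols[8]) in bad_genes:
            continue  # drop every feature of an ambiguous gene
        kept.append(line)
    return kept
```

Writing the returned lines to a new file gives a filtered GTF. Dropping whole genes, rather than only the ambiguous transcripts, follows the approach described above, at the cost of losing the other isoforms of those genes.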



On Wednesday, September 10, 2014 at 5:23:22 AM UTC+9, Jong Cheol Jeong wrote:

Sanyi

Aug 25, 2015, 12:24:06 PM
to RSEM Users
Hi:

Has anyone found a solution to this issue of having transcripts with exons on multiple chromosomes while running RSEM? I have exactly the same issue as Jong Cheol and was not able to solve it.

Thank you!

S.

Morteza Roodgar

Feb 21, 2018, 1:39:50 PM
to RSEM Users
Hi there,

I am trying to run rsem-prepare-reference, but I got the error below.
The annotation of the organism I am using is not as good as the annotation of the human or mouse genome. I would appreciate any help in solving this problem and getting rsem-prepare-reference to work. The same script worked well for the human genome and human GTF files. Below is the error:

According to the GTF file given, transcript NM_001280402 has exons on multiple chromosomes!


Thanks, 


Morteza 
