question regarding command format

Krithika Bhuvaneshwar

Oct 10, 2013, 3:42:27 PM
to rsem-...@googlegroups.com
Hello,
I am using RSEM for the first time and am trying to prepare reference files before I run rsem-calculate-expression.
1) Is there a reference already prepared for hg19? (I saw one for hg18 on your website.)
This is the command I tried:
rsem-prepare-reference --gtf /ebs_sdf/data/Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes/genes.gtf \
                       --transcript-to-gene-map /ebs_sdf/data/knownIsoforms.txt \
                       /ebs_sdf/data/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes \
                       /ebs_sdf/data/ref

I downloaded the UCSC human reference genome and the knownIsoforms.txt; "/ebs_sdf/data/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes" has the .fa files for each chromosome.
I added the path to rsem-1.2.7 to my path variable in /etc/profile. 

The error is "Mapping Info is not correct, cannot find NR_046018's gene_id!". I am not sure what I am doing wrong here.

2) I also tried running rsem-calculate-expression with the hg18 reference files. I know it did not run, but I cannot scroll up far enough in my PuTTY window to see the error, because it is followed by the long usage message with information about output formats and examples.
rsem-calculate-expression -p 16 \
                          --paired-end \
                          /ebs_sdf/data/trimmed_output_file1.fastq \
                          /ebs_sdf/data/trimmed_output_file2.fastq \
                          /ebs_sdf/data/human_refseq_NMonly_125bpPolyATail_extractedFromHumanGenome_hg18 \
                          --quiet

Any input on what is wrong would be helpful. Thanks!

bli

Oct 11, 2013, 2:33:46 PM
to rsem-...@googlegroups.com
Hi Krithika,

Please see my comments below:

On 2013-10-10 12:42, Krithika Bhuvaneshwar wrote:
> Hello,
> I am trying RSEM for the first time and am trying to prepare reference
> files before I run calculate expression
> 1) Is there a reference file already prepared for hg19 ? (I saw hg18
> on your website ? )

No, you need to build your own. But it is straightforward.

> This is the command I tried:
>
> rsem-prepare-reference --gtf
> /ebs_sdf/data/Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes/genes.gtf
> --transcript-to-gene-map /ebs_sdf/data/knownIsoforms.txt
> /ebs_sdf/data/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
> /ebs_sdf/data/ref
>
> I downloaded the UCSC human reference genome and the
> knownIsoforms.txt;
> "/ebs_sdf/data/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes" has the
> .fa files for each chromosome.
>
> I added the path to rsem-1.2.7 to my path variable in /etc/profile.
>
> Error is "Mapping Info is not correct, cannot find NR_046018's
> gene_id! ". Not entirely sure what I am doing wrong here.

Does your genes.gtf file contain RefSeq annotation or UCSC known genes? The
knownIsoforms.txt file only works if your GTF is UCSC known genes.
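
One way to check this before running rsem-prepare-reference is to compare the transcript IDs in the GTF with the IDs in the map file. The sketch below does that on two small stand-in files modelled on the file contents posted in this thread; the temporary file names are placeholders, and it assumes the map keeps the transcript ID in its second whitespace-separated column. NR_046018 looks like a RefSeq accession, while knownIsoforms.txt lists UCSC IDs such as uc010nxq.1, so the two sets would not be expected to overlap.

```shell
# Stand-in inputs for illustration only; point gtf and map at the real files instead.
workdir=$(mktemp -d)
gtf="$workdir/genes.gtf"
map="$workdir/knownIsoforms.txt"

printf 'chr1\tunknown\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";\n' > "$gtf"
printf '1\tuc010nxq.1\n1\tuc010nxr.1\n2\tuc009vis.3\n' > "$map"

# Unique transcript IDs named in the GTF attribute column.
grep -o 'transcript_id "[^"]*"' "$gtf" | cut -d'"' -f2 | sort -u > "$workdir/gtf_ids.txt"

# Unique transcript IDs in the map file (second column).
awk '{print $2}' "$map" | sort -u > "$workdir/map_ids.txt"

# IDs that occur in the GTF but have no entry in the map; this is the
# situation the "cannot find ...'s gene_id" message describes.
missing=$(comm -23 "$workdir/gtf_ids.txt" "$workdir/map_ids.txt")
echo "GTF transcripts missing from the map:"
echo "$missing"
```

If the list is empty for the real files, the map covers the GTF; if it contains most of the GTF's transcripts, the two files probably come from different annotation sets.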

>
> 2) I also tried running calculate expression with the hg18 reference
> files. I know it did not run, but not able to scroll up and see what
> error is on my putty window because of the long message which has info
> about output formats and examples.
>
> rsem-calculate-expression -p 16 \
>                           --paired-end \
>                           /ebs_sdf/data/trimmed_output_file1.fastq \
>                           /ebs_sdf/data/trimmed_output_file2.fastq \
>                           /ebs_sdf/data/human_refseq_NMonly_125bpPolyATail_extractedFromHumanGenome_hg18 \
>                           --quiet
>
>
> Any inputs as to what is wrong will be helpful. Thanks !

Did you extract the reference to the folder
"/ebs_sdf/data/human_refseq_NMonly_125bpPolyATail_extractedFromHumanGenome_hg18"?
The reference name should be the prefix that all files in this folder share, so you should use
"/ebs_sdf/data/human_refseq_NMonly_125bpPolyATail_extractedFromHumanGenome_hg18/NM_refseq_ref".
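
As a rough sketch, the call with the full reference name could look like the following. Two things here are assumptions rather than part of the original command: the RSEM usage line for paired-end data also lists a sample_name after reference_name, so a hypothetical output prefix is supplied, and the output is redirected to a hypothetical log file so that the first error message can be read even when the terminal cannot scroll back far enough.

```shell
ref_dir=/ebs_sdf/data/human_refseq_NMonly_125bpPolyATail_extractedFromHumanGenome_hg18
ref_name="$ref_dir/NM_refseq_ref"     # folder plus the prefix shared by the files in it
sample_name=/ebs_sdf/data/sample1     # hypothetical output prefix, not from the original post
log=rsem_hg18.log                     # hypothetical log file name

if command -v rsem-calculate-expression >/dev/null 2>&1; then
    # Send stdout and stderr to the log so the first error message is not
    # lost above the usage text in the terminal; show its start on failure.
    rsem-calculate-expression -p 16 --quiet --paired-end \
        /ebs_sdf/data/trimmed_output_file1.fastq \
        /ebs_sdf/data/trimmed_output_file2.fastq \
        "$ref_name" "$sample_name" > "$log" 2>&1 \
        || head -n 20 "$log"
else
    echo "rsem-calculate-expression not found on PATH; reference name would be: $ref_name"
fi
```

Listing the folder with ls should show the shared prefix directly, since every reference file in it starts with the same name.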

Best,
Bo


>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [1]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [2].
>
>
> Links:
> ------
> [1] http://deweylab.biostat.wisc.edu/rsem/
> [2] http://groups.google.com/group/rsem-users

Krithika Bhuvaneshwar

Oct 16, 2013, 5:59:05 PM
to rsem-...@googlegroups.com
Thanks for the feedback.
I think I have figured out one of the problems for #2: I only have bowtie2 installed right now, and I believe RSEM works only with bowtie1? I will try again after downloading bowtie1.

bli

Oct 16, 2013, 11:33:16 PM
to rsem-...@googlegroups.com
Hi Krithika,

Yes, RSEM only works with bowtie1 currently.
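
A small check like the one below can confirm that the bowtie1 executables are visible before running RSEM. The helper name check_tool is made up for this sketch. If the executables live outside PATH, the RSEM usage text lists a --bowtie-path option for pointing at their directory.

```shell
# Return success if the named program can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# bowtie is needed for alignment; bowtie-build is assumed to be needed
# when the reference indices are built.
for tool in bowtie bowtie-build; do
    check_tool "$tool" || true
done
```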

Best,
Bo

Krithika Bhuvaneshwar [kb472@georgetown.edu]

Oct 17, 2013, 4:56:21 PM
to rsem-...@googlegroups.com
Thanks for the comments. I was able to get rsem-calculate-expression working with hg18.
Now I am trying to prepare the reference files for hg19. 

I downloaded the hg19 reference files from ftp://igenome:G3no...@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz, which includes a genes.gtf file in Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes. Its lines look like this:
chr1    unknown exon    11874   12227   .       +       .       gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";

I also have the chromosome FASTA files in /Homo_sapiens/UCSC/hg19/Sequence/Chromosomes.
I got knownIsoforms.txt from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/. The file looks like this:
1 uc010nxq.1
1 uc010nxr.1
1 uc001aaa.3
2 uc009vis.3
2 uc001aae.4

I wanted to confirm that I have the correct input files. Could you please let me know if these are the right files? If not, could you please share a link to where I can get them?

Thanks much
Krithika




Colin Dewey

Oct 17, 2013, 5:00:21 PM
to rsem-...@googlegroups.com
Hi Krithika,

Those files look about right.  However, you will not need the knownIsoforms file because a transcript_id to gene_id mapping is already contained within your GTF file.
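
To see the mapping that is already in the GTF, something like the following sketch can list the gene_id and transcript_id pairs. It runs on a one-line stand-in file copied from the example line above; the temporary path is a placeholder, and the parsing assumes the attributes are written as key "value"; pairs in the ninth tab-separated column.

```shell
# Stand-in GTF for illustration only; point gtf at the real genes.gtf instead.
workdir=$(mktemp -d)
gtf="$workdir/genes.gtf"
printf 'chr1\tunknown\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";\n' > "$gtf"

# Print unique "gene_id <TAB> transcript_id" pairs found in the attribute column.
pairs=$(awk -F'\t' '{
    g = ""; t = ""
    # Offsets skip the key, the space and the opening quote; lengths drop the closing quote.
    if (match($9, /gene_id "[^"]*"/))       g = substr($9, RSTART + 9,  RLENGTH - 10)
    if (match($9, /transcript_id "[^"]*"/)) t = substr($9, RSTART + 15, RLENGTH - 16)
    if (g != "" && t != "") print g "\t" t
}' "$gtf" | sort -u)
echo "$pairs"

# With the mapping present, the command from the first message can be used
# with the --transcript-to-gene-map option dropped:
#   rsem-prepare-reference --gtf .../Genes/genes.gtf .../Sequence/Chromosomes /ebs_sdf/data/ref
```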

Best,
Colin
