Begin forwarded message:

From: Christa-Lynn Blenck <ble...@colorado.edu>
Subject: Re: [genome] rn6 GTF question
Date: January 14, 2015 at 3:23:55 PM MST
To: Jonathan Casper <jca...@soe.ucsc.edu>

Hi Jonathan,
Thanks again for all of your help with this; I really appreciate it. I just tried to get the rn6 GTF file again. I put the .hg.conf file in my home directory, but now when I try the command:
genePredToGtf rn6 refGene 150114_rn6_newGTFoutput.gtf
I get the following error:
genePredToGtf: command not found
So I am not sure whether the genePredToGtf utility I downloaded didn’t download correctly or whether I didn’t download the correct version. I am using a Mac to download the utility and then transferring it to my home directory on our campus supercomputer, where all my data is stored. Is there an efficient way to download this utility from my terminal? I tried using this rsync command:
rsync -a -P rsync://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredToGtf ./
but that did not work.
Sorry for all of the questions; I am new to the world of bioinformatics (if you couldn’t already tell) and I am just struggling to get this working.
Appreciate any feedback,
Christa

On Jan 14, 2015, at 12:31 PM, Jonathan Casper <jca...@soe.ucsc.edu> wrote:
Hello Christa,
Any GTF output from the UCSC Table Browser will have the issue with matching gene and transcript IDs. If you want output with actual gene IDs, you will need to use the genePredToGtf utility. In the case of the Ensembl Genes track, you will be provided with Ensembl transcript and gene IDs. More information about that track (in particular, links to relevant references) is available on the track page at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=rn5&g=ensGene. At a quick glance, I do not see anything resembling multiple isoforms in that track.
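For the step of adding the rn5 mitochondrial annotations to an rn6 annotation file, a minimal Python sketch follows. It assumes both inputs are tab-separated GTF files that already carry the gene IDs you want (for example, files written by genePredToGtf). The function names and the example records are hypothetical, made up only for illustration.

```python
def gtf_records(gtf_lines):
    """Yield non-empty, non-comment GTF lines, each ending in one newline."""
    for line in gtf_lines:
        line = line.rstrip("\r\n")
        if line.strip() and not line.startswith("#"):
            yield line + "\n"


def chrom_of(record):
    """Return the chromosome name (column 1) of a GTF record."""
    return record.split("\t", 1)[0]


def merge_gtf(nuclear_lines, mito_lines, chrom="chrM"):
    """Return the nuclear records followed by the `chrom` records of mito_lines.

    Records already on `chrom` in nuclear_lines are dropped, so the
    chromosome is not annotated twice. Comment lines are not carried over.
    """
    merged = [r for r in gtf_records(nuclear_lines) if chrom_of(r) != chrom]
    merged += [r for r in gtf_records(mito_lines) if chrom_of(r) == chrom]
    return merged


# Hypothetical records; IDs and coordinates are placeholders, not real data.
rn6_lines = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
]
rn5_lines = [
    'chrM\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "geneM"; transcript_id "txM1";\n',
    'chr2\tsrc\texon\t5\t90\t.\t+\t.\tgene_id "geneB"; transcript_id "txB1";\n',
]
for record in merge_gtf(rn6_lines, rn5_lines):
    print(record, end="")
```

With real files, open file handles can be passed in place of the lists, and the returned list can be written out with `writelines`.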
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group

On Wed, Jan 14, 2015 at 10:16 AM, Christa-Lynn Blenck <Christa...@colorado.edu> wrote:
Thank you for all of your help with this. I am going to try your suggestions, and hopefully I will be able to get the rn6 GTF I need. In the past I had contacted your group about needing the mitochondrial annotations, since the rn6 GTF did not include the mitochondrial chromosome. I had been told by one of your colleagues:
"RefSeq does not provide mitochondrial annotations. We recommend using the Ensembl track on the rn5 assembly instead. Rn6 and rn5 use the same mitochondrial sequence. Using the Table Browser, select the rn5 assembly, then select the "Ensembl Genes" track. Type in "chrM" next to position. Select "GTF" as your output format and click "get output"."
So the suggestion was to add the rn5 mitochondrial annotations to my rn6 RefSeq annotation file, which is what I had been using in my analysis.
My new question is: does the rn5 mitochondrial GTF file I downloaded in the manner described above have the same issue as the rn6 RefSeq GTF I had been working with, where the gene ID and transcript ID are the same? Do I need to run the genePredToGtf program on the Ensembl rn5 mitochondrial annotations as well before I add them to my new rn6 GTF? I am just not sure how, or whether, mitochondrially encoded gene isoforms are annotated and how I should handle this in my analysis.
Hopefully this question was clear. I really appreciate your help with this.
Christa

On Jan 13, 2015, at 12:58 PM, Jonathan Casper <jca...@soe.ucsc.edu> wrote:
Hello Christa,
You can place all four lines in your .hg.conf file without any problems. The assembly name that you need to supply is rn6 (you did this correctly with the final attempt in your screenshot). The table name that you need to supply is not a long filename like "rn6_refGene_annotation.gtf". Instead, you should just provide the name of the table used by UCSC to store the data. For the RefSeq Genes track, the table name is just "refGene". It is the same as the table name listed on the UCSC Table Browser when you select the RefSeq Genes track. An example command line for the tool would be:
genePredToGtf rn6 refGene sample_output.gtf
The error you are receiving suggests that the genePredToGtf tool is not able to find your .hg.conf configuration file. Is the .hg.conf file located in your home directory, or did you put it somewhere else? .hg.conf needs to be in your home directory to be found by genePredToGtf.
--
Jonathan Casper
UCSC Genome Bioinformatics Group

On Mon, Jan 12, 2015 at 5:30 PM, Christa-Lynn Blenck <Christa...@colorado.edu> wrote:
Hi Joe,
Thanks for all of your help with this; however, I am having some issues getting the GTF file that I’ll need for my analysis. Using the instructions from the link you sent, I kept getting the same error message when trying to create a GTF file from the rn6 reference:
can’t find database rn6.db in hg.conf, should have a default named "db"
(I have attached a screenshot of my error; in this case I was doing a test run, calling the desired GTF hello.txt.) I am not sure what I am doing wrong, or whether I didn’t put the right lines in the hg.conf file (the website says to add three lines to the .hg.conf file, but there are four lines). In addition, I am not sure what the .db name for the rn6 rat genome is, so I keep getting an error. I also am not sure whether I am supposed to be using the GTF file I got from your Table Browser (in my case I have called it rn6_RefGene_annotations.gtf) for the genePredToGtf program to reference. If you could please point me in the right direction as to how to get this GTF file, that would be greatly appreciated.
Thank you,
Christa
<GenePredtoGtf_error.png>

On Jan 7, 2015, at 12:54 PM, Jonathan Casper <jca...@soe.ucsc.edu> wrote:
Hello Christa,
Thank you for your question about downloading data in GTF format using the UCSC Table Browser. I could not find the attached image of your settings, but in this case they probably won't make a difference. The UCSC Table Browser is limited in this particular respect - it only generates GTF output where the transcript ID and the gene ID are the same (both use the transcript ID value). You can instead use the command-line tool genePredToGtf to download a GTF file that meets your needs; information on how to do this is available at http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format. Please note that the gene name associated with NM_001110139 and NM_001110823 in the rn6 RefSeq Genes track will be ATP2A2, not SERCA2, if that makes a difference.
Note also that the link provided in that help page goes directly to the linux x86_64 directory for downloading the genePredToGtf utility. The parent directory (http://hgdownload.soe.ucsc.edu/admin/exe) also has directories for a few different computer architectures (like macOSX x86_64), and source code for compiling our tools yourself is available in the userApps.src.tgz file if necessary.
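To confirm whether a given GTF file has the limitation described above (gene ID equal to transcript ID on every record), a short check along these lines may help. This is only a sketch: it assumes attributes are written as `key "value";` in column 9, and the function names and example records are hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Return the GTF attribute column (column 9) as a dict."""
    return dict(ATTR_RE.findall(attr_field))


def count_shared_ids(gtf_lines):
    """Count records whose gene_id equals their transcript_id.

    Returns (same, total), where total counts the non-comment records
    that carry both attributes.
    """
    same = total = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(fields[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same, total


# Hypothetical records in the Table Browser style; the chromosome and
# coordinates are placeholders, not real annotation data.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "NM_001110139"; transcript_id "NM_001110139";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "NM_001110823"; transcript_id "NM_001110823";\n',
]
same, total = count_shared_ids(example)
print(f"{same} of {total} records have gene_id == transcript_id")
```

If every record is counted as shared, the file most likely came from the Table Browser; in a file with real gene IDs the count should be much lower than the total.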
--
Jonathan Casper
UCSC Genome Bioinformatics Group

On Tue, Jan 6, 2015 at 4:46 PM, Christa-Lynn Blenck <Christa...@colorado.edu> wrote:
Hello,
I am performing a sequencing experiment in rat tissue and I have been using your newest rat genome assembly, rn6. I downloaded the GTF file from your website for genes and gene predictions based upon the rn6 assembly. I recently noticed that in the GTF that I downloaded (I included an image of the settings I used), the gene ID and transcript ID appear to always be the same for each gene. I was wondering how different isoforms can be distinguished using this GTF file. For example, when viewing gene NM_001110139 (SERCA2 isoform A) or NM_001110823 (SERCA2 isoform B) in the viewer IGV, many reads appear to be mapping to these genes, but in HTSeq count there are no reads associated with either gene. This appears to be because the gene and transcript ID for isoform A is NM_001110139, while the gene and transcript ID for isoform B is NM_001110823. Since these isoforms are very similar and mostly overlapping, HTSeq count cannot distinguish between the two, and no reads get associated with either isoform. I hope this question is clear. I am interested in using the newest rn6 assembly, and because Ensembl does not currently have an annotation file available for the rn6 version, I would like to be able to use the GTF from the RefSeq Genes track with the UCSC Table Browser.
Thanks for your help,
Christa Blenck
Graduate Student
University of Colorado at Boulder
Leinwand Lab
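The counting problem described in this message can be illustrated with a toy model. The sketch below is a deliberately simplified stand-in for union-style read counting, not HTSeq's actual API or algorithm: it treats a read as assignable only when every feature it overlaps carries the same gene ID. The function name, the intervals and the reads are hypothetical.

```python
def count_reads(reads, features):
    """Toy union-style counting.

    reads:    list of (start, end) closed intervals
    features: list of (start, end, gene_id) closed intervals
    A read is counted for a gene only if all features it overlaps share
    that gene_id; reads overlapping several gene_ids are ambiguous.
    """
    counts = {}
    ambiguous = 0
    for start, end in reads:
        genes = {
            gene_id
            for f_start, f_end, gene_id in features
            if start <= f_end and f_start <= end  # closed-interval overlap
        }
        if len(genes) == 1:
            gene = genes.pop()
            counts[gene] = counts.get(gene, 0) + 1
        elif len(genes) > 1:
            ambiguous += 1
    return counts, ambiguous


# Three hypothetical reads inside an exon shared by both isoforms.
reads = [(110, 150), (120, 160), (130, 170)]

# Table Browser style: each transcript ID doubles as its own gene ID.
per_transcript = [(100, 200, "NM_001110139"), (100, 200, "NM_001110823")]
# With real gene IDs: both transcripts share one gene ID.
per_gene = [(100, 200, "ATP2A2"), (100, 200, "ATP2A2")]

print(count_reads(reads, per_transcript))  # ({}, 3): every read is ambiguous
print(count_reads(reads, per_gene))        # ({'ATP2A2': 3}, 0)
```

In this toy model the reads are lost only because the two isoforms look like two different genes; once both records share a gene ID, the same reads are counted.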