Retroposed Genes V5 Source Gene name!


Roberto Munita

May 16, 2014, 3:21:00 PM

to gen...@soe.ucsc.edu

Hi,

I was trying to use the data from the Retroposed Genes V5 track, but I can't find a table that has the pseudogene alignment information together with the gene name (I searched in the Table Browser).

For example, NM_004643.3-18 is "retro-PABPN1". If you search for "NM_004643.3-18" in the Genome Browser and click on the pseudogene, the details page says:

Source Gene:

NM_004643.3-18 PABPN1 Homo sapiens poly(A) binding protein, nuclear 1 (PABPN1), mRNA.

Where can I find that information for every item in the Retroposed Genes V5 track?

I hope you can help me.

Thanks for your help!

Roberto

Jonathan Casper

May 19, 2014, 8:30:56 PM

to Roberto Munita, gen...@soe.ucsc.edu

Hello Roberto,

Thank you for your question about finding gene names and descriptions for items in the hg19 Retroposed Genes track. This is difficult to do with the UCSC Table Browser because the information is not stored in a single table. Instead, the Genome Browser queries multiple tables and ties the results together. If you are able to connect to our public MySQL server (more information on doing this is available at http://genome.ucsc.edu/goldenPath/help/mysql.html), you can use the following commands to get the information you want:

use hg19;
select ucscRetroAli5.qName, geneName.name, description.name
  from ucscRetroAli5, gbCdnaInfo, geneName, description
  where substring_index(ucscRetroAli5.qName, '.', 1) = gbCdnaInfo.acc
    and gbCdnaInfo.geneName = geneName.id
    and gbCdnaInfo.description = description.id;

The gene names and descriptions that you want are in the geneName and description tables, whose ids are referenced from the gbCdnaInfo table. However, the accessions in the gbCdnaInfo table do not match the Retroposed Genes transcript names exactly; those are stored in the "qName" field of the ucscRetroAli5 table. That's why the above command uses the MySQL "substring_index" function: you can get the gbCdnaInfo accessions by taking everything before the first '.' in the Retroposed Genes transcript names.
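To see how the four tables tie together without connecting to the server, here is a minimal sketch in Python using an in-memory SQLite database. It is only an illustration, not the UCSC schema: the tables are cut down to the columns used in the query above, the single row is the PABPN1 example from this thread, and the numeric ids (1 and 2) are made up. SQLite has no substring_index, so substr/instr stands in for it here.

```python
import sqlite3

# Toy stand-ins for the four hg19 tables, reduced to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ucscRetroAli5 (qName TEXT);
    CREATE TABLE gbCdnaInfo (acc TEXT, geneName INTEGER, description INTEGER);
    CREATE TABLE geneName (id INTEGER, name TEXT);
    CREATE TABLE description (id INTEGER, name TEXT);
    -- The ids 1 and 2 below are hypothetical, chosen only for this example.
    INSERT INTO ucscRetroAli5 VALUES ('NM_004643.3-18');
    INSERT INTO gbCdnaInfo VALUES ('NM_004643', 1, 2);
    INSERT INTO geneName VALUES (1, 'PABPN1');
    INSERT INTO description VALUES
        (2, 'Homo sapiens poly(A) binding protein, nuclear 1 (PABPN1), mRNA.');
""")

# Same join as the MySQL query, with substr/instr taking everything
# before the first '.' in qName (the role substring_index plays in MySQL).
rows = conn.execute("""
    SELECT r.qName, g.name, d.name
    FROM ucscRetroAli5 AS r
    JOIN gbCdnaInfo AS c
      ON substr(r.qName, 1, instr(r.qName, '.') - 1) = c.acc
    JOIN geneName AS g ON c.geneName = g.id
    JOIN description AS d ON c.description = d.id
""").fetchall()

for row in rows:
    print("\t".join(row))
```

Each output line holds the retrogene name, the source gene name, and the description, separated by tabs.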

You can still get your list if you cannot connect to our public MySQL server, but it will take a bit more work.

1. You will need to get a list of the transcript names from the Retroposed Genes track. To do this, use the table "ucscRetroAli5" with the output option "selected fields from primary and related tables", and when the Table Browser asks which columns you want, choose "qName". This will give you a list of all the transcript names.
2. Now you need to convert the transcript names into names that match the gbCdnaInfo table names. You can do this by removing the first "." and everything after it from the Retroposed Genes names. For example, "NM_004643.3-18" should become "NM_004643".
3. Now go back to the Table Browser, choose the gbCdnaInfo table from the RefSeq Genes track, and paste or upload your list of identifiers (the name list that you created in step 2).
4. Again, choose the output option "selected fields from primary and related tables". This time, scroll down to the "Linked Tables" section and put check marks next to the "description" and "geneName" tables. Click "allow selection from checked tables". Now check the "name" box for both hg19.description fields and hg19.geneName fields. Finally, check the "acc" box for fields from hg19.gbCdnaInfo. Click the "get output" button.

The result should be a list of identifiers along with the associated gene names and descriptions.
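Step 2 above can be scripted instead of edited by hand. The following is a minimal Python sketch, assuming the names from step 1 are available as a list of strings; the function names are hypothetical, and skipping lines that start with "#" assumes the Table Browser output begins with a "#"-prefixed header line. The second input name is invented to show duplicate removal.

```python
def to_accession(retro_name: str) -> str:
    """Return the part of a Retroposed Genes name before the first '.'."""
    return retro_name.split(".", 1)[0]


def accession_list(retro_names):
    """Convert names to accessions, skipping blanks, header lines and duplicates."""
    seen = set()
    result = []
    for name in retro_names:
        name = name.strip()
        # Assumption: header lines in the downloaded list start with '#'.
        if not name or name.startswith("#"):
            continue
        acc = to_accession(name)
        if acc not in seen:  # several retrogenes can share one source mRNA
            seen.add(acc)
            result.append(acc)
    return result


# "NM_004643.3-18" is from this thread; "NM_004643.3-2" is a hypothetical name.
names = ["#qName", "NM_004643.3-18", "NM_004643.3-2"]
print("\n".join(accession_list(names)))
```

The printed list is what you would paste or upload in step 3.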

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group
