[Genome] help: Exonic position map to Protein position


janeela khan

Mar 31, 2011, 11:21:24 AM
to UCSC genome


Dear All,
Could you guide me on how I can map certain positions in an exon to protein positions? I do not have the exact genomic positions, but I have the gene name and the relative position in an exon. Is there a way to map this position to the protein?
Thanks for the help in advance
MvH/janeela

Vanessa Kirkup Swing

Apr 1, 2011, 12:38:36 PM
to janeela khan, UCSC genome
Hi Janeela,

To figure out which part of an exon is translated into protein, you will need to use the Table Browser. To get to the Table Browser, click on "Tables" in the blue navigation bar.

Set the clade, genome, and assembly.

Then you will need to set the following:

group: Gene and Gene prediction tracks
track: UCSC Genes
table: knownGene
region: genome
identifiers (names/accessions): click on "paste list" and paste in the identifiers following the instructions.
output format: selected fields from primary and related tables

click "get output"

select the fields you want displayed.

click "get output"
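
Once the knownGene fields are in hand, the exon-to-protein arithmetic can be sketched roughly as follows. This is a minimal illustration, not a UCSC tool: the function names are hypothetical, and it assumes the exon number and the offset within the exon are both 1-based and counted in the direction of transcription, with strand, cdsStart, cdsEnd, exonStarts and exonEnds taken from the table output (0-based starts, exclusive ends).

```python
def parse_list(field):
    """Parse a UCSC comma-separated coordinate list such as '100,300,'."""
    return [int(x) for x in field.strip(",").split(",") if x]


def exon_offset_to_protein_pos(strand, cds_start, cds_end,
                               exon_starts, exon_ends,
                               exon_number, offset_in_exon):
    """Return the 1-based amino-acid position, or None for a non-coding base.

    exon_number and offset_in_exon are 1-based and counted in transcript
    order (an assumption about how the input positions are given).
    """
    exons = sorted(zip(exon_starts, exon_ends))
    if strand == "-":
        exons.reverse()  # transcript order for minus-strand genes
    start, end = exons[exon_number - 1]
    if not 1 <= offset_in_exon <= end - start:
        raise ValueError("offset lies outside the exon")

    # 0-based genomic coordinate of the requested base.
    if strand == "+":
        pos = start + offset_in_exon - 1
    else:
        pos = end - offset_in_exon
    if not cds_start <= pos < cds_end:
        return None  # base is in a UTR or the gene is non-coding

    # Count coding bases that precede this base in transcript order.
    coding_before = 0
    for s, e in exons[:exon_number - 1]:
        coding_before += max(0, min(e, cds_end) - max(s, cds_start))
    if strand == "+":
        coding_before += pos - max(start, cds_start)
    else:
        coding_before += min(end, cds_end) - 1 - pos
    return coding_before // 3 + 1


# Made-up example gene: two exons, coding region 150-350.
starts = parse_list("100,300,")
ends = parse_list("200,400,")
print(exon_offset_to_protein_pos("+", 150, 350, starts, ends, 2, 1))
```

The sketch only counts codons; it does not check the reading frame against the protein sequence, so results are worth spot-checking against the translated sequence for the transcript.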

Hope this helps lead you in the right direction. If you have further questions, please contact us at gen...@soe.ucsc.edu

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group
_______________________________________________
Genome maillist - Gen...@soe.ucsc.edu
https://lists.soe.ucsc.edu/mailman/listinfo/genome

janeela khan

Apr 4, 2011, 6:41:59 PM
to UCSC genome


Thank you so very much. It was very useful information for me. I wonder if I can also retrieve the exonic sequence for the pig genome?

> From: genome-...@soe.ucsc.edu
> Subject: Genome Digest, Vol 99, Issue 3
> To: gen...@soe.ucsc.edu
> Date: Fri, 1 Apr 2011 12:00:12 -0700
>
>
> Today's Topics:
>
> 1. Re: help: Exonic position map to Protein position
> (Vanessa Kirkup Swing)
> 2. Re: How to generate mapping between Ensembl and refseq
> transcript IDs (Hiram Clawson)
> 3. Re: bedgraph data will not display points (Hiram Clawson)
> 4. Re: when is a query excessive. (Galt Barber)
> 5. Re: bedgraph data will not display points
> (Lionel (Lee) Brooks 3rd)
> 6. Re: bedgraph data will not display points (Hiram Clawson)
> 7. protein families (Tom Traut)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Apr 2011 09:38:36 -0700 (PDT)
> From: Vanessa Kirkup Swing <van...@soe.ucsc.edu>
> Subject: Re: [Genome] help: Exonic position map to Protein position
> To: janeela khan <janeel...@hotmail.com>
> Cc: UCSC genome <gen...@soe.ucsc.edu>
> Message-ID:
> <779613075.45344.13016...@mail-01.cse.ucsc.edu>
> Content-Type: text/plain; charset=utf-8
> ------------------------------
>
> Message: 2
> Date: Fri, 01 Apr 2011 09:41:00 -0700
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] How to generate mapping between Ensembl and
> refseq transcript IDs
> To: "Cook, Malcolm" <M...@stowers.org>
> Cc: "'Rajasimha, Harsha \(NIH/NEI\) \[C\]'" <rajas...@nei.nih.gov>,
> "'gen...@soe.ucsc.edu'" <gen...@soe.ucsc.edu>
> Message-ID: <4D96001C...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sorry Malcolm, there isn't a generic method for all genomes at UCSC.
> This is a most interesting example you have here. Usually chrM at
> Ensembl is: "Mt"
>
> Newer genome assemblies at UCSC are including two tables:
> ensemblLift
> ucscToEnsembl
>
> Which allow translation of UCSC names to Ensembl names and
> coordinate conversions for haplotypes and other random bits that
> might be located in a different coordinate system. For example:
>
> $ hgsql -e "select * from ensemblLift;" hg19
> +-----------------+----------+
> | chrom | offset |
> +-----------------+----------+
> | HSCHR4_1 | 69170076 |
> | HSCHR17_1 | 43384863 |
> | HSCHR6_MHC_APD | 28696603 |
> | HSCHR6_MHC_COX | 28477796 |
> | HSCHR6_MHC_DBB | 28696603 |
> | HSCHR6_MHC_MANN | 28696603 |
> | HSCHR6_MHC_MCF | 28696603 |
> | HSCHR6_MHC_QBL | 28696603 |
> | HSCHR6_MHC_SSTO | 28659142 |
> +-----------------+----------+
>
> $ hgsql -e "select * from ucscToEnsembl;" hg19 | grep MHC
> chr6_ssto_hap7 HSCHR6_MHC_SSTO
> chr6_qbl_hap6 HSCHR6_MHC_QBL
> chr6_mcf_hap5 HSCHR6_MHC_MCF
> chr6_mann_hap4 HSCHR6_MHC_MANN
> chr6_cox_hap2 HSCHR6_MHC_COX
> chr6_dbb_hap3 HSCHR6_MHC_DBB
> chr6_apd_hap1 HSCHR6_MHC_APD
>
> It would be a useful process to go back over some of the older popular
> genomes to add these conversion tables.
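
For readers who want to use the two tables programmatically, a lookup could be sketched like this. It is only an illustration: the function names are hypothetical, the rows are copied from the output quoted above, and the sketch returns the listed offset without applying it, because the message does not say in which direction the offset is applied.

```python
def parse_two_columns(text):
    """Parse whitespace-separated two-column rows into a dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


# A few rows from the ucscToEnsembl output shown above.
UCSC_TO_ENSEMBL = parse_two_columns("""\
chr6_ssto_hap7 HSCHR6_MHC_SSTO
chr6_cox_hap2 HSCHR6_MHC_COX
chr6_apd_hap1 HSCHR6_MHC_APD
""")

# The matching rows from the ensemblLift output shown above.
ENSEMBL_OFFSET = {
    name: int(offset)
    for name, offset in parse_two_columns("""\
HSCHR6_MHC_SSTO 28659142
HSCHR6_MHC_COX 28477796
HSCHR6_MHC_APD 28696603
""").items()
}


def lookup(ucsc_name):
    """Return (ensembl_name, offset) for a UCSC name, or None if unmapped."""
    ensembl_name = UCSC_TO_ENSEMBL.get(ucsc_name)
    if ensembl_name is None:
        return None
    return ensembl_name, ENSEMBL_OFFSET.get(ensembl_name, 0)


print(lookup("chr6_apd_hap1"))
```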
>
> --Hiram
>
> Cook, Malcolm wrote:
> > Hiram,
> >
> > Is there a similar approach for chromosomal identifiers? (i.e. chrM in dm3 is dmel_mitochondrion_genome at Ensembl)
> >
> > Or better, an SQL query for same?
> >
> > Thx
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research - Bioinformatics
> > Kansas City, Missouri USA
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 01 Apr 2011 09:49:16 -0700
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Lionel Brooks <Lionel...@dartmouth.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID: <4D96020C...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Good Morning Lionel:
>
> The bedGraph drawing mechanism can construct bar graphs at your
> specified intervals, or when you select graphType=points it will
> draw only the top of the bar graph at your specified intervals.
> There is no line drawing except by the trick of "smoothing" points
> such that they appear to be in a line graph. This only works if
> the data points are continuous when seen in the genome browser.
> Smoothing will not smear points into areas where there is no
> data value specified.
>
> The Genome Graphs function of the genome browser:
> http://genome.ucsc.edu/cgi-bin/hgGenome
> will only draw lines between your specified points.
>
> See also:
>
> http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
>
> --Hiram
>
> Lionel Brooks wrote:
> > Hello all,
> >
> > I have a bedgraph file. In the past I have used such files to obtain
> > graphic output in the form of a smoothed line but I uploaded my most
> > recent data set and now I cannot get a line graph. In fact, I'm not
> > sure what I am looking at because the values that are displayed along
> > the y-axis are not described with a label.
> >
> > Here is my track line:
> > track type=bedGraph autoScale=on graphType=points windowingFunction=mean
> > smoothingWindow=16
> >
> > My data format is
> >
> > chr coordA coordB value
> >
> > Where approximate distribution of data values are : 5 <= value <= 500.
> > Is it possible that your plotting function cannot compute this line
> > because my coordinate intervals are too small?
> > Another possibly relevant issue may be that the coordinate intervals are
> > not fixed length.
> >
> > Any suggestions for course of action would be great.
> >
> > Sincerely,
> > Lionel
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 01 Apr 2011 10:25:14 -0700
> From: Galt Barber <ga...@soe.ucsc.edu>
> Subject: Re: [Genome] when is a query excessive.
> To: John Hayward <john.h...@wheaton.edu>
> Cc: "gen...@soe.ucsc.edu" <gen...@soe.ucsc.edu>
> Message-ID: <4D960A7A...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> Hi, John!
>
> Queries that take more than a few minutes to run are
> probably inappropriate for the shared public mysql server.
>
> I found this query formulation for you that takes less than one minute:
>
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16')
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16')
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16')
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4 limit 30;
>
> +-----------+----------+----------+
> | name | observed | count(*) |
> +-----------+----------+----------+
> | rs1000014 | A/G | 4 |
> | rs1000047 | C/T | 4 |
> | rs1000077 | C/G | 4 |
> | rs1000078 | A/G | 4 |
> | rs1000100 | A/T | 4 |
> | rs1000174 | A/G | 4 |
> | rs1000178 | C/T | 4 |
> | rs1000192 | A/G | 4 |
> | rs1000193 | A/C | 4 |
> | rs1000454 | C/G | 4 |
> | rs1000455 | A/T | 4 |
> | rs1000710 | G/T | 4 |
> | rs1000711 | C/G | 4 |
> | rs1000720 | A/G | 4 |
> | rs1000742 | C/T | 4 |
> | rs1001157 | A/G | 4 |
> | rs1001170 | G/T | 4 |
> | rs1001171 | A/T | 4 |
> | rs1001302 | A/G | 4 |
> | rs1001362 | C/T | 4 |
> | rs1001366 | C/T | 4 |
> | rs1001493 | C/T | 4 |
> | rs1001554 | A/G | 4 |
> | rs1001608 | C/T | 4 |
> | rs1001631 | C/G | 4 |
> | rs1001655 | A/G | 4 |
> | rs1001722 | G/T | 4 |
> | rs1001776 | C/T | 4 |
> | rs1001861 | A/G | 4 |
> | rs1001871 | C/G | 4 |
> +-----------+----------+----------+
> 30 rows in set (46.17 sec)
>
> Of course for your own full output,
> you would remove the "limit" clause.
>
> In case you are curious how many there are:
>
> select count(*) from (
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16')
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16')
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16')
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4) resultAlias2;
> +----------+
> | 105841 |
> +----------+
> 1 row in set (47.60 sec)
>
>
> Another alternative would be to capture the output from each like this:
>
> select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16'
>
> for each of your 4 files.
> You could sort them by name (rsId) either with an order by clause in
> sql, or with the unix sort command.
>
> You can even use the unix join command to join them up on the name and
> observed fields.
>
> Once the contents of each of the 4 sets are sorted by name and observed,
> joining them can be very fast.
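
The offline alternative described above can also be sketched in Python. This is an illustration under assumptions: each population's query result is assumed to be saved as tab-separated text with name and observed in the first two columns, the function names are hypothetical, and set intersection is used in place of the unix sort and join commands mentioned in the message.

```python
import csv
import io


def read_pairs(handle):
    """Read (name, observed) pairs from tab-separated text."""
    return {
        (row[0], row[1])
        for row in csv.reader(handle, delimiter="\t")
        if len(row) >= 2
    }


def shared_pairs(handles):
    """Return the sorted (name, observed) pairs present in every input."""
    sets = [read_pairs(h) for h in handles]
    return sorted(set.intersection(*sets))


# Tiny made-up inputs standing in for the four per-population dumps.
ceu = io.StringIO("rs1\tA/G\nrs2\tC/T\nrs3\tA/T\n")
yri = io.StringIO("rs1\tA/G\nrs2\tC/G\nrs3\tA/T\n")
chb = io.StringIO("rs1\tA/G\nrs3\tA/T\n")
jpt = io.StringIO("rs3\tA/T\nrs1\tA/G\n")

for name, observed in shared_pairs([ceu, yri, chb, jpt]):
    print(name, observed)
```

With real files, each io.StringIO object would be replaced by an open file handle. Holding four sets in memory is reasonable for a single chromosome, while sort and join scale better for very large dumps.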
>
> -Galt
>
> 4/1/2011 8:25 AM, John Hayward:
> > I would like to run queries against the genome-mysql.cse.ucsc.edu database which may be excessive and don't want to cause problems for others.
> >
> > I want to find matches for a particular chromosome which have the same name and observation in the tables hapmapSnpsCEU, hapmapSnpsYRI, hapmapSnpsCHB and hapmapSnpsJPT.
> >
> > A query to pick up the count of hapmapSnpsCEU for one chromosome took 0.14 seconds.
> > A query to pick up the count joining hapmapSnpsCEU and hapmapSnpsCHB took 8.40 seconds.
> >
> > If I join all tables would that constitute an excessive load?
> >
> > Below is the query joining two tables.
> > ======
> > select count(*) from hapmapSnpsCEU, hapmapSnpsCHB where hapmapSnpsCEU.chrom = 'Chr16' and hapmapSnpsCHB.chrom = 'Chr16' and hapmapSnpsCEU.name = hapmapSnpsCHB.name and hapmapSnpsCEU.observed = hapmapSnpsCHB.observed;
> > ======
> > johnh...
> >
> >
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 01 Apr 2011 14:16:42 -0400
> From: "Lionel (Lee) Brooks 3rd" <Lionel...@dartmouth.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Hiram Clawson <hi...@soe.ucsc.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID: <4D96168A...@dartmouth.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Hiram,
>
> From
> http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
>
> 1. Pseudo /line graphs/ can be drawn with the wiggle tracks by
> setting optional drawing parameters in the display of the track to
> draw /points/ instead of bars with smoothing on to smear the
> points together into a line.
>
> The pseudo line graph functionality is what I desire.
> Previously, it had been possible to do this with bedgraph format files.
> I don't know what "smearing" means. I'm just looking for a quick way to
> draw the moving average as I had been able to do before.
> As I mentioned below, my track line is:
> track type=bedGraph autoScale=on graphType=points windowingFunction=mean
> smoothingWindow=16
>
> I suppose my solution is to modify my scripts to use the wiggle variable
> step format?
>
>
> thanks,
> -Lionel
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 1 Apr 2011 11:18:07 -0700 (PDT)
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: "Lionel (Lee) Brooks 3rd" <Lionel...@dartmouth.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID:
> <922254719.45877.13016...@mail-01.cse.ucsc.edu>
> Content-Type: text/plain; charset=utf-8
>
> It won't make any difference what type of wiggle format you choose.
> They all draw the same way.
>
> You are going to have to provide me with a URL to your data file
> so I can see what it looks like.
>
> --Hiram
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 1 Apr 2011 14:34:07 -0400
> From: "Tom Traut" <tr...@med.unc.edu>
> Subject: [Genome] protein families
> To: gen...@soe.ucsc.edu
> Message-ID: <p06240804c9bbcac80186@[152.19.36.114]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Can I use your site (or any other) to find a listing of major protein families?
>
> how many kinases
> how many proteases
> how many G proteins
>
> etc
> --
> Tom Traut
>
> Professor of Biochemistry & Biophysics
>
> Phone: 919 966-5044
> FAX: 919 966-2852
> URL: www.unc.edu/~traut
>
>
>
> ------------------------------
>
>
>
> End of Genome Digest, Vol 99, Issue 3
> *************************************

Greg Roe

Apr 11, 2011, 5:59:54 PM
to janeela khan, UCSC genome
Hi Janeela,

You can use the Table Browser again. Select (Clade/Genome/Assembly)
Mammal/Pig/susScr2 and:

group: Gene and Gene prediction tracks
track: Ensembl Genes (or N-Scan Genes)
table: ensGene (or nscanGene)
region: genome
identifiers (names/accessions): click on "paste list" and paste in the identifiers following the instructions.
output format: sequence
Click get output

Select sequence type: genomic
Click Submit


On the sequence retrieval options page, make sure to uncheck the Introns
box. Other than that the defaults should work for you - just read down
the list to make sure, then click Get Sequence.

If you have further questions, please contact us at gen...@soe.ucsc.edu

-
Greg Roe
UCSC Genome Bioinformatics Group