Thank you so very much. It was very useful information for me. I wonder if I can also retrieve the exonic sequence for the pig genome?

> From: genome-...@soe.ucsc.edu
> Subject: Genome Digest, Vol 99, Issue 3
> To: gen...@soe.ucsc.edu
> Date: Fri, 1 Apr 2011 12:00:12 -0700
>
> Send Genome mailing list submissions to
> gen...@soe.ucsc.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> or, via email, send a message with subject or body 'help' to
> genome-...@soe.ucsc.edu
>
> You can reach the person managing the list at
> genome...@soe.ucsc.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Genome digest..."
>
>
> Today's Topics:
>
> 1. Re: help: Exonic position map to Protein position
> (Vanessa Kirkup Swing)
> 2. Re: How to generate mapping between Ensembl and refseq
> transcript IDs (Hiram Clawson)
> 3. Re: bedgraph data will not display points (Hiram Clawson)
> 4. Re: when is a query excessive. (Galt Barber)
> 5. Re: bedgraph data will not display points
> (Lionel (Lee) Brooks 3rd)
> 6. Re: bedgraph data will not display points (Hiram Clawson)
> 7. protein families (Tom Traut)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Apr 2011 09:38:36 -0700 (PDT)
> From: Vanessa Kirkup Swing <van...@soe.ucsc.edu>
> Subject: Re: [Genome] help: Exonic position map to Protein position
> To: janeela khan <janeel...@hotmail.com>
> Cc: UCSC genome <gen...@soe.ucsc.edu>
> Message-ID: <779613075.45344.13016...@mail-01.cse.ucsc.edu>
> Content-Type: text/plain; charset=utf-8
> ------------------------------
>
> Message: 2
> Date: Fri, 01 Apr 2011 09:41:00 -0700
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] How to generate mapping between Ensembl and
> refseq transcript IDs
> To: "Cook, Malcolm" <M...@stowers.org>
> Cc: "'Rajasimha, Harsha (NIH/NEI) [C]'" <rajas...@nei.nih.gov>,
> "'gen...@soe.ucsc.edu'" <gen...@soe.ucsc.edu>
> Message-ID: <4D96001C...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sorry Malcolm, there isn't a generic method for all genomes at UCSC.
> This is a most interesting example you have here. Usually chrM at
> Ensembl is: "Mt"
>
> Newer genome assemblies at UCSC include two tables:
> ensemblLift
> ucscToEnsembl
>
> These allow translation of UCSC names to Ensembl names, and
> coordinate conversions for haplotypes and other random bits that
> might be located in a different coordinate system. For example:
>
> $ hgsql -e "select * from ensemblLift;" hg19
> +-----------------+----------+
> | chrom           | offset   |
> +-----------------+----------+
> | HSCHR4_1        | 69170076 |
> | HSCHR17_1       | 43384863 |
> | HSCHR6_MHC_APD  | 28696603 |
> | HSCHR6_MHC_COX  | 28477796 |
> | HSCHR6_MHC_DBB  | 28696603 |
> | HSCHR6_MHC_MANN | 28696603 |
> | HSCHR6_MHC_MCF  | 28696603 |
> | HSCHR6_MHC_QBL  | 28696603 |
> | HSCHR6_MHC_SSTO | 28659142 |
> +-----------------+----------+
>
> $ hgsql -e "select * from ucscToEnsembl;" hg19 | grep MHC
> chr6_ssto_hap7 HSCHR6_MHC_SSTO
> chr6_qbl_hap6 HSCHR6_MHC_QBL
> chr6_mcf_hap5 HSCHR6_MHC_MCF
> chr6_mann_hap4 HSCHR6_MHC_MANN
> chr6_cox_hap2 HSCHR6_MHC_COX
> chr6_dbb_hap3 HSCHR6_MHC_DBB
> chr6_apd_hap1 HSCHR6_MHC_APD
>
> It would be a useful process to go back over some of the older popular
> genomes to add these conversion tables.
>
> --Hiram
>
> Cook, Malcolm wrote:
> > Hiram,
> >
> > Is there a similar approach for chromosomal identifiers? (e.g. chrM in dm3 is dmel_mitochondrion_genome at Ensembl)
> >
> > Or better, an SQL query for same?
> >
> > Thx
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research - Bioinformatics
> > Kansas City, Missouri USA
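
The two tables Hiram describes can be combined into a small lookup. The sketch below hard-codes the hg19 MHC rows quoted in his message; the helper name `ucsc_to_ensembl` is hypothetical, and the direction of the conversion (Ensembl position = UCSC position + offset) is an assumption that should be checked against a known feature before relying on it.

```python
# Rows copied from the hg19 ucscToEnsembl output quoted in the message.
UCSC_TO_ENSEMBL = {
    "chr6_ssto_hap7": "HSCHR6_MHC_SSTO",
    "chr6_qbl_hap6": "HSCHR6_MHC_QBL",
    "chr6_mcf_hap5": "HSCHR6_MHC_MCF",
    "chr6_mann_hap4": "HSCHR6_MHC_MANN",
    "chr6_cox_hap2": "HSCHR6_MHC_COX",
    "chr6_dbb_hap3": "HSCHR6_MHC_DBB",
    "chr6_apd_hap1": "HSCHR6_MHC_APD",
}

# Rows copied from the hg19 ensemblLift output quoted in the message.
ENSEMBL_LIFT = {
    "HSCHR6_MHC_APD": 28696603,
    "HSCHR6_MHC_COX": 28477796,
    "HSCHR6_MHC_DBB": 28696603,
    "HSCHR6_MHC_MANN": 28696603,
    "HSCHR6_MHC_MCF": 28696603,
    "HSCHR6_MHC_QBL": 28696603,
    "HSCHR6_MHC_SSTO": 28659142,
}


def ucsc_to_ensembl(chrom, pos):
    """Hypothetical helper: translate a UCSC (chrom, pos) pair to Ensembl naming.

    ASSUMPTION: the Ensembl coordinate is the UCSC coordinate plus the
    ensemblLift offset; names without an offset row are left unshifted.
    """
    if chrom not in UCSC_TO_ENSEMBL:
        raise KeyError(f"no ucscToEnsembl row for {chrom!r}")
    ensembl_name = UCSC_TO_ENSEMBL[chrom]
    return ensembl_name, pos + ENSEMBL_LIFT.get(ensembl_name, 0)


print(ucsc_to_ensembl("chr6_cox_hap2", 100))  # ('HSCHR6_MHC_COX', 28477896)
```

In practice the two dictionaries would be filled from the `hgsql` queries shown in the message rather than typed in by hand.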
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 01 Apr 2011 09:49:16 -0700
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Lionel Brooks <Lionel...@dartmouth.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID: <4D96020C...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Good Morning Lionel:
>
> The bedGraph drawing mechanism can construct bar graphs at your
> specified intervals, or when you select graphType=points it will
> draw only the top of the bar graph at your specified intervals.
> There is no line drawing except by the trick of "smoothing" points
> such that they appear to be in a line graph. This only works if
> the data points are continuous when seen in the genome browser.
> Smoothing will not smear points into areas where there is no
> data value specified.
>
> The Genome Graphs function of the genome browser:
> http://genome.ucsc.edu/cgi-bin/hgGenome
> will only draw lines between your specified points.
>
> See also:
> http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
>
> --Hiram
>
> Lionel Brooks wrote:
> > Hello all,
> >
> > I have a bedgraph file. In the past I have used such files to obtain
> > graphic output in the form of a smoothed line, but I uploaded my most
> > recent data set and now I cannot get a line graph. In fact, I'm not
> > sure what I am looking at because the values that are displayed along
> > the y-axis are not described with a label.
> >
> > Here is my track line:
> > track type=bedGraph autoScale=on graphType=points windowingFunction=mean
> > smoothingWindow=16
> >
> > My data format is
> >
> > chr coordA coordB value
> >
> > where the approximate range of data values is: 5 <= value <= 500.
> > Is it possible that your plotting function cannot compute this line
> > because my coordinate intervals are too small?
> > Another possibly relevant issue may be that the coordinate intervals are
> > not fixed length.
> >
> > Any suggestions for course of action would be great.
> >
> > Sincerely,
> > Lionel
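
Since Hiram notes that smoothing only produces a line where the data points are continuous, one quick check is to look for uncovered stretches between consecutive bedGraph intervals. The following is a minimal sketch, not part of any UCSC tool: `parse_bedgraph` and `find_gaps` are hypothetical helpers, the sample rows are invented, and it assumes the usual bedGraph convention that an interval's end equals the next interval's start when the two are adjacent.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) from bedGraph data lines.

    Track, browser, comment, and blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def find_gaps(records):
    """Return (chrom, gap_start, gap_end) for stretches with no data value."""
    gaps = []
    last_end = {}  # furthest end coordinate seen so far, per chromosome
    for chrom, start, end, _value in sorted(records):
        if chrom in last_end and start > last_end[chrom]:
            gaps.append((chrom, last_end[chrom], start))
        last_end[chrom] = max(end, last_end.get(chrom, end))
    return gaps


# Invented sample data using the track line from the message.
sample = [
    "track type=bedGraph autoScale=on graphType=points windowingFunction=mean smoothingWindow=16",
    "chr1 100 110 5",
    "chr1 110 125 40",
    "chr1 300 310 500",
]
print(find_gaps(parse_bedgraph(sample)))  # [('chr1', 125, 300)]
```

Any gap reported here is a region into which, per the explanation above, smoothing will not extend the points.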
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 01 Apr 2011 10:25:14 -0700
> From: Galt Barber <ga...@soe.ucsc.edu>
> Subject: Re: [Genome] when is a query excessive.
> To: John Hayward <john.h...@wheaton.edu>
> Cc: "gen...@soe.ucsc.edu" <gen...@soe.ucsc.edu>
> Message-ID: <4D960A7A...@soe.ucsc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> Hi, John!
>
> Queries that take more than a few minutes to run are
> probably inappropriate for the shared public mysql server.
>
> I found this query formulation for you that takes less than one minute:
>
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16')
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16')
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16')
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4 limit 30;
>
> +-----------+----------+----------+
> | name      | observed | count(*) |
> +-----------+----------+----------+
> | rs1000014 | A/G      |        4 |
> | rs1000047 | C/T      |        4 |
> | rs1000077 | C/G      |        4 |
> | rs1000078 | A/G      |        4 |
> | rs1000100 | A/T      |        4 |
> | rs1000174 | A/G      |        4 |
> | rs1000178 | C/T      |        4 |
> | rs1000192 | A/G      |        4 |
> | rs1000193 | A/C      |        4 |
> | rs1000454 | C/G      |        4 |
> | rs1000455 | A/T      |        4 |
> | rs1000710 | G/T      |        4 |
> | rs1000711 | C/G      |        4 |
> | rs1000720 | A/G      |        4 |
> | rs1000742 | C/T      |        4 |
> | rs1001157 | A/G      |        4 |
> | rs1001170 | G/T      |        4 |
> | rs1001171 | A/T      |        4 |
> | rs1001302 | A/G      |        4 |
> | rs1001362 | C/T      |        4 |
> | rs1001366 | C/T      |        4 |
> | rs1001493 | C/T      |        4 |
> | rs1001554 | A/G      |        4 |
> | rs1001608 | C/T      |        4 |
> | rs1001631 | C/G      |        4 |
> | rs1001655 | A/G      |        4 |
> | rs1001722 | G/T      |        4 |
> | rs1001776 | C/T      |        4 |
> | rs1001861 | A/G      |        4 |
> | rs1001871 | C/G      |        4 |
> +-----------+----------+----------+
> 30 rows in set (46.17 sec)
>
> Of course for your own full output,
> you would remove the "limit" clause.
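
To see why the union / group by / having formulation picks out SNPs shared by all four populations, here is a small sketch that runs the same shape of query against an in-memory SQLite database. Everything in it is an assumption for illustration: the rows are invented, SQLite stands in for the public MySQL server, and the per-branch parentheses are dropped because SQLite does not accept them in a compound select.

```python
import sqlite3

POPULATIONS = ("CEU", "YRI", "CHB", "JPT")

# Invented toy rows (chrom, name, observed); not real HapMap data.
TOY_ROWS = {
    "CEU": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T"),
            ("chr16", "rs3", "A/T"), ("chr1", "rs4", "A/G")],
    "YRI": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T"),
            ("chr16", "rs3", "A/T"), ("chr1", "rs4", "A/G")],
    "CHB": [("chr16", "rs1", "A/G"), ("chr16", "rs3", "A/C"),
            ("chr1", "rs4", "A/G")],
    "JPT": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T"),
            ("chr16", "rs3", "A/T"), ("chr1", "rs4", "A/G")],
}


def make_toy_db():
    """Build an in-memory database with one toy table per population."""
    conn = sqlite3.connect(":memory:")
    for pop, rows in TOY_ROWS.items():
        conn.execute(
            f"CREATE TABLE hapmapSnps{pop} (chrom TEXT, name TEXT, observed TEXT)"
        )
        conn.executemany(f"INSERT INTO hapmapSnps{pop} VALUES (?, ?, ?)", rows)
    return conn


def shared_snps(conn, chrom):
    """Return (name, observed, count) rows present in every population table."""
    # The literal population tag keeps rows from different tables distinct,
    # so UNION yields one row per (name, observed, population).
    branches = " UNION ".join(
        f"SELECT name, observed, '{pop}' AS pop FROM hapmapSnps{pop} WHERE chrom = ?"
        for pop in POPULATIONS
    )
    sql = (
        f"SELECT name, observed, COUNT(*) FROM ({branches}) AS resultAlias "
        "GROUP BY name, observed HAVING COUNT(*) = ? ORDER BY name"
    )
    params = (chrom,) * len(POPULATIONS) + (len(POPULATIONS),)
    return conn.execute(sql, params).fetchall()


print(shared_snps(make_toy_db(), "chr16"))  # [('rs1', 'A/G', 4)]
```

In the toy data rs2 is missing from one table and rs3 has a different observed value in one table, so only rs1 reaches a count of 4.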
>
> In case you are curious how many there are:
>
> select count(*) from (
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16')
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16')
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16')
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4) resultAlias2;
> +----------+
> | count(*) |
> +----------+
> |   105841 |
> +----------+
> 1 row in set (47.60 sec)
>
>
> Another alternative would be to capture the output from each like this:
>
> select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16'
>
> for each of your 4 files.
> You could sort them by name (rsId) either with an order by clause in
> sql, or with the unix sort command.
>
> You can even use the unix join command to join them up on the name and
> observed fields.
>
> Once the contents of each of the 4 sets are sorted by name and observed,
> joining them can be very fast.
>
> -Galt
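
The offline alternative Galt describes (export each table, then sort and join on name and observed) can also be approximated in memory once the four exports are loaded. The sketch below swaps the Unix sort/join step for a plain set intersection, which is a different technique with the same result; `shared_across` is a hypothetical helper and the sample pairs are invented.

```python
def shared_across(*exports):
    """Return sorted (name, observed) pairs that occur in every export.

    Each export is an iterable of (name, observed) pairs, one export per
    population table.
    """
    sets = [set(export) for export in exports]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Invented sample exports; real ones would come from the per-table queries.
ceu = [("rs1", "A/G"), ("rs2", "C/T"), ("rs3", "A/T")]
yri = [("rs1", "A/G"), ("rs2", "C/T"), ("rs3", "A/T")]
chb = [("rs1", "A/G"), ("rs3", "A/C")]
jpt = [("rs1", "A/G"), ("rs2", "C/T"), ("rs3", "A/T")]

print(shared_across(ceu, yri, chb, jpt))  # [('rs1', 'A/G')]
```

This keeps the heavy work off the shared server: each table is read once with a simple single-table query, and the comparison happens locally.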
>
> 4/1/2011 8:25 AM, John Hayward:
> > I would like to run queries against the genome-mysql.cse.ucsc.edu database which may be excessive, and I don't want to cause problems for others.
> >
> > I want to find matches for a particular chromosome which have the same name and observation for tables hapmapSnpsCEU, hapmapSnpsYRI, hapmapSnpsCHB, hapmapSnpsJPT.
> >
> > Doing a query to pick up the count of hapmapSnpsCEU for one chromosome took 0.14 seconds.
> > A query to pick up the count joining hapmapSnpsCEU and hapmapSnpsCHB took 8.40 seconds.
> >
> > If I join all tables would that constitute an excessive load?
> >
> > Below is the query joining two tables.
> > ======
> > select count(*) from hapmapSnpsCEU, hapmapSnpsCHB where hapmapSnpsCEU.chrom = 'Chr16' and hapmapSnpsCHB.chrom = 'Chr16' and hapmapSnpsCEU.name = hapmapSnpsCHB.name and hapmapSnpsCEU.observed = hapmapSnpsCHB.observed;
> > ======
> > johnh...
> ------------------------------
>
> Message: 5
> Date: Fri, 01 Apr 2011 14:16:42 -0400
> From: "Lionel (Lee) Brooks 3rd" <Lionel...@dartmouth.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Hiram Clawson <hi...@soe.ucsc.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID: <4D96168A...@dartmouth.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Hiram,
>
> From http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format:
>
> 1. Pseudo /line graphs/ can be drawn with the wiggle tracks by
> setting optional drawing parameters in the display of the track to
> draw /points/ instead of bars with smoothing on to smear the
> points together into a line.
>
> The pseudo line graph functionality is what I desire.
> Previously, it had been possible to do this with bedgraph format files.
> I don't know what "smearing" means. I'm just looking for a quick way to
> draw the moving average as I had been able to do before.
> As I mentioned below, my track line is:
> track type=bedGraph autoScale=on graphType=points windowingFunction=mean
> smoothingWindow=16
>
> I suppose my solution is to modify my scripts to use the wiggle variable
> step format?
>
>
> thanks,
> -Lionel
>
>
> Hiram Clawson wrote:
> > Good Morning Lionel:
> >
> > The bedGraph drawing mechanism can construct bar graphs at your
> > specified intervals, or when you select graphType=points it will
> > draw only the top of the bar graph at your specified intervals.
> > There is no line drawing except by the trick of "smoothing" points
> > such that they appear to be in a line graph. This only works if
> > the data points are continuous when seen in the genome browser.
> > Smoothing will not smear points into areas where there is no
> > data value specified.
> >
> > The Genome Graphs function of the genome browser:
> > http://genome.ucsc.edu/cgi-bin/hgGenome
> > will only draw lines between your specified points.
> >
> > See also:
> > http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
> >
> >
> > --Hiram
> >
> > Lionel Brooks wrote:
> >> Hello all,
> >>
> >> I have a bedgraph file. In the past I have used such files to obtain
> >> graphic output in the form of a smoothed line, but I uploaded my most
> >> recent data set and now I cannot get a line graph. In fact, I'm not
> >> sure what I am looking at because the values that are displayed along
> >> the y-axis are not described with a label.
> >> Here is my track line:
> >> track type=bedGraph autoScale=on graphType=points
> >> windowingFunction=mean smoothingWindow=16
> >>
> >> My data format is
> >>
> >> chr coordA coordB value
> >>
> >> where the approximate range of data values is: 5 <= value <= 500.
> >> Is it possible that your plotting function cannot compute this line
> >> because my coordinate intervals are too small?
> >> Another possibly relevant issue may be that the coordinate intervals
> >> are not fixed length.
> >>
> >> Any suggestions for course of action would be great.
> >>
> >> Sincerely,
> >> Lionel
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 1 Apr 2011 11:18:07 -0700 (PDT)
> From: Hiram Clawson <hi...@soe.ucsc.edu>
> Subject: Re: [Genome] bedgraph data will not display points
> To: "Lionel (Lee) Brooks 3rd" <Lionel...@dartmouth.edu>
> Cc: gen...@soe.ucsc.edu
> Message-ID: <922254719.45877.13016...@mail-01.cse.ucsc.edu>
> Content-Type: text/plain; charset=utf-8
>
> It won't make any difference what type of wiggle format you choose.
> They all draw the same way.
>
> You are going to have to provide me with a URL to your data file
> so I can see what it looks like.
>
> --Hiram
>
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 1 Apr 2011 14:34:07 -0400
> From: "Tom Traut" <tr...@med.unc.edu>
> Subject: [Genome] protein families
> To: gen...@soe.ucsc.edu
> Message-ID: <p06240804c9bbcac80186@[152.19.36.114]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Can I use your site (or any other) to find a listing of major protein families?
>
> how many kinases
> how many proteases
> how many G proteins
>
> etc
> --
> Tom Traut
>
> Professor of Biochemistry & Biophysics
>
> Phone: 919 966-5044
> FAX: 919 966-2852
> URL: www.unc.edu/~traut
>
>
>
> ------------------------------
> End of Genome Digest, Vol 99, Issue 3
> *************************************