Hello,
hgGcPercent would be a good choice although you will need to do some
simple programming to finalize the calculations. The file you mention
(hg18.gc5Base.txt) is one type of output from this script, so it will
not be your starting place. You will start with your BED file of genomic
regions and the genomic sequence itself.
Input: 3 column BED file (strand is ignored).
Output: 5 column BED file where column 4 is the actual count of g/c
bases in the region. This is where you would need to write a quick
script to get the length from columns 2 and 3, then calculate % from there.
Example:
$ hgGcPercent -bedRegionIn=test.in.bed -bedRegionOut=test.in.bed.out
-noLoad hg18 /gbdb/hg18/nib
$ head -2 test.in.bed*
==> test.in.bed <==
chr1 1115 4121 1115 1115 +
chr1 1115 4272 1115 1115 +
==> test.in.bed.out <==
chr1 1115 4121 1680 gcCount
chr1 1115 4272 1764 gcCount
You will need the .2bit file from our Downloads area (by ftp - this is
the /gbdb/hg18/nib in my example).
You will need to download the whole source tree and have MySQL client
libraries installed. You do not have to build the entire tree -- after
completing steps 1 through 4 in kent/src/product/README.building.source,
the minimum you need to do is:
cd hg/makeDb/hgGcPercent
make
Instructions for obtaining UCSC Browser code source:
http://genome.ucsc.edu/FAQ/FAQdownloads#download27
List of our utilities:
http://genomewiki.cse.ucsc.edu/index.php/Kent_source_utilities
Location for downloading data:
http://hgdownload.cse.ucsc.edu/downloads.html
MySQL client libraries are free and available from many web sources.
If these seems overly complicated and you do not plan on using our other
utilities, another option is to load your BED file into the Table
Browser (using region: define region) and extract the fasta sequence
from the genomic (using the Mapping and Sequencing Tracks, Assembly,
table = gold) with named, downloaded output. A fairly simple script
could move through each sequence, pop off the bases, and do the
calculations.
I hope this gives you some choices that will work for you,
Jennifer Jackson
UCSC Genome Bioinformatics Group
> _______________________________________________
> Genome maillist -
Gen...@soe.ucsc.edu
>
http://www.soe.ucsc.edu/mailman/listinfo/genome
>