RE: [genome] Digest for genome@soe.ucsc.edu - 11 updates in 6 topics


Ingrid B


Feb 9, 2015, 12:55:09 PM

to gen...@soe.ucsc.edu

Dear Sir or Madam,

My aim is to create a custom track of pathogenic mutations in the OFD1 gene, showing the mutations that lead to Joubert syndrome in one line, those for RP23 in a second, those for OFD1 syndrome in a third, and those for Simpson-Golabi-Behmel syndrome type 2 in a fourth.
To create the custom track for mutations in the OFD1 gene, I downloaded the ClinVar table (without CNVs, and only the pathogenic variants).
I tried to load this table (downloaded from UCSC, attached) back in as a custom track, but it did not work. I think this is because the format of the mutation nomenclature was not recognized.
This is the error message I got:

Error File 'OFD1_only_Clinvar05022015_only selected pathogenic_customtrack.txt' - Unrecognized format line 6 of file: chrX 13753396 13753398 delAG . 0 NM_003611.2(OFD1):c.43_44delAG (p.Gln16Argfs) deletion 8481 OFD1 Pathogenic 312262806 RCV000034023 GeneReviews:NBK1188,MedGen:C1510460,OMIM:311200,Orphanet:ORPHA2750,SNOMED CT:52868006 not provided classified by single submitter 1 (note: chrom names are case sensitive, e.g.: correct: 'chr1', incorrect: 'Chr1', incorrect: '1')

Can you help me with this?

Thank you very much in advance for your help.

Ingrid Bader

Dr. med. Ingrid Bader

Specialist in Human Genetics (Fachärztin für Humangenetik)

M.Sc. Bioinformatics

Klinische Genetik der

Universitätsklinik für Kinder- und Jugendheilkunde

Gemeinnützige Salzburger Landeskliniken Betriebs. GmbH

Paracelsus Medizinische Universität

Müllner Hauptstraße 48

A-5020 Salzburg

Tel.: +43-662-4482-58788

Genetics secretariat: +43(0)662 4482-2605

Fax: +43(0)662 4482-2621

Email: i.b...@salk.at | www.salk.at

To: gen...@soe.ucsc.edu
From: gen...@soe.ucsc.edu
Subject: [genome] Digest for gen...@soe.ucsc.edu - 11 updates in 6 topics
Date: Sat, 7 Feb 2015 17:06:25 +0000

gen...@soe.ucsc.edu


Topic digest

Question for converting format of file - 1 Update
downloading all ENCODE transcription factor binding data from a certain region of the genome - 2 Updates
BLAT mapping results - 2 Updates
reciprocal best hit - 1 Update
The wig files - 4 Updates
BED scores - 1 Update

Question for converting format of file

Jonathan Casper <jca...@soe.ucsc.edu>: Feb 06 05:06PM -0800

Hello Ajay,

There are a lot of data in your .xls file, and it's not immediately clear
which pieces you are trying to graph. Are you trying to plot the values in
the "logRatio" row against the names and positions that appear in the
SystematicName row? If so, the spreadsheet program that you are using to
read the .xls file probably has a "graph" (or "chart") tool that will be
much easier for you to use. The bedGraph format is generally used for
graphing a value that changes along a single chromosome, not for comparing
different chromosomes. We do have a Genome Graphs tool (
http://genome.ucsc.edu/cgi-bin/hgGenome) that displays data on multiple
chromosomes at once, but it is also designed to show values that change
over the span of each chromosome. Genome Graphs would not be the right tool
when associating a single unchanging value with each chromosome.

If your problem instead is that you are unable to read the .xls file, we
suggest that you search online for software that will allow you to open it.
Without making any particular recommendations, among the software that will
do this are Microsoft Excel, LibreOffice, and Google Docs (an online
service).

You may also be interested in posting your question on a more general
bioinformatics discussion site like https://www.biostars.org/. This mailing
list is devoted to questions regarding the use of the UCSC Genome Browser
and its tools; questions on how to interpret TCGA data are a bit outside of
our scope.

I hope this is helpful. If you have any further questions, please reply to
gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

downloading all ENCODE transcription factor binding data from a certain region of the genome

Eric Foss <ef...@fredhutch.org>: Feb 06 02:30PM -0800

Dear UCSC Genome Browser,

I would like to download all of the ENCODE transcription factor binding data in the vicinity of a gene I’m interested in. Is this possible with the Table Browser or some other UCSC Genome Browser tool?

Thank you.

Eric

Brian Lee <bria...@soe.ucsc.edu>: Feb 06 04:43PM -0800

Dear Eric,

Thank you for using the UCSC Genome Browser and for your question about
downloading all of the ENCODE transcription factor binding data in the
vicinity of a gene of interest.

You can use the Table Browser to access this information. For example, here
is a session in which an example region of interest near the start of the
SIRT1 gene is highlighted. Below are the steps to retrieve the transcription
factor data for this region, chr10:69,637,000-69,639,000.

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.SIRT1.TFBS.NFYB

First go to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables and
set the "Group:" to "Regulation", the "track:" to "Txn Factor ChIP" and the
"table:" to "wgEncodeRegTfbsClusteredV3". Then select "position" and enter
coordinates of interest: chr10:69,637,000-69,639,000. By clicking "get
output" you will see the following output:

#bin  chrom  chromStart  chromEnd  name   score  expCount  expNums          expScores
1116  chr10  69637728    69638048  NFYB   154    1         517              154
1116  chr10  69637926    69638250  CEBPB  430    4         212,343,426,477  275,262,218,430
1116  chr10  69638035    69638275  USF1   139    1         161              139

The chrom, chromStart, and chromEnd fields give the regions where named
transcription factors such as NFYB have been seen, the score gives a
relative indication of the strength of the signal seen in experiments,
and expCount indicates the number of experiments in which binding has been
observed.
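The expNums and expScores columns are parallel comma-separated lists, one entry per contributing experiment. As a rough sketch, assuming the Table Browser output is saved as tab-separated text with the nine columns shown above, the hypothetical shell function below pairs the two lists so that each experiment gets its own line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand each wgEncodeRegTfbsClusteredV3 row into one line
# per contributing experiment. Assumes tab-separated Table Browser output with
# columns: bin chrom chromStart chromEnd name score expCount expNums expScores.
expand_experiments() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                          # skip the header line
    {
      n = split($8, nums, ",")             # expNums, e.g. 212,343,426,477
      split($9, scores, ",")               # expScores, in the same order
      for (i = 1; i <= n; i++)
        if (nums[i] != "")                 # ignore an empty element left by a trailing comma
          print $2, $3, $4, $5, nums[i], scores[i]
    }'
}

# Usage: expand_experiments < tfbsClusters.txt
```

For the CEBPB row above, this prints four lines, one for each of experiments 212, 343, 426, and 477 together with its score.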

Note that wgEncodeRegTfbsClusteredV3 is a processed summary that condenses
hundreds of ChIP-seq experiments. If you are interested in looking
deeper into the underlying files that produced the clustered summary, you
can click the boxes, such as the one for NFYB, and then click the
"metadata" link for "more info". There you will see the lab, antibody,
cell type, and the uniform processed peak track,
wgEncodeAwgTfbsSydhK562NfybUniPk, that was used in the clustering algorithm
to generate the clusters track.

You will also see a UCSC Accession, wgEncodeEH002024, when looking at the
metadata details of cluster items. You can also use accessions such as
wgEncodeEH002024 to find the underlying raw signal track, if desired.
For example, with the Track Search tool,
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hgt_tSearch=1&tsCurTab=simpleTab&tsSimple=wgEncodeEH002024,
you can click "search" and then click the blue metadata arrow next to the
"K562 NF-YB Standard ChIP-seq Signal from ENCODE/SYDH" line to see the
displayed "fileName", such as "wgEncodeSydhTfbsK562NfybStdSig.bigWig", which
you can download for the entire genome. Alternatively, if you want this signal
data for only your region, you can return to the Table Browser, set the
"Group" to "All Tables", the "table:" to "wgEncodeSydhTfbsK562NfybStdSig", and
the "position:" to chr10:69635813-69645132, choose "data points" as the
output format, and click "get output".

In summary, the Table Browser output from wgEncodeRegTfbsClusteredV3
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3)
provides a processed, clustered condensation of the coordinates from hundreds
of uniformly processed ChIP-seq files (see
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform for
details), which were in turn generated by separate laboratories for
various cell lines
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeTfBindingSuper).
To read more about the background of these data sources, please see the
Track Description Pages linked in this paragraph.
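The region can also be pulled from the downloaded bigWig on the command line. One detail worth handling is that a browser position such as chr10:69,635,813-69,645,132 is 1-based and inclusive, while bigWig/BED coordinates are 0-based and half-open, so the start must be reduced by one. The helper below is a hypothetical sketch of that conversion; the bigWigToBedGraph call in the comment assumes the kent utility is installed, and the download URL is an assumed path that should be checked against the file's metadata.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a browser position such as "chr10:69,635,813-69,645,132"
# (1-based, inclusive) into bigWigToBedGraph flags (0-based, half-open).
position_to_flags() {
  local pos=${1//,/}          # drop thousands separators
  local chrom=${pos%%:*}      # text before the colon
  local range=${pos#*:}       # text after the colon
  local start=${range%-*}     # 1-based start
  local end=${range#*-}       # inclusive end == half-open end
  echo "-chrom=$chrom -start=$((start - 1)) -end=$end"
}

# Usage (requires the kent bigWigToBedGraph utility; the URL is an ASSUMED path):
#   bigWigToBedGraph $(position_to_flags "chr10:69,635,813-69,645,132") \
#     http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhTfbs/wgEncodeSydhTfbsK562NfybStdSig.bigWig \
#     nfyb_region.bedGraph
```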

Thank you again for your inquiry and for using the UCSC Genome Browser. If you
have any further questions, please reply to gen...@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

BLAT mapping results

Pat Hartz <pha...@peas.welch.jhu.edu>: Feb 06 02:59PM -0500

I have received ‘funny’ results of BLAT mapping on several occasions and I want to understand what the results mean.

For example, I BLAT’d the sequence of MIR22-5p (AGUUCUUCAGUGGCAAGCUUUA) and, in addition to the results that nailed the mapping to reference chr17:1713952-1713973, I received a result that included “17_KI270867v1 alt”.

How do I interpret the second result?

Thank you,

Pat Hartz

Patricia A. Hartz, PhD
Science Writer, OMIM
(www.omim.org)
Institute of Genetic Medicine
Johns Hopkins University

Matthew Speir <msp...@soe.ucsc.edu>: Feb 06 03:04PM -0800

Hi Pat,

Thank you for your question about your BLAT results. The chr*_alt
chromosomes are alternative sequences for different regions in the human
genome. You can read more about them on the GRCh38/hg38 Gateway page,
http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg38, under the section
titled "GRCh38 Highlights". You can also find more information on these
alternate loci on the Genome Reference Consortium (GRC) website:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/info/definitions.shtml#ALTERNATE.
You can see what regions these alternate sequences correspond to in the
genome by using the "Alt Map Super-track",
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=altSequence.
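If you only want hits on the primary chromosomes, alt-locus hits can be dropped from BLAT results saved in PSL format. This is a minimal sketch assuming tab-separated PSL input, where column 14 is the target name (tName) and alt-locus sequences end in "_alt" as in the hg38 naming; the function name drop_alt_hits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop BLAT hits on GRCh38 alternate-locus sequences
# (target names ending in "_alt") from PSL output. Column 14 of a PSL line is
# tName; header lines are skipped because their first field is not a number.
drop_alt_hits() {
  awk -F'\t' '$1 ~ /^[0-9]+$/ && NF >= 21 && $14 !~ /_alt$/'
}

# Usage: drop_alt_hits < blatResults.psl > primaryOnly.psl
```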

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

On 2/6/15 11:59 AM, Pat Hartz wrote:

reciprocal best hit

Matthew Speir <msp...@soe.ucsc.edu>: Feb 06 02:24PM -0800

Hi Anaïs,

Thank you for your question about reciprocal best hits. You can look at the following
GenomeWiki page for some sample scripts that you can use to create your
own Reciprocal Best or Syntenic Net file:
http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best.
You should be able to find any of the utilities referenced in those
scripts on our download server at
http://hgdownload.soe.ucsc.edu/downloads.html under the appropriate
folder for your machine.

If you are ever curious about what a particular UCSC Genome Browser
utility does, you can always run it on the command line without any
arguments to see its usage message. For example, if you run
chainStitchId without any arguments, you should see the following usage
message:

chainStitchId - Join chain fragments with the same chain ID into a single
chain per ID. Chain fragments must be from same original chain but
must not overlap. Chain fragment scores are summed.
usage:
chainStitchId in.chain out.chain

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

On 2/5/15 9:00 AM, Anaïs Gouin wrote:

The wig files

Da-Peng Wang <wang...@gmail.com>: Feb 06 06:26PM

Dear Colleague,

I intend to convert BAM files to WIG files for the UCSC Genome Browser, as
we don't have a web server to store bigWig files at the moment.

Could you help me make WIG files that can be used in UCSC?

Thank you in advance,

Dapeng

Hiram Clawson <hi...@soe.ucsc.edu>: Feb 06 10:52AM -0800

Good Morning Dapeng:

There are many procedures to construct such files from BAM files.
A google search for this procedure will find many such examples.

Here is another example, using the bedtools 'bamToBed' operation
and kent source commands:

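# bamToBed is part of bedtools; the remaining commands are kent utilities.
# "test" is a placeholder database name: -chromSize supplies the chrom sizes.
bamToBed -i yourFile.bam | cut -f1-3 | sort -k1,1 -k2,2n \
| bedItemOverlapCount -chromSize=yourGenome.chrom.sizes test stdin \
| sort -k1,1 -k2,2n > yourFile.bedGraph
bedGraphToBigWig yourFile.bedGraph yourGenome.chrom.sizes yourFile.bw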

--Hiram

On 2/6/15 10:26 AM, Da-Peng Wang wrote:

Hiram Clawson <hi...@soe.ucsc.edu>: Feb 06 12:45PM -0800

Good Afternoon Da-Peng:

If your files are large, the upload is inefficient and will probably
not work. You can load tiny wiggle files, a few thousand data points
at most, but they are temporary and will eventually disappear. I would
recommend finding a web hosting service where you can provide large files
via a URL. Please review your options for graphing data:
http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
http://genome.ucsc.edu/goldenPath/help/bigWig.html
http://genome.ucsc.edu/goldenPath/help/wiggle.html

--Hiram

On 2/6/15 12:37 PM, Da-Peng Wang wrote:

Da-Peng Wang <wang...@gmail.com>: Feb 06 08:37PM

Hi Hiram,

Thank you for your reply.

But we don't have a web server to store bigWig files, and
"bedGraphToBigWig" produces a bigWig file, not the plain WIG file we need. We
hope to upload the WIG file to the UCSC browser from a PC.

Could you please help me more?

Thanks,

Regards,

Dapeng
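Since bedGraphToBigWig only writes bigWig, a plain-text WIG can be produced from the bedGraph instead. The sketch below is a hypothetical helper assuming a sorted, non-overlapping bedGraph: it converts 0-based half-open intervals to variableStep WIG with 1-based positions, starting a new declaration line whenever the chromosome or the interval length (span) changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a sorted bedGraph (0-based, half-open) into
# variableStep WIG (1-based). A new declaration line is written whenever the
# chromosome or the interval length changes, so each data line covers exactly
# one bedGraph interval.
bedgraph_to_wig() {
  awk -v OFS='\t' '
    /^(track|browser|#)/ { next }          # skip header lines
    {
      span = $3 - $2
      if ($1 != chrom || span != lastSpan) {
        print "variableStep chrom=" $1 " span=" span
        chrom = $1; lastSpan = span
      }
      print $2 + 1, $4                     # WIG positions are 1-based
    }'
}

# Usage: bedgraph_to_wig < yourFile.bedGraph > yourFile.wig
```

To upload the result as a custom track, prepend a line such as track type=wiggle_0 name="myReads" (the name is illustrative); the size caveats in Hiram's reply still apply.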

BED scores

Brian Lee <bria...@soe.ucsc.edu>: Feb 06 09:43AM -0800

Dear Elisabetta,

Thank you for using the UCSC Genome Browser and for your question about
ChIP-seq ENCODE scores.

You are correct to interpret darker shading (a higher score) as stronger
evidence of binding of that transcription factor at that
particular spot. Here is a session that displays the Clustered
Transcription Factor Binding Sites track (wgEncodeRegTfbsClusteredV3) and
the underlying Uniform Peaks track (wgEncodeAwgTfbsUniform) used to create
the clusters, produced by the ENCODE Analysis Working Group (AWG).

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.TFBS.Uniform.Cluster

In summary, the AWG created a pipeline to uniformly process several
hundred ChIP-seq files generated by the ENCODE project. That uniform
processing produced comparable signal scores, viewable in the
wgEncodeAwgTfbsUniform track, which were then used to generate the clustered
scores in the wgEncodeRegTfbsClusteredV3 track, where a normalization factor
was applied in an attempt to distribute the scores more evenly.
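For your own BED files, shading by score in a custom track comes from the score column (0-1000) together with useScore=1 in the track line. The sketch below is a plain min-max rescaling into that range, written as a hypothetical helper; it is NOT the normalization used for the ENCODE tracks described here, which is documented on their track description pages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: linearly rescale column 5 of a BED file into 0-1000 so a
# custom track with useScore=1 shades items by score. Plain min-max scaling,
# not the ENCODE normalization. Reads the file twice (min/max pass, then output).
rescale_bed_scores() {
  awk -v OFS='\t' '
    NR == FNR {                            # first pass: find min and max
      v = $5 + 0
      if (FNR == 1 || v < min) min = v
      if (FNR == 1 || v > max) max = v
      next
    }
    {                                      # second pass: rewrite column 5
      v = $5 + 0
      $5 = (max > min) ? int(1000 * (v - min) / (max - min) + 0.5) : 1000
      print
    }' "$1" "$1"
}

# Usage: { echo "track name=myPeaks useScore=1"; rescale_bed_scores peaks.bed; } > peaks.track.bed
```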

In the session above, the display has been filtered to show only the
factors JUN, JUNB, JUND, and MYC. You can see how MYC has a dark score and
several letters following the block, indicating all the cell types in which
binding of MYC has been observed. If you click the box for the MYC cluster,
you will see the list of assays that provide evidence of binding.

Returning to the browser display, you can see several individual "Uniform
...c-Myc" tracks displayed below the clusters track. Those are the separate
wgEncodeAwgTfbsUniform tracks used to generate the processed clustered
summary wgEncodeRegTfbsClusteredV3 track for this MYC cluster. Those
individual uniform processed scores were used to create the cluster score
given to the MYC cluster. As with MYC, you can also click the
JUN factors, and you will see there is only one observed cell type where
these data indicate that this factor binds at this location. Similarly,
below you will see the "Uniform... Jun" tracks that contributed to the
clusters track.

Also note that some of the transcription factors, such as MYC, have
additional Factorbook motif information available to display; you can read
more about that in the wgEncodeRegTfbsClusteredV3 track description.

For a complete understanding of how the scores were calculated you must
read the Track Description pages for these two tracks.

See Methods section
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3

See Methods section and Peak Calling:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform

If you have more questions after reviewing the track description pages
about how the score is calculated, I suggest reviewing our mailing list
archive of previously answered questions:
https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!searchin/genome/score$20tfbs

Thank you again for your inquiry and for using the UCSC Genome Browser. If you
have any further questions, please reply to gen...@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to genome+un...@soe.ucsc.edu.


OFD1_only_Clinvar05022015_only selected pathogenic_customtrack.txt

Steve Heitner


Feb 10, 2015, 1:56:35 PM

to Ingrid B, gen...@soe.ucsc.edu

Hello, Ingrid.

The ClinVar Variants track is based on a bigBed file. I assume you obtained your text file by downloading the contents of the ClinVar Variants track via the Table Browser. The problem is that the bigBed file behind the ClinVar Variants track was created from a non-standard BED file whose extra columns are defined by an AutoSql definition file. Because of this, if you take that BED data and try to load it as a custom track, it will not work.
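If a plain BED track is enough for the four disease groups, one workaround is to keep only the four standard BED columns and write one track line per group, since a single custom track file can contain several tracks. This is a minimal sketch assuming a tab-separated file in which one column holds a disease label (you may need to add that column yourself); the helper name and the column number are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trim a Table Browser dump to plain BED4 (chrom, start,
# end, name) and write one custom track per phenotype label, so each disease
# gets its own line in the browser. The phenotype column number is an
# ASSUMPTION: check the header of your own file and pass the right one.
split_by_phenotype() {
  local pcol=$1
  awk -F'\t' -v pcol="$pcol" '
    /^#/ || NF < pcol { next }             # skip header and short lines
    {
      label = $pcol
      if (!(label in rows)) order[++n] = label      # remember first-seen order
      rows[label] = rows[label] $1 "\t" $2 "\t" $3 "\t" $4 "\n"
    }
    END {
      for (i = 1; i <= n; i++) {
        printf "track name=\"%s\" visibility=pack\n", order[i]
        printf "%s", rows[order[i]]
      }
    }'
}

# Usage: split_by_phenotype 5 < ofd1_labeled.txt > ofd1_tracks.txt
```

The resulting ofd1_tracks.txt (an illustrative file name) can then be uploaded on the Add Custom Tracks page.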

If you would like to create your own custom track based on the format of the ClinVar Variants track, the best thing to do would be to create your own AutoSql definition file (see Example Three at http://genome.ucsc.edu/goldenPath/help/bigBed.html) and use the bedToBigBed utility (http://hgdownload.cse.ucsc.edu/admin/exe/) to create a bigBed file. I will also attach the clinvar.as file here for you.
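A minimal sketch of that route, assuming a BED4+2 layout (the four standard BED fields plus an HGVS string and a disease label), is shown below. The table and extra field names are illustrative and not taken from clinvar.as; bedToBigBed is run with its -type, -as, and -tab options, the last because HGVS names contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a minimal AutoSql (.as) file describing a BED4+2
# layout. Table and extra field names are illustrative, not the real clinvar.as.
write_as_file() {
  cat > "$1" <<'EOF'
table ofd1Variants
"Pathogenic OFD1 variants (illustrative BED4+2 layout)"
    (
    string chrom;       "Reference sequence chromosome or scaffold"
    uint   chromStart;  "Start position in chromosome"
    uint   chromEnd;    "End position in chromosome"
    string name;        "Short variant label, e.g. delAG"
    lstring hgvs;       "HGVS description, may contain spaces"
    string phenotype;   "Disease label"
    )
EOF
}

# Usage (kent utilities; input file names are illustrative):
#   write_as_file ofd1Variants.as
#   fetchChromSizes hg19 > hg19.chrom.sizes
#   sort -k1,1 -k2,2n ofd1Variants.bed > ofd1Variants.sorted.bed
#   bedToBigBed -tab -type=bed4+2 -as=ofd1Variants.as \
#       ofd1Variants.sorted.bed hg19.chrom.sizes ofd1Variants.bb
```

The resulting .bb file has to be reachable by URL, referenced from a track line such as track type=bigBed name="OFD1 variants" bigDataUrl=http://your.server/ofd1Variants.bb (hypothetical host).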

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group