problems with Gene retrieval, GTEx track and ChIP-Seq


Fu weiwei

Aug 8, 2017, 11:43:23 AM
to gen...@soe.ucsc.edu
Dear UCSC Genome Browser Team,
 
I would appreciate it if you could help me with the following problems.

(1) I have built a bigBed file (converted from GTF format) and loaded it into MySQL. Currently I can only find a gene in the position box by typing its gene symbol in upper case (e.g., GH1); the lower-case form (e.g., gh1) does not work. How can I make the gene search case-insensitive, so that either GH1 or gh1 works?

My command lines and configuration are as follows:
bedToBigBed -type=bed12+1 -as=goat.as -extraIndex=name,geneSymbol goat.bed.new.sorted ../chrom/chrom.sizes goat.bb
hgBbiDbLink panGoat gene /gbdb/panGoat/gene/goat.bb

# vi trackDb.ra
track gene
shortLabel NCBI Genes
longLabel NCBI Gene Predictions
group genes
priority 5
visibility dense
colorByStrand 255,0,0 0,0,255
type bigBed 12 +
labelFields geneSymbol,name
defaultLabelFields geneSymbol,name
labelSeparator "  "
bedNameLabel Transcript Name
searchIndex name,geneSymbol
searchName gene
searchTable gene
searchType bigBed
mouseOverField geneSymbol,name

(2) How can I load a tissue expression track on a local mirror, like the GTEx track you display? I have searched the available UCSC documentation but could not find how to configure it. Could you provide detailed steps?


(3) For my ChIP-Seq data, each sample has two result sets: the input (control) and the IP experiment. I assume the IP data should first be normalized against the input and then converted into a single bigWig file before uploading to UCSC. Could you advise me on how to generate this single bigWig file from the two result sets?


Thank you very much for your time.


Kind regards,
Weiwei Fu



Brian Lee

Aug 9, 2017, 4:24:25 PM
to Fu weiwei, gen...@soe.ucsc.edu
Dear Weiwei,

Thank you for using the UCSC Genome Browser and your questions about trackDb.

Could you share some parts of your files (the goat.bed.new.sorted file and chrom.sizes) to help us replicate your issue? Could you also describe your mirror setup? Are you by chance using a Genome Browser in a Box (GBiB)? You can send the files directly to my address or to our private internal address, genom...@soe.ucsc.edu.

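You can find some information about GTEx-like bigBarChart files at these links:
http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigBarChart
http://genome.ucsc.edu/goldenPath/help/barChart.html

Those links are for building GTEx-like barCharts in custom tracks and hubs.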

For your ChIP-seq processing steps, you might want to look at the methods described in papers from the ENCODE project. For example, the Uniform TFBS Track Description (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform#TRACK_HTML ) has a Methods section and a References section with papers like "Design and analysis of ChIP-seq experiments for DNA-binding proteins" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2597701/) that will likely provide some helpful input, or contacts to whom you can direct questions about this process. You may also want to ask your question at bioinformatics sites like Biostars (https://www.biostars.org/).
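
As a rough illustration of the normalization asked about in question (3): one common approach, which is not prescribed anywhere in this thread, is to scale the IP and input coverage to the same sequencing depth and take a per-bin log2 ratio with a pseudocount. The Python sketch below assumes per-bin read counts have already been computed; the function name, bin size, pseudocount, and counts are all hypothetical.

```python
import math

def log2_ratio_bedgraph(chrom, bin_size, ip_counts, input_counts, pseudocount=1.0):
    """Return bedGraph lines holding log2(IP / input) for fixed-size bins.

    Both count lists are scaled to counts per million (CPM) first, so that
    libraries of different sequencing depth become comparable.
    """
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same bins")
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")
    lines = []
    for i, (ip, ctrl) in enumerate(zip(ip_counts, input_counts)):
        ip_cpm = ip * 1e6 / ip_total
        ctrl_cpm = ctrl * 1e6 / input_total
        # The pseudocount (an assumed value) avoids division by zero in empty bins.
        ratio = math.log2((ip_cpm + pseudocount) / (ctrl_cpm + pseudocount))
        start = i * bin_size
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{ratio:.4f}")
    return lines

# Hypothetical counts for three 100 bp bins on chromosome "1".
for line in log2_ratio_bedgraph("1", 100, [30, 10, 0], [10, 10, 10]):
    print(line)
```

The last bin would need to be clipped to the chromosome length before the resulting bedGraph is converted to bigWig, for instance with the UCSC bedGraphToBigWig utility and the same chrom.sizes file used for the other tracks.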

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute


Fu weiwei

Aug 14, 2017, 11:47:55 AM
to Brian Lee, gen...@soe.ucsc.edu, genom...@soe.ucsc.edu
Dear Brian,

Thank you for all the helpful tips. I am very sorry for sending you this email repeatedly; my attachment exceeded the size limit for UCSC messages. The third question has been resolved, while the first two have not been settled yet. I had checked the Genome Browser mailing list and have enabled lower-case search in the position box using the ixIxx program, but a new problem has emerged. For example, when I search for the pparg gene, ppargc1b and ppargc1a are also returned; and when I search using upper case (PPARG), PPARG is returned twice, in addition to the other results (PPARGC1B, PPARGC1A). Can you help me change this? I have attached all my configuration files for your review.

Another problem: following the documents you provided (http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigBarChart and http://genome.ucsc.edu/goldenPath/help/barChart.html), I can display the bar chart in the Genome Browser, but when I click on it, an error message appears: "Can't start query: SELECT * FROM expreBar WHERE name='LSS'AND chrom='1' AND chromStart=145851416 AND chromEnd=145883591 mySQL error 1054: Unknown column 'name' in 'where clause' (profile=<noProfile>, host=localhost, db=Goat)". The URL of my UCSC Genome Browser mirror is http://animal.nwsuaf.edu.cn/genomebrowser/cgi-bin/hgTracks?db=Goat&position=1:145,837,750-145,869,925. I have put my configuration files in the attachment. I appreciate your time and help.

Thank you,
Weiwei Fu





attachment.zip

Brian Lee

Aug 25, 2017, 4:59:58 PM
to Fu weiwei, gen...@soe.ucsc.edu

Dear Weiwei Fu,

Thank you for building barCharts and your excellent mirror. One of the engineers familiar with building barChart data for an internal track (versus in a hub where our main documentation exists) had a few suggestions.

First you can look at this example trackDb for an internal track that is present on our development server:
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/makeDb/trackDb/human/hg38/barCharts.ra

Here is an excerpt:

    track tcgaTranscExpr
    parent tcgaExpr
    release alpha,beta
    type bigBarChart
    visibility full
    maxLimit 8000
    maxItems 300
    barChartLabel Cancer types
    barChartUnit TPM
    # Get the barChartLabels from the last line in the bed file.
    barChartBars \
            Adrenocortical_carcinoma .... \
    barChartColors \
            \#8FBC8F....\
    barChartMatrixUrl /gbdb/hgFixed/human/expMatrix/tcgaMatrix.tab
    barChartSampleUrl /gbdb/hgFixed/human/expMatrix/tcgaLargeSamples.tab
    barChartMetric median
    shortLabel TCGA Transc Expr
    longLabel TCGA Transcript Expression (GENCODE v23)
    defaultLabelFields name2
    labelFields name2, name
    group expression
    bigDataUrl /gbdb/hgFixed/human/expMatrix/tcgaTransExp.bb

Look at the tcgaExpr superTrack and the tcgaTranscExpr stanza for examples, which you can also interact with on our development server here (note that our development site can change at any time): http://hgwdev.soe.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=tcgaTranscExpr

By comparing your trackDb.txt and the barCharts.ra you can see it would be good to make some of the following changes:

Instead of using hgBbiDbLink to load your bigBed, use: bigDataUrl /gbdb/Goat/gene/goat.bb
Add barChartColors to define your colors and barChartLabel to label.

One of the most important aspects our engineer shared is that you must ensure that there is a 1-to-1 exact name matching in the category file with the labels in the matrix file. This might be where things have gone wrong.

For example, you can't have something like GTEX-111CU-1826-SM-5GZYN in one file and gtex-111CU-1826-sm-5GZYN in the other, and all must be represented. Here is an example matrix and category file to look at:
http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleSampleData.txt
http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleMatrix.txt
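
The 1-to-1 name check described above can be sketched in Python before the files are used to build the track. This is a minimal sketch, assuming the matrix is tab-separated with sample names in its header row (first column being the gene or transcript name) and the category file has two tab-separated columns, sample then category. The function names and the second sample name are hypothetical.

```python
import io

def matrix_sample_names(matrix_file):
    """Sample names from the header row of a tab-separated expression matrix.
    Assumes the first column holds the item (gene or transcript) name."""
    header = matrix_file.readline().rstrip("\r\n").split("\t")
    return header[1:]

def category_sample_names(category_file):
    """Sample names from a sample<TAB>category file; '#' lines are skipped."""
    names = []
    for line in category_file:
        line = line.rstrip("\r\n")
        if line and not line.startswith("#"):
            names.append(line.split("\t")[0])
    return names

def compare_sample_names(matrix_names, category_names):
    """Report names found in only one file, plus pairs differing only by case."""
    only_matrix = sorted(set(matrix_names) - set(category_names))
    only_category = sorted(set(category_names) - set(matrix_names))
    by_lower = {name.lower(): name for name in only_category}
    case_only = [(m, by_lower[m.lower()]) for m in only_matrix
                 if m.lower() in by_lower]
    return only_matrix, only_category, case_only

# Hypothetical inputs reusing the case mismatch mentioned above;
# "SAMPLE-2" is an invented second sample.
matrix = io.StringIO(
    "#gene\tGTEX-111CU-1826-SM-5GZYN\tSAMPLE-2\nLSS\t1.5\t2.0\n")
category = io.StringIO(
    "#sample\tcategory\ngtex-111CU-1826-sm-5GZYN\tbrain\nSAMPLE-2\tbone\n")

only_matrix, only_category, case_only = compare_sample_names(
    matrix_sample_names(matrix), category_sample_names(category))
print("only in matrix:", only_matrix)
print("only in category file:", only_category)
print("differ only by case:", case_only)
```

All three lists should be empty before the matrix and category files are used; with real data, the two `io.StringIO` objects would be replaced by `open(...)` calls on the actual files.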

For your other question about search results, the problem you are experiencing is not entirely clear. It sounds like you find one result twice: one displayed item is the exact match (for example, PPARG), while the other PPARG is a suggested possibility listed along with PPARGC1A and PPARGC1B. This appears to be the result of the trix file attempting to find potential matches.
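
For context, a trix index is built by ixIxx from a line-oriented text file in which the first word of each line is an identifier and the remaining words are the searchable text. The Python sketch below shows one way such an input file might be produced from the bed12+1 file used for the gene track. It assumes the transcript name is in column 4 and the gene symbol in column 13; the function name and the example row are hypothetical.

```python
def trix_input_lines(bed_lines, name_col=3, symbol_col=12):
    """Build '<identifier><TAB><search words>' lines for ixIxx.

    The identifier keeps its original case so that it still matches the
    indexed 'name' field; the search words are lower-cased.
    """
    lines = []
    for row in bed_lines:
        fields = row.rstrip("\r\n").split("\t")
        if len(fields) <= max(name_col, symbol_col):
            continue  # skip rows without the expected columns
        name, symbol = fields[name_col], fields[symbol_col]
        lines.append(f"{name}\t{symbol.lower()} {name.lower()}")
    return lines

# Hypothetical bed12+1 row: 12 standard BED fields plus a geneSymbol column.
example_row = "\t".join([
    "1", "100", "200", "XM_000001.1", "0", "+",
    "100", "200", "0", "1", "100,", "0,", "PPARG",
])
for line in trix_input_lines([example_row]):
    print(line)
```

The resulting text file would then be passed to ixIxx to produce the .ix and .ixx index files. Because the index holds whole words such as pparg, ppargc1a, and ppargc1b as separate entries, a search that also returns near matches would list all three.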


All the best,

Brian Lee
UCSC Genomics Institute


On Fri, Aug 25, 2017 at 10:09 AM, Fu weiwei <weiwe...@126.com> wrote:
Dear Brian,

I have not yet solved the two questions in my last email. If you have time today, could you download the attachment I sent and give me some detailed instructions? I would appreciate it very much.

All the best,
Weiwei Fu






At 2017-08-26 00:41:41, "Brian Lee" <bria...@soe.ucsc.edu> wrote:
Dear Weiwei Fu,

I wanted to check whether there have been any developments since my last email. Could you send me the questions you currently have, so that I can help with them?

Thanks,
Brian 

On Wed, Aug 16, 2017 at 4:54 PM, Brian Lee <bria...@soe.ucsc.edu> wrote:

Dear Weiwei Fu,

Thank you for your message. I wanted to share I will be away until next Thursday, but will be looking to respond to your email next week.

I was going to suggest the ixIxx method, so it is good that you have started using it. You may also find it helpful to search our help archives in the meantime:
https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome-mirror
https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome/

All the best,
Brian

Fu weiwei

Sep 5, 2017, 11:56:26 AM
to Brian Lee, gen...@soe.ucsc.edu
Dear Brian,

For a long time I have been unable to solve this MySQL error when building barCharts: "Can't start query: SELECT * FROM expreBar WHERE name='LSS'AND chrom='1' AND chromStart=145851416 AND chromEnd=145883591 mySQL error 1054: Unknown column 'name' in 'where clause' (profile=<noProfile>, host=localhost, db=Goat)". The mirror works well except when I click on the bar chart figure, which produces the MySQL error above. I have checked all the possible causes I could think of and have defined the column "name" as the searchIndex, but it still fails. Can you help me again?

All the best,
Weiwei Fu





Brian Lee

Sep 5, 2017, 12:19:58 PM
to Fu weiwei, gen...@soe.ucsc.edu
Hi Weiwei,

You have replaced the hgBbiDbLink with bigDataUrl, correct?

Can you send your matrix.new and category files?

barChartMatrixUrl /gbdb/Goat/trans/barChart/matrix.new
barChartSampleUrl /gbdb/Goat/trans/barChart/category

You need to make sure that the spelling and letter case of your names are exactly the same in all places, matching what you have in your barChartBars line (...Abomasum Bladder bone brain breast...).
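
A quick way to test this is to compare the labels on the barChartBars line with the categories that actually occur in the category file. The Python sketch below assumes the category file has two tab-separated columns (sample, then category) and that the trackDb stanza uses backslash line continuations; the function names, sample identifiers, and the deliberately mismatched "bladder" entry are hypothetical.

```python
def bar_labels(stanza_text):
    """Return the labels listed on the barChartBars line of a trackDb stanza."""
    joined = stanza_text.replace("\\\n", " ")  # undo backslash continuations
    for line in joined.splitlines():
        words = line.split()
        if words and words[0] == "barChartBars":
            return words[1:]
    return []

def sample_categories(sample_lines):
    """Return the set of categories from sample<TAB>category lines."""
    found = set()
    for line in sample_lines:
        line = line.rstrip("\r\n")
        if line and not line.startswith("#"):
            found.add(line.split("\t")[1])
    return found

def label_mismatches(labels, categories):
    """Labels with no matching category, and categories with no matching label."""
    missing_from_samples = [label for label in labels if label not in categories]
    missing_from_bars = sorted(categories - set(labels))
    return missing_from_samples, missing_from_bars

# Hypothetical stanza fragment and category lines; "bladder" differs in case.
stanza = """track expreBar
type bigBarChart
barChartBars \\
    Abomasum Bladder bone \\
    brain breast
"""
samples = ["#sample\tcategory", "s1\tAbomasum", "s2\tbladder",
           "s3\tbone", "s4\tbrain", "s5\tbreast"]

missing_from_samples, missing_from_bars = label_mismatches(
    bar_labels(stanza), sample_categories(samples))
print("in barChartBars but not in category file:", missing_from_samples)
print("in category file but not in barChartBars:", missing_from_bars)
```

The comparison is intentionally exact, so a difference in letter case alone is reported as a mismatch on both sides.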

Brian

Fu weiwei

Sep 8, 2017, 3:00:08 PM
to Brian Lee, gen...@soe.ucsc.edu
Dear Brian,

Given my earlier failures in creating a bigBarChart, this time I created an hg38 database and used the UCSC example, which I downloaded from http://genome.ucsc.edu/goldenPath/help/barChart.html. Unfortunately, there has been no improvement. When I replace hgBbiDbLink with bigDataUrl, I do not know what value the track line should have. In my previous experience, the database table name is used on the 'track' definition line and should be consistent with the table definition in barChartBed.as. If I do not use the hgBbiDbLink program to create the table (that is, load data into a table) and use bigDataUrl instead, I cannot load the track into the Genome Browser with the following command ( hgTrackDb -strict . hg38 trackDb_lab  ~/kent/src/hg/lib/trackDb.sql . ). I also assumed that the bigDataUrl line has the same function as the track line. Could you try it in your Genome Browser rather than in a track hub? When I delete the database table created by hgBbiDbLink and use bigDataUrl instead, the bigBarChart track disappears and I can no longer see it in the Genome Browser.



All the best,
Weiwei Fu



Brian Lee

Sep 8, 2017, 6:31:52 PM
to Fu weiwei, gen...@soe.ucsc.edu

Hi Weiwei,

Thank you for your persistence in creating a UCSC Genome Browser mirror to work for your data.

One potential issue that might be coming up with your earlier files for goat could be the autoSql .as file you are using.

In an earlier email you were building a bigBed and requesting information about the .as to use for your goat.as, which would work fine for a normal bigBed. When building your bigBarChart, be sure to use a .as that matches the specifications on our help page, http://genome.ucsc.edu/goldenPath/help/barChart.html, under barChart format definition.

Another potential issue is that your files may contain errors. To help, here are some example files that you can download to your own location and reference in the URL lines below (barChartMatrixUrl, barChartSampleUrl, and bigDataUrl). But even without copying those files locally, adding this text to your hg38 trackDb should result in a working barChart on your mirror:

track tcgaGeneExprEXAMPLE2
type bigBarChart
shortLabel Cancer Gene Expr EXAMPLE
longLabel Gene Expression in 33 TCGA Cancer Tissues (GENCODE v23) EXAMPLE
barChartUnit GPM
barChartMatrixUrl http://hgwdev.cse.ucsc.edu/~brianlee/temp/remove/tcgaGeneMatrix.tab
barChartSampleUrl http://hgwdev.cse.ucsc.edu/~brianlee/temp/remove/tcgaLargeSamples.tab
barChartLabel Cancer types
barChartBars \
    Adrenocortical_carcinoma Pheochromocytoma_and_Paraganglioma Bladder_Urothelial_Carcinoma \
    Brain_Lower_Grade_Glioma Glioblastoma_multiforme Breast_invasive_carcinoma \
    Cervical_squamous_cell_carcinoma_and_endocervical_adenocarcinoma Colon_adenocarcinoma \
    Rectum_adenocarcinoma Kidney_Chromophobe Kidney_renal_clear_cell_carcinoma \
    Kidney_renal_papillary_cell_carcinoma Liver_hepatocellular_carcinoma Lung_adenocarcinoma \
    Lung_squamous_cell_carcinoma Mesothelioma Ovarian_serous_cystadenocarcinoma \
    Pancreatic_adenocarcinoma Prostate_adenocarcinoma Skin_Cutaneous_Melanoma \
    Lymphoid_Neoplasm_Diffuse_Large_B-cell_Lymphoma Stomach_adenocarcinoma \
    Testicular_Germ_Cell_Tumors Thymoma Thyroid_carcinoma Uterine_Carcinosarcoma \
    Uterine_Corpus_Endometrioid_Carcinoma Cholangiocarcinoma Esophageal_carcinoma \
    Head_and_Neck_squamous_cell_carcinoma Sarcoma Uveal_Melanoma
barChartColors \
    \#8FBC8F #8FBC8F #CDB79E #EEEE00 #EEEE00 #00CDCD #EED5D2 \
    \#CDB79E #CDB79E #CDB79E #CDB79E #CDB79E #CDB79E #9ACD32 #9ACD32 #9ACD32 \
    \#FFB6C1 #CD9B1D #D9D9D9 #1E90FF #CDB79E #FFD39B #A6A6A6 #008B45 #008B45 \
    \#EED5D2 #EED5D2 #ff0000 #ff8d00 #ffdb00 #00d619 #009fff
barChartMetric median
bigDataUrl http://hgwdev.cse.ucsc.edu/~brianlee/temp/remove/tcgaGeneExpr.bb
maxLimit 8000
labelFields name2, name
defaultLabelFields name2
visibility pack
group map

To make this even more straightforward, attached is a file called "file" that you can directly load into your mysql mirror hg38 database trackDb table with the following command:

mysql -e "load data local infile 'file' into table trackDb;" hg38

A test of this was done on a virtual machine mirror of our code called a Genome Browser in a Box (GBiB) and it worked, so in theory it should work for your mirror too.

Once you know you have a working version of barCharts on your mirror, you can begin to examine the steps in building the matrix and category files (as well as the bigBed with the correct -as file). The terms in barChartBars need to exactly match the terms in your Samples file; that is often one place where issues show up.


All the best,

Brian Lee
UCSC Genomics Institute

file