Chip sequence peak calling

Islam,M. Rafiqul

Jul 21, 2016, 5:47:49 PM

to gen...@soe.ucsc.edu

Dear

Over the past three days, I have been trying to figure out how to analyze my ChIP-seq data: reading review articles, searching online, and going through the UCSC web documentation. Having no bioinformatics background, I am not really sure I understand any of it. Here is my question:

I have sequence data in Excel as well as BED files with the columns chr, start, end, log p, etc. (attached file). These regions have also been selected over the preimmune IgG.

Now I would like to find the binding sites/sequences.

Can I do that on the UCSC Genome Browser? What are the steps? How can I download those data?

Can I do this on Windows, or do I need Unix/Linux or a Unix-like environment on Windows (Cygwin)?

Sorry for so many questions. If you could direct me to the specific UCSC links, or advise me whether other, more user-friendly programs are available, that would be highly appreciated.

Thanks in advance.

Rafiq Islam

Biochemistry

Northwest Missouri State University

5V-4V UCSC2 - Copy.bed

Brian Lee

Jul 21, 2016, 8:00:48 PM

to Islam,M. Rafiqul, gen...@soe.ucsc.edu

Dear Rafiq,

Thank you for using the UCSC Genome Browser and for your question about ChIP-seq peak calling.

It is not immediately clear which stage of the analysis you have reached with your data, so you may also wish to ask at a bioinformatics forum such as https://www.biostars.org/.

It may also be of interest to look at external reference sites, such as the ENCODE project's Transcription Factor ChIP-seq Pipeline page, https://www.encodeproject.org/chip-seq/transcription_factor/, and the related software information: https://www.encodeproject.org/search/?type=Software&limit=all

If you have finalized BED regions, which appears to be what is in the files you have attached, you can use those to perform various queries on the UCSC Genome Browser.

You can view your data by loading a custom track: paste data such as the following into the hg19 Custom Tracks page, http://genome.ucsc.edu/cgi-bin/hgCustom?db=hg19, which is accessible from the top blue bar under "My Data" and then "Custom Tracks":

track name=uniqueNameBed1 description="chip5v-4v_1"        
chr1    4588690        4589065
chr1    28320368    28320820
chr1    28700424    28701069
chr1    31415317    31415763
chr1    31501278    31501618

Note that custom track lines should only have the word "track" once and that each new track should have a unique identifier after "name=".
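If your regions live in a spreadsheet or script rather than a text file, the custom track text can also be generated. The following is only a minimal Python sketch, assuming three-column BED regions; the function name make_custom_track is hypothetical, and rejecting whitespace in the name is a simplification of this sketch rather than a Browser requirement.

```python
def make_custom_track(name, description, regions):
    """Return custom-track text: one 'track' line followed by BED3 rows."""
    # Simplification for this sketch: refuse names with whitespace so that
    # no quoting is needed after "name=".
    if any(ch.isspace() for ch in name):
        raise ValueError("track name should not contain whitespace")
    lines = [f'track name={name} description="{description}"']
    for chrom, start, end in regions:
        start, end = int(start), int(end)
        if not 0 <= start < end:
            raise ValueError(f"bad interval: {chrom} {start} {end}")
        # BED columns are tab-separated: chrom, start, end.
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"


# The five example regions from the track shown above.
regions = [
    ("chr1", 4588690, 4589065),
    ("chr1", 28320368, 28320820),
    ("chr1", 28700424, 28701069),
    ("chr1", 31415317, 31415763),
    ("chr1", 31501278, 31501618),
]
track_text = make_custom_track("uniqueNameBed1", "chip5v-4v_1", regions)
print(track_text)
```

The printed text can be pasted into the Custom Tracks page; Python runs on Windows as well as Unix/Linux, so no Cygwin is needed for this step.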

With this loaded in the Browser, you can use tools such as the Data Integrator to pull out other relevant data. From the top blue bar, select "Tools" and then "Data Integrator"; set the "track group" to "Custom Tracks" and add the "track" uniqueNameBed1 by clicking the "Add" button.

If you are interested in getting transcription factor binding sites associated with these regions, next change the "track group" from "Custom Tracks" to "Regulation" and the "track" to "ENCODE Regulation -Txn Factor ChIP (wgEncodeRegTfbsClusteredV3)", and click "Add".

If you clicked "Get output" at this point, you would get all fields from this table that intersect with your custom track. To trim that output, first click "Choose fields" under "Output options", then for "Txn Factor ChIP" select only the first five fields, "chrom chromStart chromEnd name score", and click "Done".

At the very top of the page, be sure you have "region to annotate" set to "genome" and then click "Get output".
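To make "intersect" concrete, here is a minimal Python sketch of an overlap test, assuming BED-style coordinates (0-based start, end not included). The function name overlaps is hypothetical, and this is an illustration of the idea, not the Data Integrator's own code. The coordinates are taken from the example output further down in this message.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


# One custom-track region and one Txn Factor ChIP cluster (MAX) on chr1.
peak = ("chr1", 28320368, 28320820)
tfbs = ("chr1", 28320680, 28320956)

# Intervals can only intersect if they are on the same chromosome.
same_chrom = peak[0] == tfbs[0]
print(same_chrom and overlaps(peak[1], peak[2], tfbs[1], tfbs[2]))  # True

# A region far from that cluster does not intersect it.
other = ("chr1", 4588690, 4589065)
print(overlaps(other[1], other[2], tfbs[1], tfbs[2]))  # False
```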

This will result in information such as the following:

# hgIntegrator: database=hg19 region=genome Thu Jul 21 16:50:38 2016
#ct_uniqueNameBed1_6074.chrom    ct_uniqueNameBed1_6074.chromStart    ct_uniqueNameBed1_6074.chromEnd    wgEncodeRegTfbsClusteredV3.chrom    wgEncodeRegTfbsClusteredV3.chromStart    wgEncodeRegTfbsClusteredV3.chromEnd    wgEncodeRegTfbsClusteredV3.name    wgEncodeRegTfbsClusteredV3.score
chr1    4588690    4589065                    
chr1    28320368    28320820    chr1    28320680    28320956    MAX    186
chr1    28700424    28701069    chr1    28699794    28700504    SETDB1    249
chr1    28700424    28701069    chr1    28699816    28700679    KAP1    312
chr1    28700424    28701069    chr1    28700735    28701179    POLR2A    307
chr1    28700424    28701069    chr1    28700803    28701133    EP300    130
chr1    31415317    31415763                    
chr1    31501278    31501618    chr1    31500919    31501509    IKZF1    216
chr1    31501278    31501618    chr1    31500943    31501307    TEAD4    255
chr1    31501278    31501618    chr1    31501026    31501362    TBL1XR1    218
chr1    31501278    31501618    chr1    31501322    31501672    SIRT6    108

The first three columns are the BED regions defined by your custom track, the next three are the coordinates of TFBS that overlap those regions, and the score column from wgEncodeRegTfbsClusteredV3 indicates the degree of confidence of a TFBS in that region. Rows where the TFBS columns are empty are regions with no overlapping entry in that table. You can read more in this previously answered MLQ: https://groups.google.com/a/soe.ucsc.edu/forum/#!msg/genome/FPZuwGAuWoI/Hn00L0SPRUAJ

Please note too that you can search our MLQ archives for previously answered questions, where you may find much valuable information: https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/chipseq$20peak$20

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
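If you save this output to a file, the factors can be grouped per region with a short script. This is a minimal Python sketch under some assumptions: the output is tab-separated with the eight columns in the order shown above, lines starting with "#" are comments, and the function name factors_by_region is hypothetical. The sample rows are copied from the output above.

```python
import csv
import io

# A few rows copied from the example output (tab-separated).
sample = "\n".join([
    "# hgIntegrator: database=hg19 region=genome Thu Jul 21 16:50:38 2016",
    "chr1\t4588690\t4589065\t\t\t\t\t",
    "chr1\t28320368\t28320820\tchr1\t28320680\t28320956\tMAX\t186",
    "chr1\t28700424\t28701069\tchr1\t28699794\t28700504\tSETDB1\t249",
    "chr1\t28700424\t28701069\tchr1\t28699816\t28700679\tKAP1\t312",
    "chr1\t28700424\t28701069\tchr1\t28700735\t28701179\tPOLR2A\t307",
    "chr1\t28700424\t28701069\tchr1\t28700803\t28701133\tEP300\t130",
])


def factors_by_region(text):
    """Map each (chrom, start, end) region to a list of (factor, score)."""
    hits = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        row += [""] * (8 - len(row))  # pad rows whose empty columns were trimmed
        region = (row[0], int(row[1]), int(row[2]))
        factors = hits.setdefault(region, [])  # keep regions without any hit
        name, score = row[6], row[7]
        if name:
            factors.append((name, int(score)))
    return hits


for (chrom, start, end), factors in factors_by_region(sample).items():
    names = ", ".join(f"{n} ({s})" for n, s in factors) or "no overlapping factor"
    print(f"{chrom}\t{start}\t{end}\t{names}")
```

Regions such as chr1 4588690 4589065 stay in the result with an empty list, so you can see at a glance which of your peaks have no overlapping entry.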

All the best,

Brian Lee
UCSC Genomics Institute
