How to use QuEST for ATAC-Seq data

210 views
Skip to first unread message

Anton

unread,
Jun 3, 2015, 7:56:33 PM6/3/15
to chi...@googlegroups.com
Hi,

we will be adding ATAC-Seq analysis options to QuEST.

However, you can already analyze your ATAC-Seq data with QuEST, and this thread is to explain how to do this.

1. Align ATAC-Seq data using bwa, novoalign or similar program. Use sam files as the output or convert to sam files using other programs such as samtools. There is no need for control data. If you are doing differential enhancer strength analysis, process two ATAC-Seq datasets independently using QuEST and analyze differences later (will post how).

2. Run QuEST configuration script: 

~/progs/QuEST_2.4/generate_QuEST_parameters.pl -sam_align_ChIP ATAC-1_S5_R1_R2.sam -gt /Volumes/Virgo/Genomes/Mouse/mm9/genome_table -ap ./QuEST_results -ChIP_name ATAC-Seq_data


Since ATAC-Seq data tends to have a lot of chrM alignments, I recommend to remove chrM entry from the genome table. That way you will skip all chrM mappings and should get slightly better genome-wide enrichment.

Press any button when prompted (you are given a warning since no control data is provided).

Select option '4' for the type of experiment:

    ----------- QuEST parameter configuration -------------


What kind of ChIP experiment is this?


1. Transcription factor with defined motif and narrow (punctate) binding site

   resulting in regions of enrichment 100-300 bp wide.

   (bandwidth = 30 bp, region_size = 300 bp)


2. PolII-like factor resulting in regions of enrichment 300-1000 bp with

   some narrow binding sites and some wide sites possibly occupying

   the entire gene length.

   (bandwidth = 60 bp, region_size = 600 bp)


3. Histone-type ChIP resulting in wide regions of enrichment

   1Kb and up possibly occupying multiple genes.

   (bandwidth = 100 bp, region_size = 1000 bp)


4. Neither, I would like to configure individual parametrs myself.


Please specify your selection [1-4]: 4


When asked to specify the value of KDE bandwidth, type 30 and for region size 300:


Please specify the value of KDE bandwidth (recommended 30-100): 30

Please specify the size of the region (recommended 300-1000 ): 300


When prompted about threshold parameter choose option '4'. Set ChIP enrichment to 3 and ChIP extension to 1 (to pick up as many peaks as possible):

Choose one of the following options:


1. Stringent peak calling parameters.

   ChIP enrichment                 = 50

   ChIP to background enrichment   = NA

   ChIP extension enrichment       = 3


2. Recommended peak calling parameters.

   ChIP enrichment                 = 30

   ChIP to background enrichment   = NA

   ChIP extension enrichment       = 3


3. Relaxed peak calling parameters.

   ChIP enrichment                 = 10

   ChIP to background enrichment   = NA

   ChIP extension enrichment       = 3


4. Neither, I want to specify peak calling parameters myself.


Please specify your selection [1-4]: 4

Please specifty ChIP enrichment (1 or greater, recommended 30): 3

Please specify the ChIP extension enrichment (1 or greater, recommended 3): 1


You can proceed to running QuEST by answering 'y' to the next question:

Would you like to run QuEST analysis now? (y/n): y

With the default QuEST parameters the wig tracks will be generated for positions where the enrichment exceeds 10. This is not always ideal if you also want to see the noise signals on your wigs or generate figures for manuscripts. I typically change the threshold to 1 to obtain better looking tracks, but the resulting files can be huge and can only be displayed using UCSC Genome Browser one chromosome per track. If you want to change your default wig threshold, answer 'n' to the previous question:

Would you like to run QuEST analysis now? (y/n): n

This will end the configuration script. Now you can change the wig threshold:

emacs ./QuEST_results/parameters/ChIP_normalized.wig.batch.pars 


In the file change the profile threshold value from 10 to 1, save and quit. 


profile_threshold = 1


You can also change it to 0 to display signals at every base.

Now launch QuEST analysis:

~/progs/QuEST_2.4_dev/run_QuEST_with_param_file.pl -ap ./QuEST_results/


In some cases peak shift may not detect automatically. Use 50 in this case to proceed, I will make a separate post how to accurately estimate peak shift using an alternative method.

Note: QuEST may take longer to run with lower thresholds, but no more than a few hours.

QuEST will report ATAC-Seq peaks and regions in QuEST_results/calls/peak_caller.ChIP.out.accepted

To obtain peak coordinates: 

cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | awk '{print $2"\t"$3"\t"$5}' > ATAC-Seq.peaks.txt


The resulting tab-delimited file will contain: chrom, pos, peak score.

To obtain region coordinates:

cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "R-" | awk '{print $2"-"$3"-"$5}'| awk 'BEGIN{FS="-"}{print $1"\t"$2"\t"$3"\t"$4}' > ATAC-Seq.regions.bed


The resulting tab-delimited bed file will contain chrom, begin, end, peak score.

To re-calculate the peak shift, run the following command:

cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | awk '{print $13}' | head -1000 > ps.txt


Now open R and type:

w<-read.table(file="ps.txt"); median(w$V1)


You can re-run QuEST again using this recalculated peak shift.

Best, Anton






Message has been deleted

Liang

unread,
Jun 6, 2015, 2:50:26 AM6/6/15
to chi...@googlegroups.com

Hi Anton,


Thank you. I completed one run. But when I want to obtain peak coordinates following your instruction, it shows error on R. 


Error in read.table(file = "ps.txt") : no lines available in input


I list my actions below, please correct me if I have done anything wrong, or I can directly re-run QuEST again. Thank you very much.


(Terminal)


liangabai:QuEST_ATAC-G5_Anton apple$ cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | awk '{print $2"\t"$3"\t"$5}' > ATAC-Seq.peaks.txt

liangabai:QuEST_ATAC-G5_Anton apple$ cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "R-" | awk '{print $2"-"$3"-"$5}'| awk 'BEGIN{FS="-"}{print $1"\t"$2"\t"$3"\t"$4}' > ATAC-Seq.regions.bed

liangabai:QuEST_ATAC-G5_Anton apple$ cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | awk '{print $13}' | head -1000 > ps.txt

liangabai:QuEST_ATAC-G5_Anton apple$ R


R version 3.1.3 (2015-03-09) -- "Smooth Sidewalk"

Copyright (C) 2015 The R Foundation for Statistical Computing

Platform: x86_64-apple-darwin13.4.0 (64-bit)


R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.


  Natural language support but running in an English locale


R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.


Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.


> w<-read.table(file="ps.txt"); median(w$V1)

Error in read.table(file = "ps.txt") : no lines available in input

>


Best,


Liang






=========================

P.S. FYI


For a 4.25 GB ATAC Sam file, it ran for 30 hours. 

My system: MacBook Pro/2.2 GHz Intel Core i7/16 GB 1600 MHz DDR3

Parameters are listed below.


Type of experiment = 1

ChIP enrichment = 3

ChIP extension enrichment = 1

profile_threshold = 1


=========================

(*my note when changing the wig threshold)


Would you like to run QuEST analysis now? (y/n): n


Anton: This will end the configuration script. Now you can change the wig threshold:


(*directly key in the following)


emacs ./QuEST_results/parameters/ChIP_normalized.wig.batch.pars 


(*screen shows:)


Would you like to run QuEST analysis now? (y/n): n

guest-wireless-hsv-nat-207-151-005:QuEST_ATAC-G5_Anton apple$ emacs ./QuEST_results/parameters/ChIP_normalized.wig.batch.pars


(*below is what you’ll see, directly change the profile_threshold to 1)


genome_table = /Users/apple/Desktop/QuEST_ATAC-G5_Anton/QuEST_results/parameters/genome_table

profile_path = /Users/apple/Desktop/QuEST_ATAC-G5_Anton/QuEST_results/scores/ChIP

output_file = /Users/apple/Desktop/QuEST_ATAC-G5_Anton/QuEST_results/tracks/wig_profiles/ATAC-Seq_data_normalized\

.profile.wig

output_by_chr_path = /Users/apple/Desktop/QuEST_ATAC-G5_Anton/QuEST_results/tracks/wig_profiles/by_chr/ChIP_norma\

lized

profile_threshold = 1

normalizer = 0.0175634198412327

track_name = ATAC-Seq_data_normalized

track_priority = 93

gz = yes


(*I don’t know how to save it but it’s auto-saved, quit Terminal after saving)


(*launch Terminal again)


Anton: Now launch QuEST analysis 


(*I highlight parts in red which are vary in different cases)


~/Desktop/QuEST_2.4/run_QuEST_with_param_file.pl -ap ./QuEST_results/


(*QuEST starts to run)


(*and then it stops and asks you about peak shift)


Anton: In some cases peak shift may not detect automatically. Use 50 in this case to proceed.


QuEST failed to identify regions to determine peak shift estimate.

Do you want to override the value of peak shift? (y/n): y

Please enter the value of peak shift (recommended 50): 50


(*QuEST starts to run till the end)


Anton

unread,
Jun 8, 2015, 1:19:34 PM6/8/15
to chi...@googlegroups.com, yclian...@gmail.com


Thank you. I completed one run. But when I want to obtain peak coordinates following your instruction, it shows error on R. 


Error in read.table(file = "ps.txt") : no lines available in input


Looks like you were not able to extract any peaks.
 
Try this to see how many peaks and regions you have and paste your results here:

cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | wc -l

cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "R-" | wc -l



Liang

unread,
Jun 8, 2015, 2:23:27 PM6/8/15
to chi...@googlegroups.com, yclian...@gmail.com

Hi Anton,


Results:


cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "P-" | wc -l

       0


cat ./QuEST_results/calls/peak_caller.ChIP.out.accepted | grep "R-" | wc -l

       0

Anton

unread,
Jun 9, 2015, 5:17:01 PM6/9/15
to chi...@googlegroups.com, yclian...@gmail.com
Hi Liang,

when you see the peaks, is it by uploading wig files from QuEST?

Try repeating analysis using only main chromosomes.

Anton
Reply all
Reply to author
Forward
0 new messages