using MACS to analyze DNaseI Hypersensitivity data


Batool Akhtar-Zaidi

Jan 4, 2012, 10:34:43 AM
to macs-ann...@googlegroups.com
Hello MACS forum members,

Does anybody know whether it is possible to use MACS to analyze DNaseI Hypersensitivity data?

I understand that MACS generally depends on building a model from the bimodal read pile-up to infer binding sites for ChIP-seq and make peak calls.

DNaseI HS data, however, effectively have a fragment size of zero, and therefore show no such bimodal pile-up of Watson and Crick tags.

If anybody has been able to tweak MACS parameters to successfully handle DNaseI-seq data, please do share your thoughts (and command line).

Best regards,

Batool

--

Batool Akhtar-Zaidi
PhD Candidate, Scacheri Lab
Depts Molecular Medicine & Genetics
Cleveland Clinic & Case Western Reserve University
Cleveland OH 44106
tel. 216-368-2636


Hung-Chung Huang

Jan 4, 2012, 12:41:29 PM
to macs-ann...@googlegroups.com
Hi Batool,

You might disable model building and set the fragment size to 0 with the following command arguments and see what happens.
--nomodel --shiftsize=0

Zoello

--
You received this message because you are subscribed to the Google Groups "MACS announcement" group.
To post to this group, send email to macs-ann...@googlegroups.com.
To unsubscribe from this group, send email to macs-announcem...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/macs-announcement?hl=en.

Ivan Gregoretti

Jan 4, 2012, 12:57:07 PM
to macs-ann...@googlegroups.com
Hello Batool,

Have you looked at any successful DNaseI hypersensitivity experiment?

Do you see well defined peaks that are absent in input DNA?

If the peaks are distinct to the eye, you should be able to use MACS
with model building overridden. You will have to estimate the typical peak
size visually from the aligned reads.
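
A minimal sketch of what overriding model building could look like with MACS 1.4, assuming the visually estimated size is used as the fragment size. In MACS 1.4, `--shiftsize` is interpreted as half the fragment size when `--nomodel` is set, so the helper halves the estimate. The file names, the run name, and the helper function itself are hypothetical, and the command is only assembled here, not executed.

```python
import shlex


def macs14_fixed_shift_command(treatment, control, name, fragment_size):
    """Assemble a MACS 1.4 call that skips model building.

    fragment_size is the size estimated by eye (an assumption of this
    sketch); MACS 1.4 expects half of it as --shiftsize.
    """
    shiftsize = fragment_size // 2
    return [
        "macs14",
        "-t", treatment,   # DNase-seq alignments (hypothetical file name)
        "-c", control,     # input DNA alignments (hypothetical file name)
        "-n", name,        # prefix for the output files
        "--nomodel",       # do not build the bimodal shifting model
        f"--shiftsize={shiftsize}",
    ]


cmd = macs14_fixed_shift_command("dnase.bed", "input.bed", "dnase_fixed", 200)
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` once the file names point at real data.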

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health

Pedro Madrigal

Jan 5, 2012, 3:49:05 AM
to macs-ann...@googlegroups.com
Dear all,

And what if the peaks are not well defined and are also present in the input data?
Is there any tool dedicated exclusively to the analysis of DNase-seq? To my
knowledge, only F-Seq explicitly offers an option to analyze this type of data.

Also, it seems that in DNase-seq it is more important to search for "cleavages"
or "footprints" than to search for peaks as in ChIP-seq. Does anyone have
experience with that?

Best,
Pedro


Tao Liu

Jan 5, 2012, 11:04:58 AM
to macs-ann...@googlegroups.com
Hi Zoello and Batool,

Any extension size is better than no extension at all. Even if there is no meaningful fragment size, signal pileup and data smoothing are still essential for a peak detection algorithm. If you consider every tag to represent only a 0 bp fragment, how can you decide where the enrichment is?
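
To illustrate the point about extension, here is a small sketch (not MACS code) that piles up tags after extending each one in its 3' direction. The tag coordinates and the function name are made up for the example. With essentially no extension the tags never overlap, so there is no enrichment to see; with a 200 bp extension they stack up over a common region.

```python
def pileup(tags, ext, length):
    """Per-base coverage after extending each tag `ext` bp towards its 3' end.

    tags: iterable of (position, strand); position is the 0-based 5' end
          of the read and strand is '+' or '-'.
    """
    diff = [0] * (length + 1)
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + ext          # extend to the right
        else:
            start, end = pos - ext + 1, pos + 1  # extend to the left
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, depth = [], 0
    for change in diff[:length]:
        depth += change
        coverage.append(depth)
    return coverage


# Four toy tags flanking one site: two on each strand.
tags = [(100, "+"), (120, "+"), (260, "-"), (280, "-")]
print(max(pileup(tags, 1, 400)))    # 1: no tag overlaps another
print(max(pileup(tags, 200, 400)))  # 4: all tags pile up between the two groups
```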

As for DNaseI hypersensitivity studies, there are two scenarios. Taking the human ENCODE data as an example, the data from Duke University and the University of Washington (UW) are generated by two different protocols. The key difference is the depth of digestion: the DNA fragments captured in the UW library are smaller than those in the Duke library.

In the UW library, deep digestion and a gel cut for smaller fragments ensure that the sequenced ends are enriched at the boundaries of regions where the DNA is less accessible to the enzyme. These regions are more likely to be protected from DNaseI by protein (TF, histone or other chromatin factor) binding in nuclei. The downstream analysis can therefore be considered similar to ChIP-seq, where sonication tends to attack the boundaries of TF binding sites. In this case, extending tags in the 3' direction by hundreds of base pairs, taken either from the MACS prediction or from an arbitrary setting, would work perfectly well, and you can simply apply MACS to this kind of data. In the end, you will more likely predict where the DNA footprints are.

The sequencing tags from Duke, however, are the ends of bigger DNA fragments, so extending tags in the 3' direction is less likely to reach the real footprint. In this case, the aim of the study should be to find the DNaseI hypersensitive sites rather than the footprints. My suggestion is to extend every tag in both the 5' and 3' directions and then pile them up, so that the regions with more pileup are those more vulnerable to DNaseI digestion. If you want to use MACS for this purpose, you may need to manipulate the raw data and then turn off model building in MACS.
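
One way to read "extend every tag in both directions" is to replace each tag by a fixed-width window centred on its 5' end (the cut site) before piling up. The sketch below does that on toy coordinates; the half-width of 25 bp, the coordinates, and the function names are assumptions for illustration only. Because the window is symmetric, the strand of the tag no longer matters.

```python
def centered_interval(cut_site, half_width, chrom_length):
    """Half-open window [start, end) centred on the cut site, clipped to the chromosome."""
    start = max(cut_site - half_width, 0)
    end = min(cut_site + half_width + 1, chrom_length)
    return start, end


def coverage(intervals, chrom_length):
    """Per-base pileup of half-open intervals."""
    diff = [0] * (chrom_length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    cov, depth = [], 0
    for change in diff[:chrom_length]:
        depth += change
        cov.append(depth)
    return cov


# Three cut sites close together and one isolated cut site.
cut_sites = [100, 110, 120, 300]
windows = [centered_interval(c, 25, 400) for c in cut_sites]
cov = coverage(windows, 400)
print(cov[100], cov[200], cov[300])  # 3 0 1
```

The windows could be written out as BED intervals and given to MACS with model building turned off, which is one possible form of the raw-data manipulation mentioned above.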

That's my point of view. If anyone has a different opinion, please let us know.

Best,
Tao Liu

Research Fellow
Dept of Biostats and Comp Bio, DFCI / HSPH
450 Brookline Ave., Boston, MA 02215

ara...@bu.edu

Aug 20, 2013, 11:36:54 AM
to macs-ann...@googlegroups.com
Hello all,

I wanted to re-post this question to get a current consensus on whether MACS2 is (or still is) appropriate for DHS peak discovery. I'm working with mouse liver DNase-seq data where we aim for fragment sizes around 300 bp. At the moment I am comparing DHS peaks between MACS2 and Hotspot (ENCODE). For a typical sample from our lab, MACS2 discovers 40K peaks (20% of reads in DHS peaks) while Hotspot discovers 150K peaks (30% of reads in DHS peaks), with virtually 100% of the MACS2 peaks in common with Hotspot regions and about 65% of the Hotspot regions in common with the MACS2 peaks.

Analyzing ENCODE data with MACS2, I get 100K DHS peaks (50% of reads in DHS peaks). ENCODE data with Hotspot produces 140K DHS peaks (60% of reads in DHS peaks).
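
For anyone wanting to reproduce this kind of comparison, the two statistics quoted above (fraction of reads in peaks, and the share of one peak set in common with another) can be sketched as follows. This is a simplified single-chromosome illustration on made-up coordinates, not the pipeline used for the numbers above; it assumes half-open, non-overlapping peaks within each set and counts any overlap of at least 1 bp as "in common".

```python
from bisect import bisect_right


def fraction_of_reads_in_peaks(read_positions, peaks):
    """Share of reads whose position falls inside any peak.

    peaks: half-open (start, end) intervals, non-overlapping within the set.
    """
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in read_positions:
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits / len(read_positions)


def fraction_in_common(query, reference):
    """Share of query peaks overlapping at least one reference peak by >= 1 bp."""
    hits = sum(
        any(q_start < r_end and r_start < q_end for r_start, r_end in reference)
        for q_start, q_end in query
    )
    return hits / len(query)


# Toy data: two narrow peak calls versus three broader regions.
narrow_peaks = [(100, 200), (500, 600)]
broad_regions = [(90, 250), (480, 620), (900, 1000)]
reads = [50, 120, 150, 180, 300, 510, 560, 700, 950, 990]

print(fraction_of_reads_in_peaks(reads, narrow_peaks))   # 0.5
print(fraction_of_reads_in_peaks(reads, broad_regions))  # 0.7
print(fraction_in_common(narrow_peaks, broad_regions))   # 1.0
```

On real data the same quantities would normally be computed per chromosome with an interval tool; the quadratic overlap check here is only meant to make the definitions explicit.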

The goal of my current analysis is to be able to identify differential DHS sites between different sets of samples. I've previously justified using MACS with DHS data knowing that for DHS peaks with a width of 500 bp we see a bimodal distribution of positive- and negative-strand reads (DHS_Peak_Example.pdf). Given the varied nature of DHS sites, ranging from narrow to broad, this bimodal distribution is not always present.

Question(s): Is MACS2 appropriate for DHS peak discovery (what options should I use for this purpose)?

Thanks,
Andy Rampersaud
Graduate Student, Bioinformatics
Waxman Lab, Boston University
DHS_Peak_Example.pdf

Tao Liu

Aug 20, 2013, 2:29:25 PM
to macs-ann...@googlegroups.com
Hi Andy,

Of course you can use MACS2 for DNase-seq data analysis. As for Hotspot, it uses a 250 bp sliding window to calculate tag enrichment, which is equivalent to a fixed 250 bp extension in MACS2. As for the number of peaks, it mainly depends on the cutoff. Although DHS sites range from narrow to broad, there may still be an intrinsic fragment size in the library; check your _model.pdf file from MACS2 or try Anshul's PhantomPeak tool. After all, this 'fragment size' is a parameter of the smoothing method, and an approximately correct smoothing is enough to improve peak detection. You may also try other methods that do not rely on data smoothing, for example SPP from Peter Park's lab or GPS from David Gifford's lab, which detect regions mainly based on the balance of forward and reverse reads.
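
As a rough illustration of what an "intrinsic fragment size" means here, the sketch below scores each candidate distance by how many plus-strand read starts have a minus-strand read end exactly that far downstream, and reports the best-scoring distance. This is a heavily simplified stand-in for a strand cross-correlation analysis, not the method implemented in MACS2 or PhantomPeak; the coordinates and the function name are invented for the example.

```python
def estimate_strand_offset(plus_starts, minus_ends, max_shift):
    """Distance (0..max_shift) that best aligns plus-strand starts with minus-strand ends.

    Ties are resolved in favour of the smallest distance.
    """
    plus = set(plus_starts)
    minus = set(minus_ends)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Number of plus-strand positions with a minus-strand partner `shift` bp downstream.
        score = sum(1 for p in plus if p + shift in minus)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy library: every plus-strand start has a minus-strand end 150 bp downstream,
# plus one unrelated minus-strand read.
plus_starts = [100, 220, 340, 400]
minus_ends = [p + 150 for p in plus_starts] + [1000]
print(estimate_strand_offset(plus_starts, minus_ends, 300))  # 150
```

If such an analysis shows a clear optimum, that distance is a reasonable extension size; if it does not, an approximate fixed value is still better than none, as noted above.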

Best,
Tao Liu

--
Assistant Professor
Department of Biochemistry
University at Buffalo
NY State Center of Excellence in Bioinformatics & Life Sciences

B2-163 COEBLS
(O) 716-829-2749
tl...@buffalo.edu

Mailing address:
University at Buffalo-COEBLS
701 Ellicott St, B2-163
Buffalo, NY 14203-1221
