MACS "ImportError"

Hunter Richards

Jan 12, 2012, 3:04:07 PM

to MACS announcement

Hi,

I recently upgraded to Ubuntu 11.04 and it broke MACS. I get this
error when trying to run MACS:

Traceback (most recent call last):
  File "/usr/local/bin/macs14", line 35, in <module>
    from MACS14.OptValidator import opt_validate
ImportError: No module named MACS14.OptValidator

---

I've read that it could be a Python issue; I have version 2.7.1
installed. I was under the impression that MACS won't work with
earlier versions of Python.

Any ideas?

Thanks!

Ivan Gregoretti

Jan 12, 2012, 3:36:37 PM

to macs-ann...@googlegroups.com

Both MACS versions run well on Python 2.7.1.

Go to the MACS source directory, become root, and run

python setup.py install

That should re-install MACS.
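If the error persists after reinstalling, it may be worth checking whether the interpreter that runs macs14 can see the MACS14 package at all, since a distribution upgrade can leave the default python pointing at a different site-packages directory. Below is a minimal diagnostic sketch; the helper name can_import is made up for illustration and is not part of MACS. Run it with the same python that the macs14 script uses.

```python
from __future__ import print_function  # keeps print() behaving the same on Python 2.7

import sys


def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Which interpreter is running, and where does it look for packages?
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("MACS14.OptValidator importable:", can_import("MACS14.OptValidator"))
    for path in sys.path:
        print("search path:", path)
```

If the import check reports False, the package was most likely installed for a different Python than the one now on the PATH.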

Ivan

Ivan Gregoretti, PhD

> --
> You received this message because you are subscribed to the Google Groups "MACS announcement" group.
> To post to this group, send email to macs-ann...@googlegroups.com.
> To unsubscribe from this group, send email to macs-announcem...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/macs-announcement?hl=en.
>

Chinh Hoan

Jan 20, 2012, 11:52:48 AM

to macs-ann...@googlegroups.com

Hello,
Could anyone please explain how tags are counted in the peaks.xls file? Does it show the highest number of tags at the summit? Thanks
Chinh

________________________________

UT Southwestern Medical Center
The future of medicine, today.

Ivan Gregoretti

Jan 20, 2012, 12:26:27 PM

to macs-ann...@googlegroups.com

Hi Chinh,

Is your question related to MACS "ImportError"?

Ivan

Chinh Hoan

Jan 20, 2012, 12:30:21 PM

to macs-ann...@googlegroups.com

Hi Ivan,
No, but I now know what "tags" in the peaks.xls file means. Thanks anyway.

Chinh

Gyan Prakash Srivastava

Jan 28, 2012, 2:11:49 PM

to macs-ann...@googlegroups.com

Hello MACS experts,

I have a serious concern about the MACS output from my ChIP-seq data. I
have ChIP-seq data from postmortem brain samples (same regions) from
many individuals. The protocol for generating the data is exactly the
same, so we expect a lot of similarity between these samples. When I
run MACS on each sample with the same parameters and control, I see
different "d" values, as below.
# d = 43
# d = 269
# d = 38
# d = 302
# d = 38
# d = 39
# d = 38
# d = 37
# d = 40
# d = 39
# d = 37
# d = 35
# d = 40

My concern is whether "d" is a very sensitive variable. I think
different "d" values could lead to different shift sizes and hence a
completely different peak profile (count and location). Any suggestions?
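To compare d across many samples, the "# d = N" lines can be collected and summarized in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the factor of 2 used to flag values far from the median is an arbitrary assumption, not a MACS rule.

```python
import re

# Matches lines such as "# d = 43"
D_LINE = re.compile(r"^#\s*d\s*=\s*(\d+)\s*$")


def parse_d(lines):
    """Collect the integer d from every line of the form '# d = N'."""
    values = []
    for line in lines:
        match = D_LINE.match(line.strip())
        if match:
            values.append(int(match.group(1)))
    return values


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def far_from_median(values, factor=2.0):
    """Return values more than `factor` times above or below the median."""
    m = median(values)
    return [v for v in values if v > m * factor or v < m / factor]
```

Note that this only shows which samples disagree with the majority; it cannot say which group of d values is the correct one.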

Thanks,
Gyan


Batool Akhtar-Zaidi

Jan 28, 2012, 2:16:43 PM

to macs-ann...@googlegroups.com

Hi Gyan,

Many of these d values are really too small and therefore suspect. What are your read lengths?
Also, take a look at your verbose output: did you see anything printed to stdout along the lines of "d is too small, perhaps due to user errors, therefore d=200bp will be used" (paraphrasing here)?

Please give us a little more information and perhaps, if you have got it, some preliminary overlap numbers with called peaks between samples with d values at your extreme high and low values (try the intersectBed function of the BEDTools suite if you've got it).
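As a rough stand-in for the overlap count when BEDTools is not at hand, something like the sketch below works on small peak lists. It is not how intersectBed is implemented; the function name is made up for illustration, and the pairwise comparison is quadratic per chromosome, so it is only meant for a quick sanity check.

```python
from collections import defaultdict


def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with BED-style half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    count = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom[chrom]):
            count += 1
    return count
```

Calling it in both directions (A against B, then B against A) gives the two overlap numbers for a pair of samples.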

-Batool


--

Batool Akhtar-Zaidi
PhD Candidate, Scacheri Lab
Depts Molecular Medicine & Genetics
Cleveland Clinic & Case Western Reserve University
Cleveland OH 44106
tel. 216-368-2636

Yuan Hao

Jan 28, 2012, 6:14:49 PM

to macs-ann...@googlegroups.com

Hi,

Here are my two cents. It's not uncommon to see such a small 'd' from
MACS. I have ChIP-Seq data from different breast cancer cell lines,
sequenced years apart on different machines by different parties, with
sequencing and alignment quality ranging from very high to relatively
low. Using MACS-1.x, I always got d values as small as yours. Because
the predicted fragment size (i.e. 'd') is always smaller than 2 x read
size, MACS couldn't build the model and fell back to the user-specified
shift size instead. From checking the quality of the called peaks and
comparing against results from other peak calling programs, I got the
impression that MACS-1.x tends to underestimate the distance. As MACS
picks the top 1000 best peaks for d estimation (if I remember
correctly), my personal guess is that this sampling step might
introduce some bias relative to the peak profiles as a whole. I would
take my actual fragment size into consideration, consult at least one
other peak calling program, and inspect the called peaks in a browser
to decide on a proper shift size to supply to MACS. This is just my
personal experience and says nothing about MACS performance in
general. Also, I haven't tried MACS2 yet, so I have no experience with
the latest version. Hope this helps.

Cheers,
Yuan

Anshul Kundaje

Jan 28, 2012, 7:33:16 PM

to macs-ann...@googlegroups.com

This mainly happens due to a mappability phenomenon that gets amplified in the following situations (individually or in combination).

- Your dataset is a good ChIP-seq dataset but has few binding sites and hence few peaks relative to the overall size of the genome (for human/mouse, roughly < 1000 peaks)

- Your dataset is undersequenced (insufficient sequencing depth)

- Your dataset has quite a few mismapped reads due to lower read quality

- Your dataset has poor ChIP efficiency (hence lots of background noise and weak peaks)

- Your dataset has broad regions of enrichment and not strong punctate peaks

A useful way of estimating fragment length (different from how MACS does it) is to compute a strand cross-correlation profile of read start density on the + and - strands, i.e. you compute the number of read starts at each position on the + strand and separately on the - strand for each chromosome. Then simply shift these vectors with respect to each other and compute the correlation for each shift. You can then plot the cross-correlation values on the y-axis against the shift used to compute them on the x-axis. This is the cross-correlation profile for the dataset. Due to the 'shift' phenomenon of reads on the + and - strands around true binding sites, one gets a peak in the cross-correlation profile at the predominant fragment length.
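The procedure described above can be sketched in plain Python for a single chromosome. This is only an illustration under simplifying assumptions: it uses Pearson correlation on per-position read-start counts, handles one chromosome at a time, and the function names are made up. It is not the code from MACS or from any other peak caller.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cross_correlation_profile(plus, minus, max_shift):
    """Return [(shift, correlation), ...] for shifts 0..max_shift.

    plus and minus are equal-length lists of read-start counts per position
    on the + and - strand of one chromosome.
    """
    profile = []
    for shift in range(max_shift + 1):
        # Pair position i on the + strand with position i + shift on the - strand.
        x = plus[:len(plus) - shift]
        y = minus[shift:]
        profile.append((shift, pearson(x, y)))
    return profile
```

Plotting the correlation against the shift gives the profile; on real data one would combine chromosomes and use a much larger max_shift than in a toy example.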

For a really strong ChIP-seq dataset such as, say, CTCF in human cells (great antibody and typically 45-60K peaks), the cross-correlation profile looks like what you see in the attached figure CTCF.pdf. Notice the red vertical line, which marks the dominant peak at the true peak shift. Also notice the little bump (the blue vertical line). This is at read length.

At the other extreme, let's take a control dataset (input DNA). The cross-correlation profile is shown in CONTROL.pdf. Notice how the strongest peak is now the blue line (read length) and there is almost no other significant peak in the profile. The absence of a peak should be expected, since for input DNA, unlike a ChIP-seq dataset, one expects no significant clustering of fragments around specific target sites (except potentially weak biases in open chromatin regions, depending on the protocol used).

The read-length peak occurs due to unique mappability properties of the mapped reads. If a position 'i' on the + strand in the genome is uniquely mappable (i.e. a read starting at 'i' on the + strand maps uniquely), it implies that the position 'i+readlength-1' is also uniquely mappable on the - strand (i.e. a read starting at i+readlength-1 on the - strand maps uniquely to that position). So in the input dataset, or in a random scattering of reads to uniquely mappable locations (in a genome made up of unmappable, multi-mappable and uniquely mappable locations), the odds of finding reads starting on the + and - strands separated by read length are greater than for any other shift. This is why the cross-correlation profile peaks at read length compared to other values of strand shift, and why the cross-correlation at the true fragment length/peak shift is washed away: there is no significant +/- strand read density shift in the input dataset.

Now take a look at what you get for a ChIP-seq dataset that is an in-between case.

POL2B.pdf: this dataset has few peaks (just about 3000 detectable ones in the human genome), this particular antibody is not very efficient (there are other POL2 antibodies that are very effective), and these are broad, scattered peaks (following elongation patterns of POL2). Notice how you now have 2 peaks in the cross-correlation profile: one at the true peak shift (~185-200 bp), marked in red, and the other at read length, marked in blue. For such weaker datasets, the read-length peak starts to dominate. Depending on the data quality characteristics of the dataset, the read-length peak scales relative to the true fragment-length peak.

So, long story short, MACS effectively tends to pick up just the strongest peak in the cross-correlation profile (although it uses a different method of estimating the peak shift), and for datasets that have the properties listed at the top of this email, it basically picks up the read length. For strong datasets, it picks up the true shift. What one needs to do is find the peak in the cross-correlation profile while ignoring any peak at read length (which may be stronger or weaker than the other peaks in the profile). This always gives reliable estimates of fragment length (d/peak shift). We have confirmed this using paired-end sequencing on a variety of different TFs and histone marks with different binding characteristics and ubiquity (where you can actually observe the distribution of fragment lengths for comparison). We have seen this phenomenon in a large number of datasets (ENCODE and modENCODE datasets). We have a paper in press right now that deals with this phenomenon as well as how it can be used as a useful data quality measure. Once it is published I can send a link to those interested.
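The idea of ignoring the read-length peak can be sketched as follows. This is a deliberately crude illustration, not the method used by any particular tool: it masks a fixed window around the read length and takes the highest remaining correlation. The function name and the 10 bp window half-width are assumptions made up for this example.

```python
def estimate_fragment_length(profile, read_length, exclusion=10):
    """Return the shift with the highest correlation outside the read-length window.

    profile: list of (shift, correlation) pairs.
    exclusion: half-width in bp of the window masked around read_length
               (an illustrative assumption, not a recommended value).
    """
    candidates = [
        (shift, corr) for shift, corr in profile
        if abs(shift - read_length) > exclusion
    ]
    if not candidates:
        raise ValueError("no shifts left after masking the read-length window")
    return max(candidates, key=lambda pair: pair[1])[0]
```

On a real profile the shoulders of a strong read-length peak can still outrank a weak fragment-length peak, so a wider window or proper local-maximum detection may be needed.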

If you would like to have some code that computes the fragment length based on the cross-correlation method, shoot me an email. I am hesitant to link it here without Tao's permission, since it uses the code base from another peak caller. You can then use the --shiftsize parameter set to 1/2 the estimated fragment length, together with --nomodel. You will notice significantly better results with a correctly estimated 'd'.
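For example, the arithmetic and the resulting command line could be put together like this. In MACS 1.4 the two options are spelled --nomodel and --shiftsize. The function name, file names and the default genome are placeholders chosen for illustration; the sketch only builds the argument list and does not run MACS.

```python
def macs14_fixed_shift_args(treatment, control, name, fragment_length,
                            file_format="BAM", genome="hs"):
    """Build a macs14 argument list that skips model building.

    The shift size is half the externally estimated fragment length.
    """
    shiftsize = fragment_length // 2  # integer half of the fragment length
    return [
        "macs14",
        "-t", treatment,
        "-c", control,
        "-f", file_format,
        "-g", genome,
        "-n", name,
        "--nomodel",
        "--shiftsize", str(shiftsize),
    ]
```

With an estimated fragment length of 190 bp this yields a shift size of 95; the list can be passed to subprocess.call or joined with spaces to inspect the command.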

I think at some point, it might be useful to have this cross-correlation method incorporated within MACS so as to make the d estimation more robust (which is probably one of the only unstable aspects of an otherwise fantastic peak caller).

Thanks,

Anshul.


CONTROL.pdf

CTCF.pdf

POL2B.pdf

Tao Liu

Jan 29, 2012, 1:16:26 AM

to macs-ann...@googlegroups.com

Hi Anshul,

On Jan 28, 2012, at 7:33 PM, Anshul Kundaje wrote:

> If you would like to have some code that computes the fragment length based on the cross-correlation method, shoot me an email. I am hesitant to link it here without Tao's permission, since it uses the code base from another peak caller. You can then use the --shiftsize parameter set to 1/2 the estimated fragment length, together with --nomodel. You will notice significantly better results with a correctly estimated 'd'.

You don't need my permission :)

> I think at some point, it might be useful to have this cross-correlation method incorporated within MACS so as to make the d estimation more robust (which is probably one of the only unstable aspects of an otherwise fantastic peak caller).

Good suggestion! I will explore it when I get the chance.

Best,
Tao Liu

Research Fellow
Dept of Biostats and Comp Bio, DFCI / HSPH
450 Brookline Ave., Boston, MA 02215

Anshul Kundaje

Jan 29, 2012, 2:59:18 AM

to macs-ann...@googlegroups.com

Just didn't feel right to post about another peak caller in the MACS group. But now that I have your blessing :-)

Here's the package.

http://www.ebi.ac.uk/~anshul/public/softwareRepo/spp_package.tar.gz

It is simply a wrapper around Peter Kharchenko's SPP peak caller http://compbio.med.harvard.edu/Supplements/ChIP-seq/ (although the base package is modified a bit so I would recommend to use the one included in the tar.gz file that I linked to above). There is a README that explains usage. Note that the package linked above works only on pure unix distributions. If you need a package that would work on a mac shoot me an email as some parts need to be modified to install the package successfully.

Thanks,

Anshul.


Yuan Hao

Jan 29, 2012, 6:46:13 AM

to macs-ann...@googlegroups.com

Great stuff! Anshul, if you don't mind, could you send me the package
for use on a Mac (Mac Pro, OS 10.5.8)? Thank you very much in
advance!

Yuan

Quang Trinh

Jan 30, 2012, 3:51:29 PM

to macs-ann...@googlegroups.com

Hello,
I am trying to use macs14 on bam files produced by bwa but get this
error:

>macs14 -t testChip,testInput -c input.bam -t ChIP.bam -f BAM -g ce -n
>ceh-14 -w --call-subpeaks
...
...
INFO @ Mon, 30 Jan 2012 14:45:32: #3 find negative peaks by swapping
treat and control
INFO @ Mon, 30 Jan 2012 14:45:33: #3 Finally, 70 peaks are called!
INFO @ Mon, 30 Jan 2012 14:45:33: #4 Write output xls file...
ceh-14_peaks.xls
INFO @ Mon, 30 Jan 2012 14:45:33: #4 Write peak bed file...
ceh-14_peaks.bed
INFO @ Mon, 30 Jan 2012 14:45:33: #4 Write summits bed file...
ceh-14_summits.bed
INFO @ Mon, 30 Jan 2012 14:45:33: #4 Write output xls file for negative
peaks... ceh-14_negative_peaks.xls
INFO @ Mon, 30 Jan 2012 14:45:33: #5 Done! Check the output files!
INFO @ Mon, 30 Jan 2012 14:45:33: #6 Try to invoke PeakSplitter...
INFO @ Mon, 30 Jan 2012 14:45:33: #6 Please check
ceh-14_peaks.subpeaks.bed file for PeakSplitter output!
error occurred: wig file for choromosome chrI is missing
Make sure that the file name contains the string "chrI" before the
chromosome number

Can someone tell me how to get around this error message?

Thanks,

Q
--
Quang M. Trinh, Ph.D.
Scientist

Ontario Institute for Cancer Research
http://www.oicr.on.ca

Telephone: 1 416 673 8576

