Optimal bamCompare parameters for chicken galGal4 ChIP-Seq data


Yung-Chih Lai

Sep 6, 2015, 12:57:02 PM
to deepTools

Hi,

 

I used bamCompare from deepTools to compare an H3K27ac ChIP-Seq sample with its Input, as shown in the attached fig1. The species is chicken (galGal4), with a genome size of 1.2 Gb. In total, there are 52.3 million and 27.3 million single-end 75-nt reads for IP and Input, respectively. The first track shows the peaks called by MACS, and the second (red) track is a bigWig file converted from the BAM file on the main Galaxy server, without Input correction. The remaining tracks were all produced from the same BAM files by deepTools (i.e. bamCompare) with various parameters: (1) log2, ratio, or difference as the method for comparing ChIP and Input; (2) RPKM normalization or no normalization for the difference method; (3) an average fragment length of 75 or 300. All other parameters were identical: bin size 50 bp, Do not extend paired ends: No, Ignore duplicates: Yes, Minimum mapping quality: 10, and Treat missing data as zero: Yes.

 

I have a few questions:

 

(1) I cannot see any difference between the RPKM and No Scale methods, e.g. diff_RPKM_frag300 (track 3) and diff_NoScale_frag300 (track 4). In theory, they should differ. Do you know what I did wrong?

 

(2) A fragment length of 300 seems better than 75 (which, in my case, means no extension). If a longer fragment length is always better, what value would you suggest as optimal, e.g. 300?

 

(3) Computing the difference seems better than the log2 and ratio methods in my case. Do you know why?

 

(4) In the UCSC Genome Browser, I have to adjust every track one by one if I don't want to show negative values. Is there a more convenient method I can try?

 

Best, and many thanks for your help,

 

Gary

fig1.png

Fidel Ramirez

Sep 16, 2015, 7:23:25 AM
to Yung-Chih Lai, deepTools
Dear Gary,

See my answers below:

 

(1) I cannot see any difference between the RPKM and No Scale methods, e.g. diff_RPKM_frag300 (track 3) and diff_NoScale_frag300 (track 4). In theory, they should differ. Do you know what I did wrong?

In theory, you should see a difference in the scale, but not in the shape of the signal. Is the scale the same in both cases?
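To illustrate the point about scale versus shape, here is a minimal Python sketch (not deepTools' code). It assumes RPKM is applied as one constant factor per track, using the usual definition count × 10^9 / (bin length × total mapped reads), and it omits the ChIP/Input depth scaling that bamCompare performs before taking the difference. The per-bin counts and the function names are hypothetical; the 52.3 million reads and 50 bp bins come from the question.

```python
"""Simplified sketch: RPKM scaling of a bin-wise ChIP - Input difference track.

Assumption: RPKM is one constant factor, count * 1e9 / (bin_length * total_reads).
Not deepTools' implementation; Input depth scaling is deliberately omitted.
"""


def rpkm_factor(total_mapped_reads, bin_length):
    """Reads per kilobase of bin per million mapped reads, as a multiplier."""
    return 1e9 / (bin_length * total_mapped_reads)


def difference_track(chip, control, factor=1.0):
    """Bin-wise ChIP minus Input, optionally multiplied by a scale factor."""
    return [factor * (c - i) for c, i in zip(chip, control)]


if __name__ == "__main__":
    # Hypothetical read counts in five consecutive 50 bp bins.
    chip = [2, 10, 40, 12, 3]
    control = [1, 2, 3, 2, 1]
    factor = rpkm_factor(total_mapped_reads=52_300_000, bin_length=50)

    raw = difference_track(chip, control)
    scaled = difference_track(chip, control, factor)

    # Every bin is multiplied by the same constant, so the profile shape is
    # unchanged and only the y-axis range differs between the two tracks.
    print("factor:", round(factor, 4))
    print("no scaling:", raw)
    print("RPKM:", [round(v, 3) for v in scaled])
```

Because every bin is multiplied by the same constant (about 0.38 with these numbers), the two tracks should look identical apart from their y-axis range; if the ranges are also identical, that would point to the normalization not having been applied.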

 

(2) A fragment length of 300 seems better than 75 (which, in my case, means no extension). If a longer fragment length is always better, what value would you suggest as optimal, e.g. 300?

To me, both tracks look fine. In principle, the fragment length should be the size of the fragments used for library preparation. You can either use the value given by the library preparation lab or use the estimate that MACS computes. The fragment length recommended by Illumina is (as far as I recall) close to 300 bp, and that may well be the length of your fragments.
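For single-end data, the fragment length determines how far each read is extended in its 3' direction before coverage is counted. The sketch below is a simplification, not deepTools' implementation: it assumes 0-based, half-open coordinates and counts a read once in every bin its (possibly extended) interval overlaps. The function names are hypothetical.

```python
def extend_read(start, end, is_reverse, fragment_length):
    """Extend a single-end read to fragment_length in its 3' direction.

    start/end are 0-based, half-open alignment coordinates of the read.
    Returns the (start, end) interval assumed to be covered by the fragment.
    """
    if fragment_length <= end - start:
        return start, end  # no extension: use the read itself
    if is_reverse:
        # Reverse-strand read: its 5' end is at `end`, so extend leftwards.
        return max(0, end - fragment_length), end
    # Forward-strand read: its 5' end is at `start`, so extend rightwards.
    return start, start + fragment_length


def bin_coverage(intervals, chrom_length, bin_size):
    """Count how many intervals overlap each bin of a chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for s, e in intervals:
        first = s // bin_size
        last = min((e - 1) // bin_size, n_bins - 1)  # clip at chromosome end
        for b in range(first, last + 1):
            counts[b] += 1
    return counts


if __name__ == "__main__":
    read = (1000, 1075)  # a hypothetical 75-nt forward-strand read
    for frag in (75, 300):
        interval = extend_read(*read, is_reverse=False, fragment_length=frag)
        bins = bin_coverage([interval], chrom_length=2000, bin_size=50)
        print(frag, interval, "bins covered:", sum(bins))
```

With a fragment length of 75 the read is used as-is and touches two 50 bp bins; extended to 300 it touches six, which spreads each read over the region its fragment most likely covered and is one reason the 300 bp tracks look smoother.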

 

(3) Computing the difference seems better than the log2 and ratio methods in my case. Do you know why?

That's because the log of the ratio compresses large changes, so enrichment looks much less prominent than with the difference. For example, a 16-fold change of signal vs. input will appear very prominently in the difference track, but only as a value of 4 in the log2 track. A 32-fold change will appear almost twice as large as a 16-fold change in the difference track, while in the log2 track its value will be only 5.
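The arithmetic behind this answer can be checked with a few lines of Python. The sketch assumes a hypothetical Input level of 10 in a bin and ignores any depth scaling or pseudocounts that bamCompare may apply; `compare_bin` is an illustrative name, not a deepTools function.

```python
import math


def compare_bin(chip, control):
    """Return (difference, ratio, log2 ratio) for one bin's signal values."""
    ratio = chip / control
    return chip - control, ratio, math.log2(ratio)


if __name__ == "__main__":
    control = 10.0  # hypothetical Input signal in one bin
    for fold in (2, 16, 32):
        diff, ratio, log2r = compare_bin(fold * control, control)
        print(f"{fold:>2}-fold: diff={diff:6.1f}  ratio={ratio:5.1f}  log2={log2r:.1f}")
```

For 16- and 32-fold enrichment the difference is 150 and 310 (roughly twice as large), while log2 only moves from 4 to 5. The difference also grows with the Input level itself, whereas the log2 ratio does not.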

 

(4) In the UCSC Genome Browser, I have to adjust every track one by one if I don't want to show negative values. Is there a more convenient method I can try?

I personally use the IGV browser. It allows you to adjust more than one track at once.

Best,

Fidel

Yung-Chih Lai

Sep 16, 2015, 1:29:20 PM
to Fidel Ramirez, deepTools

Hi Fidel,

 

Many thanks for your explanation. However, I have checked the scales, and they are the same for both the RPKM and NoScale methods (the two purple tracks, and likewise the two green tracks, are identical in the attached fig1). Could you help me check my tracks at the link below?

 

http://genome.ucsc.edu/cgi-bin/hgSession?hgsid=444821363_Nq4CHE9xPi05iI0fya5hjgeCbOzB&hgS_doSessionDetail=deepTools

 

Below is the response from our epigenome center. Would you suggest a fragment length of 300 bp or more, since our chromatin is over-crosslinked? Thank you so much.

 

E-mail 1: Gary to Charles

Hi Charles,

I need information on the fragment size used when you sequenced our ChIP-Seq data. The fragment size refers to the size of the DNA pieces that are ligated onto the flow cell. I need its range and average size to run a ChIP-Seq tool, i.e. deepTools. Many thanks.

Gary

 

E-mail 2: Charles to Gary

Your chromatin is overcrosslinked, so the size range is broad, between about 300 bp and 2 kb. Probably about half the reads are 350 bp or so.

 

E-mail 3: Gary to Charles

Hi Charles,

Many thanks for your information. Could you tell me how you know our chromatin is overcrosslinked?

Gary

 

E-mail 4: Charles to Gary

Because it is always very large. Ideally, you don't want anything bigger than 1 kb.

 

Best,

 

Gary

fig1.png

Fidel Ramirez

Sep 17, 2015, 11:39:21 AM
to Yung-Chih Lai, deepTools
Dear Gary,

I ran a test with my own data and see a clear difference between the non-scaled and the RPKM-scaled files. Can you send me the deepTools version and the commands you used, so that I can track down the problem?

Regarding the fragment length, I suggest you look at the MACS output. Part of the MACS computation is an estimate of the fragment length, which is printed to the screen when it runs. Otherwise, you can use a value of 350, or simply not extend the reads and use the read length for the coverage; this is also fine.
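The sketch below is not MACS's own model; it uses the closely related strand cross-correlation idea to show why a fragment length can be estimated from the reads alone. Forward-strand reads pile up just upstream of a bound site and reverse-strand reads just downstream, so their 5' ends tend to be about one fragment length apart. This minimal version uses an unnormalized score (a plain pair count) instead of the normalized correlation that published tools compute, and runs on simulated reads; the binding-site layout, jitter, and function names are all hypothetical.

```python
import random
from collections import Counter


def estimate_fragment_length(plus_5p, minus_5p, max_shift=500):
    """Unnormalized strand cross-correlation estimate of fragment length.

    plus_5p:  5' (leftmost) positions of forward-strand reads.
    minus_5p: 5' (rightmost) positions of reverse-strand reads.
    The score for shift d is the number of (plus, minus) pairs whose 5' ends
    are exactly d bp apart; the best-scoring shift is returned as a length.
    """
    plus = Counter(plus_5p)
    minus = Counter(minus_5p)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(n * minus[p + d] for p, n in plus.items())
        if score > best_score:
            best_shift, best_score = d, score
    # A fragment [s, s + L) has 5' ends at s and s + L - 1, i.e. L - 1 bp apart.
    return best_shift + 1


def simulate_reads(fragment_length=200, n_sites=100, per_site=60, jitter=20, seed=1):
    """Hypothetical single-end reads: fragments cluster around binding sites,
    and each fragment is sequenced from one randomly chosen end."""
    rng = random.Random(seed)
    plus, minus = [], []
    for i in range(n_sites):
        site = 5_000 * (i + 1)  # sites far apart so clusters do not interact
        for _ in range(per_site):
            start = site - fragment_length // 2 + rng.randint(-jitter, jitter)
            if rng.random() < 0.5:
                plus.append(start)
            else:
                minus.append(start + fragment_length - 1)
    return plus, minus


if __name__ == "__main__":
    plus, minus = simulate_reads(fragment_length=200)
    print("estimated fragment length:", estimate_fragment_length(plus, minus))
```

On real data the cross-correlation profile typically also shows a spurious "phantom" peak at the read length (75 here), so in practice shifts close to the read length would need to be ignored.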

I have never heard of over-crosslinking being reflected in the fragment length. I am asking our sequencing unit, which has lots of experience.

Best,

Fidel


--

Fidel Ramirez

Yung-Chih Lai

Sep 17, 2015, 7:24:47 PM
to Fidel Ramirez, deepTools

Dear Fidel,

 

I really appreciate your help.

 

In fact, I know little about the command line. I run deepTools on your Galaxy server (http://deeptools.ie-freiburg.mpg.de/), and many thanks for adding the chicken galGal4 reference genome for me. The original files I showed you have been removed from deepTools/Galaxy because of limited space. Fig1 shows similar examples for your reference. The scales are the same for both the RPKM and NoScale methods (fig1). The attached figures show the parameters I set on deepTools/Galaxy for RPKM (RPKM_1 & RPKM_2) and NoScale (NoScale_1 & NoScale_2). Did I do anything wrong?

 

Besides the over-crosslinking suggested by our epigenome center, I guess other factors could give our libraries longer fragments: (1) we don't do size selection, and (2) H3K27ac, a histone modification, shows broader enrichment than transcription factors do. Of course, I look forward to any suggestions from your sequencing unit. Thank you so much. deepTools is really a great tool and helps me a lot.


Best,

 

Gary

fig1.png
RPKM_1.png
RPKM_2.png
NoScale_1.png
NoScale_2.png