Domain for CenH3 peak

Michael Seidl

unread,

Dec 31, 2015, 11:09:11 AM12/31/15

to RSEG Users

Hi,

I have a question that keeps me puzzled for a while. I have two sets of CenH3 (centromeric H3) pulldowns (a WT and KO strain and two biological replicates). I want to determine the CenH3 domain for each of my four samples. When I map the reads and visualize them, I get a single clear peak per chromosome. Both replicates for each of the sample are very similar. When I run RSEG on default parameters, for one of the samples (for both replicates independently) I retrieve a single domain per chromosome that perfectly overlaps my peak. However, when I run the second sample (same settings), hundreds of domains are called ( basically the entire chromosome), except the peak. It virtually looks like the mirror image of what I expect and what RSEG reported for the other sample. The samples are identical, however the second sample has approx 3x as many reads. Do you maybe have an idea what might be going on?

Kind regards

Michael

Song, Qiang

unread,

Jan 2, 2016, 2:27:32 AM1/2/16

to RSEG Users

Hi Michael,

Can you be more specific how you analyzed your data with RSEG?

Did you run rseg with the WT replicate one and replicate two respectively?

Or did you run rseg-diff mode 2 for WT with the KO as control?

Thanks,

Qiang

--
You received this message because you are subscribed to the Google Groups "RSEG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rseg-support...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Seidl

unread,

Jan 2, 2016, 6:00:10 AM1/2/16

to RSEG Users

Dear Qiang

sorry for my lack in specificity. I ran each sample and replicate independently. So, for example,

sort -k1,1 -k2,2n -k3,3n -k6,6 -o A.sort.bed A.bed

~/data/software/rseg-rseg-0.4.9/bin/rseg -c Genome.size -o A.domain.bed -i 20 -v A.sort.bed

When I compare the output of the domains for the samples where the results match my expectation, I basically get more or less a single domain per chromosome located at the same position for each of the two replicates. For the other sample, I get again a very similar result per replicate, however no domain at the expected peak. Maybe the attached figure clarifies my problem. The upper four lanes show the mapped reads and the domain called by RSEG for sample A (2 replaces). The lower for lane show the results for sample B (2 replicates). As you can see, the domains are called along the entire chromosome, with the exception of the peak...

Thanks

Michael

Song, Qiang

unread,

Jan 3, 2016, 2:16:55 AM1/3/16

to RSEG Users

Michael,

The bottom two samples are the two KO replicates, right?

Compared with the top two sample, there indeed seems some peaks out of the most prominent center peak, which may be called as enriched domains by RSEG.

I am not exactly sure quite why the center peak is missed.

What is running log that RSEG printed for the bottom two samples?

Song Qiang

--

Michael Seidl

unread,

Jan 3, 2016, 4:04:35 AM1/3/16

to RSEG Users

Dear Qiang,

indeed, the bottom two are the KO samples. I do agree with you that it has a couple of additional domains, but surely not the entire chromosomes (as can be seen by the grey blocks that are taken 1 to 1 from the coordinates from the domain-bed file).

This is the log of one of the replicates in the lower panel

[PROCESSING] B.sort.bed

[LOADING_DATA] chromosomes

[LOADING_DATA] reads

[LOADING_DATA] deadzones

[SELECTING BIN SIZE] bin size = 250

[ESTIMATIN PARAMETERS]

DELTA (PARAMS,MIX)

1 nbd 67.68 0.1419 0.414 nbd 39.9 0.2243 0.586

0.02075 nbd 64.27 0.1658 0.4139 nbd 42.3 0.2559 0.5861

0.004018 nbd 61.74 0.1728 0.4185 nbd 43.95 0.2804 0.5815

ITR F size B size F PARAMS B PARAMS DELTA

1 131.4 89.91 nbd 61.7 0.1311 nbd 36.35 0.3216 1

2 159.6 66.23 nbd 59.32 0.127 nbd 32.35 0.4274 0.01038

3 229.5 56.16 nbd 55.41 0.09777 nbd 35.06 1.003 0.008281

4 285.2 38.05 nbd 51.89 0.08352 nbd 47.73 2.082 0.02795

5 226.3 26.05 nbd 50.56 0.0798 nbd 58.59 2.413 0.01462

6 190.4 22.31 nbd 50.23 0.07783 nbd 61.27 2.369 0.001559

7 173.1 20.88 nbd 50.11 0.07649 nbd 61.99 2.306 0.0001609

8 164 20.18 nbd 50.05 0.07564 nbd 62.29 2.263 4.8e-05

9 158.7 19.79 nbd 50.01 0.0751 nbd 62.45 2.236 1.81e-05

10 155.5 19.56 nbd 49.99 0.07476 nbd 62.54 2.218 7.114e-06

11 153.6 19.42 nbd 49.97 0.07455 nbd 62.6 2.207 2.847e-06

12 152.3 19.33 nbd 49.96 0.07441 nbd 62.64 2.2 1.154e-06

13 151.6 19.27 nbd 49.96 0.07433 nbd 62.67 2.196 4.716e-07

14 151.1 19.23 nbd 49.95 0.07427 nbd 62.68 2.193 1.94e-07

15 150.7 19.21 nbd 49.95 0.07423 nbd 62.69 2.191 8.01e-08

16 150.5 19.2 nbd 49.95 0.07421 nbd 62.7 2.19 3.316e-08

17 150.4 19.19 nbd 49.95 0.07419 nbd 62.7 2.189 1.376e-08

18 150.3 19.18 nbd 49.95 0.07419 nbd 62.71 2.189 5.712e-09

19 150.3 19.18 nbd 49.94 0.07418 nbd 62.71 2.188 2.374e-09

20 150.2 19.17 nbd 49.94 0.07417 nbd 62.71 2.188 9.871e-10

FINAL ESTIMATES

---------------

Emission distributions

State 0: nbd 49.94 0.07417

State 1: nbd 62.71 2.188

Expected sizes

State 0: 150.2

State 1: 19.17

Transition probabilities

0.9933 0.006657

0.05215 0.9478

Domains file: B.domain.bed

This is one of the samples where the peak is as expected...

[PROCESSING] A.sort.bed

[LOADING_DATA] chromosomes

[LOADING_DATA] reads

[LOADING_DATA] deadzones

[SELECTING BIN SIZE] bin size = 400

[ESTIMATIN PARAMETERS]

DELTA (PARAMS,MIX)

1 nbd 105.7 0.0908 0.003584 nbd 0.219 4.284 0.9964

0.5477 nbd 90.34 0.2679 0.004492 nbd 0.1922 2.583 0.9955

0.01794 nbd 78.89 0.4966 0.005275 nbd 0.1819 1.836 0.9947

ITR F size B size F PARAMS B PARAMS DELTA

1 21.58 3471 nbd 68.22 0.8319 nbd 0.1768 1.506 1

2 21.12 3045 nbd 61.5 1.11 nbd 0.1746 1.39 0.003198

3 22.09 3017 nbd 58.41 1.259 nbd 0.1738 1.358 0.000771

4 22.34 2997 nbd 57.43 1.31 nbd 0.1735 1.351 0.0001203

5 22.39 2988 nbd 57.13 1.326 nbd 0.1735 1.349 1.16e-05

6 22.4 2984 nbd 57.04 1.33 nbd 0.1734 1.348 1.021e-06

7 22.4 2983 nbd 57.01 1.332 nbd 0.1734 1.348 8.596e-08

8 22.4 2983 nbd 57.01 1.332 nbd 0.1734 1.348 7.146e-09

9 22.4 2983 nbd 57.01 1.332 nbd 0.1734 1.348 5.921e-10

10 22.4 2983 nbd 57 1.332 nbd 0.1734 1.348 4.883e-11

11 22.4 2983 nbd 57 1.332 nbd 0.1734 1.348 3.944e-12

12 22.4 2983 nbd 57 1.332 nbd 0.1734 1.348 3.955e-13

CONVERGED

FINAL ESTIMATES

---------------

Emission distributions

State 0: nbd 57 1.332

State 1: nbd 0.1734 1.348

Expected sizes

State 0: 22.4

State 1: 2983

Transition probabilities

0.9554 0.04464

0.0003352 0.9997

Domains file: A.domain.bed

Thanks again!

Michael

Song, Qiang

unread,

Jan 4, 2016, 1:12:52 AM1/4/16

to RSEG Users

Michael,

Thanks for providing the log file.

With sample B, the enriched regions are indeed inverted.

In the final estimates, the state 0 indicates the foreground regions (i.e., enriched regions), however with sample B,

its mean coverage (49.94) for the foreground state is lower than that (62.71) for the background.

The RSEG is initialized with higher coverage in the foreground, however during the training it diverge from the initialization, possibly due to the inaccurate initialization of the foreground domain size (foreground is initialized to 131.4 bins versus background 89.91 bins).

We may consider to force the foreground to have higher coverage in future updates to RSEG.

For now, I think you can just invert the domains from sample B to get truly enriched regions.

---------------

Emission distributions

State 0: nbd 49.94 0.07417

State 1: nbd 62.71 2.188

Expected sizes

State 0: 150.2

State 1: 19.17

This is an interesting behavior of RSEG we have not encountered. Thanks for bring it up.

Thanks,

Song Qiang

Michael Seidl

unread,

Jan 4, 2016, 2:57:30 AM1/4/16

to RSEG Users

Hi,

thanks a lot for your reply. Good to know that there is an 'easy' fix...Strange indeed :)