Domain for CenH3 peak

28 views
Skip to first unread message

Michael Seidl

unread,
Dec 31, 2015, 11:09:11 AM12/31/15
to RSEG Users
Hi,

I have a question that keeps me puzzled for a while. I have two sets of CenH3 (centromeric H3) pulldowns (a WT and KO strain and two biological replicates). I want to determine the CenH3 domain for each of my four samples. When I map the reads and visualize them, I get a single clear peak per chromosome. Both replicates for each of the sample are very similar. When I run RSEG on default parameters, for one of the samples (for both replicates independently) I retrieve a single domain per chromosome that perfectly overlaps my peak. However, when I run the second sample (same settings), hundreds of domains are called ( basically the entire chromosome), except the peak. It virtually looks like the mirror image of what I expect and what RSEG reported for the other sample. The samples are identical, however the second sample has approx 3x as many reads. Do you maybe have an idea what might be going on?

Kind regards
Michael

Song, Qiang

unread,
Jan 2, 2016, 2:27:32 AM1/2/16
to RSEG Users
Hi Michael,

Can you be more specific how you analyzed your data with RSEG?
Did you run rseg with the WT replicate one and replicate two respectively?
Or did you run rseg-diff mode 2 for WT with the KO as control?

Thanks,
Qiang


--
You received this message because you are subscribed to the Google Groups "RSEG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rseg-support...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Seidl

unread,
Jan 2, 2016, 6:00:10 AM1/2/16
to RSEG Users
Dear Qiang

sorry for my lack in specificity. I ran each sample and replicate independently. So, for example, 

sort -k1,1 -k2,2n -k3,3n -k6,6 -o A.sort.bed A.bed

~/data/software/rseg-rseg-0.4.9/bin/rseg -c Genome.size -o A.domain.bed -i 20 -v A.sort.bed


When I compare the output of the domains for the samples where the results match my expectation, I basically get more or less a single domain per chromosome located at the same position for each of the two replicates. For the other sample, I get again a very similar result per replicate, however no domain at the expected peak. Maybe the attached figure clarifies my problem. The upper four lanes show the mapped reads and the domain called by RSEG for sample A (2 replaces). The lower for lane show the results for sample B (2 replicates). As you can see, the domains are called along the entire chromosome, with the exception of the peak...

Thanks
Michael

 

Song, Qiang

unread,
Jan 3, 2016, 2:16:55 AM1/3/16
to RSEG Users
Michael,


The bottom two samples are the two KO replicates, right?
Compared with the top two sample, there indeed seems some peaks out of the most prominent center peak, which may be called as enriched domains by RSEG.
I am not exactly sure quite why the center peak is missed.
What is running log that RSEG printed for the bottom two samples?

Song Qiang




--

Michael Seidl

unread,
Jan 3, 2016, 4:04:35 AM1/3/16
to RSEG Users
Dear Qiang,

indeed, the bottom two are the KO samples. I do agree with you that it has a couple of additional domains, but surely not the entire chromosomes (as can be seen by the grey blocks that are taken 1 to 1 from the coordinates from the domain-bed file). 

This is the log of one of the replicates in the lower panel

[PROCESSING] B.sort.bed

[LOADING_DATA] chromosomes

[LOADING_DATA] reads

[LOADING_DATA] deadzones

[SELECTING BIN SIZE] bin size =  250

[ESTIMATIN PARAMETERS]


     DELTA  (PARAMS,MIX)

         1 nbd 67.68 0.1419      0.414 nbd 39.9 0.2243      0.586 

   0.02075 nbd 64.27 0.1658     0.4139 nbd 42.3 0.2559     0.5861 

  0.004018 nbd 61.74 0.1728     0.4185 nbd 43.95 0.2804     0.5815 

  ITR    F size    B size          F PARAMS          B PARAMS         DELTA

    1     131.4     89.91   nbd 61.7 0.1311  nbd 36.35 0.3216             1

    2     159.6     66.23   nbd 59.32 0.127  nbd 32.35 0.4274       0.01038

    3     229.5     56.16 nbd 55.41 0.09777   nbd 35.06 1.003      0.008281

    4     285.2     38.05 nbd 51.89 0.08352   nbd 47.73 2.082       0.02795

    5     226.3     26.05  nbd 50.56 0.0798   nbd 58.59 2.413       0.01462

    6     190.4     22.31 nbd 50.23 0.07783   nbd 61.27 2.369      0.001559

    7     173.1     20.88 nbd 50.11 0.07649   nbd 61.99 2.306     0.0001609

    8       164     20.18 nbd 50.05 0.07564   nbd 62.29 2.263       4.8e-05

    9     158.7     19.79  nbd 50.01 0.0751   nbd 62.45 2.236      1.81e-05

   10     155.5     19.56 nbd 49.99 0.07476   nbd 62.54 2.218     7.114e-06

   11     153.6     19.42 nbd 49.97 0.07455    nbd 62.6 2.207     2.847e-06

   12     152.3     19.33 nbd 49.96 0.07441     nbd 62.64 2.2     1.154e-06

   13     151.6     19.27 nbd 49.96 0.07433   nbd 62.67 2.196     4.716e-07

   14     151.1     19.23 nbd 49.95 0.07427   nbd 62.68 2.193      1.94e-07

   15     150.7     19.21 nbd 49.95 0.07423   nbd 62.69 2.191      8.01e-08

   16     150.5      19.2 nbd 49.95 0.07421     nbd 62.7 2.19     3.316e-08

   17     150.4     19.19 nbd 49.95 0.07419    nbd 62.7 2.189     1.376e-08

   18     150.3     19.18 nbd 49.95 0.07419   nbd 62.71 2.189     5.712e-09

   19     150.3     19.18 nbd 49.94 0.07418   nbd 62.71 2.188     2.374e-09

   20     150.2     19.17 nbd 49.94 0.07417   nbd 62.71 2.188     9.871e-10

FINAL ESTIMATES

---------------

Emission distributions

State 0: nbd 49.94 0.07417

State 1: nbd 62.71 2.188

Expected sizes

State 0: 150.2

State 1: 19.17

Transition probabilities

0.9933 0.006657

0.05215 0.9478

Domains file: B.domain.bed


This is one of the samples where the peak is as expected...

[PROCESSING] A.sort.bed

[LOADING_DATA] chromosomes

[LOADING_DATA] reads

[LOADING_DATA] deadzones

[SELECTING BIN SIZE] bin size =  400

[ESTIMATIN PARAMETERS]


     DELTA  (PARAMS,MIX)

         1 nbd 105.7 0.0908   0.003584 nbd 0.219 4.284     0.9964 

    0.5477 nbd 90.34 0.2679   0.004492 nbd 0.1922 2.583     0.9955 

   0.01794 nbd 78.89 0.4966   0.005275 nbd 0.1819 1.836     0.9947 

  ITR    F size    B size          F PARAMS          B PARAMS         DELTA

    1     21.58      3471  nbd 68.22 0.8319  nbd 0.1768 1.506             1

    2     21.12      3045     nbd 61.5 1.11   nbd 0.1746 1.39      0.003198

    3     22.09      3017   nbd 58.41 1.259  nbd 0.1738 1.358      0.000771

    4     22.34      2997    nbd 57.43 1.31  nbd 0.1735 1.351     0.0001203

    5     22.39      2988   nbd 57.13 1.326  nbd 0.1735 1.349      1.16e-05

    6      22.4      2984    nbd 57.04 1.33  nbd 0.1734 1.348     1.021e-06

    7      22.4      2983   nbd 57.01 1.332  nbd 0.1734 1.348     8.596e-08

    8      22.4      2983   nbd 57.01 1.332  nbd 0.1734 1.348     7.146e-09

    9      22.4      2983   nbd 57.01 1.332  nbd 0.1734 1.348     5.921e-10

   10      22.4      2983      nbd 57 1.332  nbd 0.1734 1.348     4.883e-11

   11      22.4      2983      nbd 57 1.332  nbd 0.1734 1.348     3.944e-12

   12      22.4      2983      nbd 57 1.332  nbd 0.1734 1.348     3.955e-13

CONVERGED


FINAL ESTIMATES

---------------

Emission distributions

State 0: nbd 57 1.332

State 1: nbd 0.1734 1.348

Expected sizes

State 0: 22.4

State 1: 2983

Transition probabilities

0.9554 0.04464

0.0003352 0.9997

Domains file: A.domain.bed


Thanks again!
Michael

Song, Qiang

unread,
Jan 4, 2016, 1:12:52 AM1/4/16
to RSEG Users
Michael,

Thanks for providing the log file.

With sample B, the enriched regions are indeed inverted.

In the final estimates,  the state 0 indicates the foreground regions (i.e., enriched regions), however with sample B,
its mean coverage (49.94) for the foreground state is lower than that (62.71) for the background.
The RSEG is initialized with higher coverage in the foreground, however during the training it diverge from the initialization, possibly due to the inaccurate initialization of the foreground domain size (foreground is initialized to 131.4 bins versus background  89.91 bins).
We may consider to force the foreground to have higher coverage in future updates to RSEG.
For now, I think you can just invert the domains from sample B to get truly enriched regions.

---------------

Emission distributions

State 0: nbd 49.94 0.07417

State 1: nbd 62.71 2.188

Expected sizes

State 0: 150.2

State 1: 19.17


This is an interesting behavior of RSEG we have not encountered. Thanks for bring it up.

Thanks,
Song Qiang

Michael Seidl

unread,
Jan 4, 2016, 2:57:30 AM1/4/16
to RSEG Users
Hi,

thanks a lot for your reply. Good to know that there is an 'easy' fix...Strange indeed :)

Kind regards
Michael
Reply all
Reply to author
Forward
0 new messages