BEB and NEB Result

sunny kevin

Jul 13, 2022, 4:45:31 AM

to PAML discussion group

Hello everyone,

I am a little confused. Do we have to compute the NEB and BEB probabilities for each orthologous group?

I ran the branch-site model with specific branches marked as foreground (#1).

Output from the codeml run:

out.mlc file:

Detailed output identifying parameters

kappa (ts/tv) = 2.46840


dN/dS (w) for site classes (K=4)

site class 0 1 2a 2b
proportion 0.00000 0.00000 0.23914 0.76086
background w 0.02808 1.00000 0.02808 1.00000
foreground w 0.02808 1.00000 999.00000 999.00000


Naive Empirical Bayes (NEB) analysis (please use the BEB results.)
Positive sites for foreground lineages Prob(w>1):

1 C 1.000**
2 T 1.000**
3 W 1.000**
4 F 1.000**
5 R 1.000**
6 T 1.000**
7 T 1.000**
8 C 1.000**
9 T 1.000**
10 W 1.000**
11 T 1.000**
12 S 1.000**
13 G 1.000**
14 G 1.000**
15 S 1.000**
16 S 1.000**
17 T 1.000**
18 A 1.000**
19 C 1.000**
20 A 1.000**
21 S 1.000**
22 G 1.000**
23 R 1.000**
24 P 1.000**
25 T 1.000**
26 E 1.000**
27 S 1.000**
28 S 1.000**
29 C 1.000**
30 S 1.000**
31 G 1.000**
32 A 1.000**
33 A 1.000**
34 G 1.000**
35 S 1.000**
36 V 1.000**
37 C 1.000**
38 G 1.000**

Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)
Positive sites for foreground lineages Prob(w>1):


The grid (see ternary graph for p0-p1)

w0: 0.050 0.150 0.250 0.350 0.450 0.550 0.650 0.750 0.850 0.950
w2: 1.500 2.500 3.500 4.500 5.500 6.500 7.500 8.500 9.500 10.500


Posterior on the grid

w0: 0.141 0.148 0.141 0.123 0.103 0.085 0.073 0.067 0.063 0.056
w2: 0.090 0.092 0.094 0.096 0.099 0.101 0.103 0.106 0.108 0.111

Posterior for p0-p1 (see the ternary graph)

0.007
0.007 0.009 0.015
0.007 0.010 0.017 0.018 0.021
0.008 0.011 0.019 0.020 0.022 0.021 0.018
0.009 0.013 0.021 0.022 0.022 0.020 0.016 0.015 0.011
0.010 0.015 0.024 0.024 0.020 0.018 0.013 0.012 0.009 0.009 0.006
0.013 0.019 0.025 0.023 0.016 0.015 0.010 0.009 0.006 0.006 0.005 0.005 0.004
0.017 0.024 0.021 0.019 0.010 0.010 0.006 0.006 0.004 0.004 0.003 0.003 0.003 0.003 0.002
0.025 0.027 0.011 0.011 0.005 0.006 0.003 0.004 0.003 0.003 0.002 0.002 0.002 0.002 0.002 0.002 0.002
0.012 0.012 0.003 0.004 0.002 0.003 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.002 0.001 0.001 0.001

sum of density on p0-p1 = 1.000000

Time used: 0:39

In my rst file:

dN/dS (w) for site classes (K=4)

site class 0 1 2a 2b
proportion 0.00000 0.00000 0.23914 0.76086
background w 0.02808 1.00000 0.02808 1.00000
foreground w 0.02808 1.00000 999.00000 999.00000

Naive Empirical Bayes (NEB) probabilities for 4 classes
(amino acids refer to 1st sequence: lumpus)

1 C 0.00000 0.00000 0.48878 0.51122 ( 4)
2 T 0.00000 0.00000 0.40988 0.59012 ( 4)
3 W 0.00000 0.00000 0.52935 0.47065 ( 3)
4 F 0.00000 0.00000 0.01894 0.98106 ( 4)
5 R 0.00000 0.00000 0.51067 0.48933 ( 3)
6 T 0.00000 0.00000 0.40988 0.59012 ( 4)
7 T 0.00000 0.00000 0.42107 0.57893 ( 4)
8 C 0.00000 0.00000 0.33620 0.66380 ( 4)
9 T 0.00000 0.00000 0.40988 0.59012 ( 4)
10 W 0.00000 0.00000 0.02790 0.97210 ( 4)
11 T 0.00000 0.00000 0.28071 0.71929 ( 4)
12 S 0.00000 0.00000 0.44899 0.55101 ( 4)
13 G 0.00000 0.00000 0.02666 0.97334 ( 4)
14 G 0.00000 0.00000 0.01144 0.98856 ( 4)
15 S 0.00000 0.00000 0.01397 0.98603 ( 4)
16 S 0.00000 0.00000 0.40878 0.59122 ( 4)
17 T 0.00000 0.00000 0.28071 0.71929 ( 4)
18 A 0.00000 0.00000 0.47605 0.52395 ( 4)
19 C 0.00000 0.00000 0.33620 0.66380 ( 4)
20 A 0.00000 0.00000 0.42819 0.57181 ( 4)
21 S 0.00000 0.00000 0.00069 0.99931 ( 4)
22 G 0.00000 0.00000 0.02184 0.97816 ( 4)
23 R 0.00000 0.00000 0.02522 0.97478 ( 4)
24 P 0.00000 0.00000 0.02310 0.97690 ( 4)
25 T 0.00000 0.00000 0.01847 0.98153 ( 4)
26 E 0.00000 0.00000 0.02106 0.97894 ( 4)
27 S 0.00000 0.00000 0.02409 0.97591 ( 4)
28 S 0.00000 0.00000 0.02613 0.97387 ( 4)
29 C 0.00000 0.00000 0.48878 0.51122 ( 4)
30 S 0.00000 0.00000 0.52182 0.47818 ( 3)
31 G 0.00000 0.00000 0.00002 0.99998 ( 4)
32 A 0.00000 0.00000 0.29580 0.70420 ( 4)
33 A 0.00000 0.00000 0.29580 0.70420 ( 4)
34 G 0.00000 0.00000 0.43515 0.56485 ( 4)
35 S 0.00000 0.00000 0.40878 0.59122 ( 4)
36 V 0.00000 0.00000 0.01685 0.98315 ( 4)
37 C 0.00000 0.00000 0.48878 0.51122 ( 4)
38 G 0.00000 0.00000 0.45295 0.54705 ( 4)

lnL = -558.649776

Bayes Empirical Bayes (BEB) probabilities for 4 classes (class)
(amino acids refer to 1st sequence: lumpus)

1 C 0.30360 0.33920 0.17070 0.18651 ( 2)
2 T 0.27729 0.36551 0.15595 0.20125 ( 2)
3 W 0.31656 0.32625 0.17795 0.17924 ( 2)
4 F 0.15498 0.48780 0.08747 0.26975 ( 2)
5 R 0.30943 0.33337 0.17396 0.18324 ( 2)
6 T 0.27729 0.36551 0.15595 0.20125 ( 2)
7 T 0.28093 0.36185 0.15800 0.19921 ( 2)
8 C 0.25257 0.39019 0.14212 0.21511 ( 2)
9 T 0.27729 0.36551 0.15595 0.20125 ( 2)
10 W 0.17334 0.46946 0.09785 0.25934 ( 2)
11 T 0.23233 0.41043 0.13078 0.22647 ( 2)
12 S 0.29016 0.35263 0.16317 0.19404 ( 2)
13 G 0.17053 0.47227 0.09626 0.26094 ( 2)
14 G 0.13459 0.50816 0.07590 0.28134 ( 2)
15 S 0.14239 0.50038 0.08034 0.27689 ( 2)
16 S 0.27702 0.36576 0.15581 0.20141 ( 2)
17 T 0.23233 0.41043 0.13078 0.22647 ( 2)
18 A 0.29902 0.34377 0.16813 0.18907 ( 2)
19 C 0.25257 0.39019 0.14212 0.21511 ( 2)
20 A 0.28347 0.35931 0.15942 0.19779 ( 2)
21 S 0.10928 0.53350 0.06149 0.29573 ( 2)
22 G 0.16132 0.48147 0.09106 0.26615 ( 2)
23 R 0.16874 0.47405 0.09526 0.26195 ( 2)
24 P 0.16388 0.47891 0.09251 0.26470 ( 2)
25 T 0.15391 0.48886 0.08687 0.27036 ( 2)
26 E 0.16081 0.48196 0.09078 0.26644 ( 2)
27 S 0.16643 0.47637 0.09395 0.26325 ( 2)
28 S 0.16952 0.47326 0.09570 0.26151 ( 2)
29 C 0.30360 0.33920 0.17070 0.18651 ( 2)
30 S 0.31421 0.32859 0.17664 0.18056 ( 2)
31 G 0.07964 0.56316 0.04444 0.31276 ( 2)
32 A 0.23790 0.40486 0.13390 0.22334 ( 2)
33 A 0.23790 0.40486 0.13390 0.22334 ( 2)
34 G 0.28547 0.35733 0.16053 0.19667 ( 2)
35 S 0.27702 0.36576 0.15581 0.20141 ( 2)
36 V 0.15059 0.49219 0.08499 0.27223 ( 2)
37 C 0.30360 0.33920 0.17070 0.18651 ( 2)
38 G 0.29129 0.35150 0.16380 0.19341 ( 2)

Is this the NEB and BEB output?

Do I have to do any further analysis?

How do I interpret the output, taking both NEB and BEB into consideration?

From outfile.mlc:

1 C 1.000**
2 T 1.000**
3 W 1.000**
4 F 1.000**
5 R 1.000**
6 T 1.000**
7 T 1.000**
8 C 1.000**

These are reported as positively selected sites (NEB probability > 99%).

Alternative model (omega estimated): lnL = -558.649776, np = 22

Null model (omega fixed): lnL = -559.449764, np = 21

LRT = 2 × (−558.649776 − (−559.449764)) = 1.599976 ≈ 1.60

Is the LRT calculation correct?

df = np1 − np0 = 22 − 21 = 1

chi2 1 1.59

df = 1 prob = 0.207326134 = 2.073e-01
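
The likelihood-ratio arithmetic above can be checked with a short script. This is a minimal sketch, assuming the 22-parameter fit is the alternative model and the 21-parameter fixed-omega fit is the null nested within it, so the statistic is 2 × (lnL_alt − lnL_null) and should not be negative. It also assumes a chi-square distribution with 1 degree of freedom, as in the chi2 call above. The function names are illustrative and not part of PAML.

```python
import math


def lrt_statistic(lnl_alt, lnl_null):
    """Likelihood-ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square with 1 df.

    For 1 df this equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Values quoted in the post (assumption: np = 22 is the alternative model,
# np = 21 with fixed omega is the null model).
lnl_np22 = -558.649776
lnl_np21 = -559.449764

stat = lrt_statistic(lnl_np22, lnl_np21)
p_value = chi2_sf_df1(stat)

print(f"2*delta(lnL) = {stat:.6f}, df = 1, p = {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

With these numbers the statistic is about 1.60 and the p-value is about 0.21, so the test gives no significant evidence of positive selection on the foreground branch.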

If the p-value is significant, I can report that the gene is under positive selection.

What are the other steps?

Suggestions appreciated.

Thanks

Kevin

Ziheng

Jul 22, 2022, 2:04:42 PM

to PAML discussion group

Our recommendation is to ignore the NEB results and focus on the BEB results.

In the output, there is one block with the heading "NEB". After that there is another block with the heading "BEB".

There are examples in the examples/ folder, which you can run first to get familiar with the interpretation of the output.
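
To make the BEB block easier to read, the per-site probability of positive selection on the foreground lineage can be taken as the sum of the posterior probabilities for site classes 2a and 2b, the two classes with foreground w > 1. The following is a minimal sketch, assuming site lines of the form "site, amino acid, four class probabilities, (most probable class)" with any editor line numbers already removed. The regular expression, the function name, and the 0.95 cutoff are illustrative choices, not part of PAML.

```python
import re

# One BEB site line: site index, amino acid, four class probabilities
# (classes 0, 1, 2a, 2b), then the most probable class in parentheses.
SITE_LINE = re.compile(
    r"^\s*(\d+)\s+(\S)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+\(\s*(\d+)\)"
)


def beb_prob_positive(block):
    """Return {site: (amino_acid, P(class 2a) + P(class 2b))} for a BEB block."""
    result = {}
    for line in block.splitlines():
        match = SITE_LINE.match(line)
        if match is None:
            continue  # skip headings and blank lines
        site = int(match.group(1))
        p2a = float(match.group(5))
        p2b = float(match.group(6))
        result[site] = (match.group(2), p2a + p2b)
    return result


# Three site lines copied from the BEB block of the rst file in this thread.
sample = """\
Bayes Empirical Bayes (BEB) probabilities for 4 classes (class)
1 C 0.30360 0.33920 0.17070 0.18651 ( 2)
14 G 0.13459 0.50816 0.07590 0.28134 ( 2)
31 G 0.07964 0.56316 0.04444 0.31276 ( 2)
"""

probs = beb_prob_positive(sample)
candidates = [site for site, (_, p) in probs.items() if p > 0.95]

for site, (aa, p) in sorted(probs.items()):
    print(f"site {site} {aa}: Prob(w>1) = {p:.5f}")
print("sites above 0.95:", candidates)
```

For the sites shown, the summed probability is about 0.36, well below the cutoff, which is consistent with the BEB section of the mlc output listing no positive sites.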