Thanks for the reply. I'm glad you confirmed what I thought. Some further
comments...
If (S/s)(d/v) is small, nodes are more likely to be recent than under the
Yule model, and the model becomes more like the coalescent. This would be the
situation if you sampled all or nearly all extant species in some clade (so
S/s is close to 1) and d/v were small because there have been nearly as many
extinctions as speciations since the LCA. It might be appropriate if you
sampled all 20-odd crocodiles, for example.
If (S/s)(d/v) is big, the opposite happens and recent nodes are less likely.
It might be appropriate if you sampled 20 randomly chosen birds (so S/s is
very large), for example.
So, roughly speaking, the models line up like this, in order of increasing
(S/s)(d/v):
....coalescent....crocs....Yule....birds....
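A rough Python sketch of the ordering above. It assumes (my reading of Yang &
Rannala 1997, not re-checked against the paper) that the density of an
internal node time t in (0, t_1), up to a constant, is
E / (1 + (k - 1)*E)^2 with E = exp(-d*t), k = (S/s)(d/v) and d = v - u, so
that k = 1 (Yule, everything sampled) gives exp(-d*t). The function names are
hypothetical, and t_1 is fixed at 1.0 as Y & R do.

```python
import math


def node_time_shape(t, k, d=1.0):
    """Unnormalised density of an internal node time t in (0, t_1).

    ASSUMPTION: shape E / (1 + (k - 1) * E)**2 with E = exp(-d * t) and
    k = (S/s)(d/v), k > 0.  k = 1 recovers the Yule case exp(-d * t).
    """
    E = math.exp(-d * t)
    return E / (1.0 + (k - 1.0) * E) ** 2


def mean_node_time(k, d=1.0, t1=1.0, steps=20_000):
    """Mean internal node time on (0, t1), by the midpoint rule."""
    h = t1 / steps
    total_weight = 0.0
    total_moment = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = node_time_shape(t, k, d)
        total_weight += w
        total_moment += t * w
    return total_moment / total_weight


# Small k (croc-like), k = 1 (Yule), large k (bird-like).
for k in (0.1, 1.0, 10.0):
    print(f"k = {k:5.1f}: mean node time = {mean_node_time(k):.3f}")
```

Smaller k should pull the mean node time towards 0 (recent nodes), and larger
k should push it towards t_1.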
On your 'wrinkle', a prior for the first node time (Yang & Rannala's t_1): I
agree that they have no prior for t_1 and that this is a problem. Yang &
Rannala take t_1 to be 1.0 and only make inferences about u and v relative to
t_1. The same seems to be true of Nee (2001), which is referenced in the BEAST
code.
Nee's equation (3) gives the joint probability of t_2,...,t_s' and s, given
t_1 and v (where s' = s-1), i.e.
Pr(t_2,...,t_s', s | t_1, v)
That does not seem to me to be the correct quantity for Bayesian
phylogenetic analysis. I think one should use
Pr(t_1,...,t_s' | s, v)
if you regard s as chosen by the researcher, or possibly
Pr(t_1,...,t_s', s | v)
if you regard s as a random variable produced by an evolutionary process.
Either way, t_1 is surely a random variable in almost all phylogenetic
analyses. Since
Pr(t_1,...,t_s' | s, v) = Pr(t_1 | s, v) * Pr(t_2,...,t_s' | t_1, s, v),
what is missing from Y & R and Nee is an expression for Pr(t_1 | s, v).
In the case u = 0 with no subsampling, i.e. the Yule model, you can calculate
a distribution for t_1. Assume the S = s species are sampled just before the
next speciation, i.e. just before there are s+1 species. Then
t_1 = X_2 + ... + X_s,
where X_n, the time during which there are n lineages, is exponential with
rate nv, and t_1 has density proportional to
f(t) = exp(-2vt) * (1-exp(-vt))^(s-2).
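A quick numerical check of f in Python (a sketch; the function names are
hypothetical). The normalising constant s(s-1)v is my addition: it follows
because X_2 + ... + X_s has the same distribution as the second-largest of s
independent Exponential(v) variables. The check compares the mean of the
normalised f with the exact mean 1/(2v) + ... + 1/(sv), and with a direct
simulation of the sum of waiting times.

```python
import math
import random


def yule_root_density(t, s, v):
    """Normalised density of t_1 = X_2 + ... + X_s, X_n ~ Exponential(rate n*v)."""
    e = math.exp(-v * t)
    return s * (s - 1) * v * e ** 2 * (1.0 - e) ** (s - 2)


def density_moment(s, v, power, steps=100_000):
    """Integral of t**power * f(t) over (0, 40/v), by the midpoint rule."""
    upper = 40.0 / v  # the tail beyond this is of order exp(-80), negligible
    h = upper / steps
    return h * sum(
        ((i + 0.5) * h) ** power * yule_root_density((i + 0.5) * h, s, v)
        for i in range(steps)
    )


def exact_mean(s, v):
    """E[t_1] = sum over n = 2..s of 1/(n*v)."""
    return sum(1.0 / (n * v) for n in range(2, s + 1))


def simulate_root_time(s, v, rng):
    """One draw of t_1 by summing the waiting times X_2, ..., X_s."""
    return sum(rng.expovariate(n * v) for n in range(2, s + 1))


s, v = 20, 1.0
rng = random.Random(1)
sim_mean = sum(simulate_root_time(s, v, rng) for _ in range(20_000)) / 20_000
print(density_moment(s, v, 0))  # total probability, should be close to 1
print(density_moment(s, v, 1), exact_mean(s, v), sim_mean)
```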
Multiplying f(t_1) by Nee's equation (5) then gives
Pr(t_1,...,t_s' | s, v) proportional to exp(-v(2*t_1 + t_2 + ... + t_s')).
This is the same formula as Nee's equation (3), but for a different
quantity. So I think you have the right formula in BEAST, but maybe the
comment needs work ;-).
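The cancellation can be checked numerically. This Python sketch assumes (my
reading of Nee's equation (5), not re-checked against the paper) that, given
t_1, the s-2 remaining node times t_2,...,t_s' are independent on (0, t_1)
with density v*exp(-v*t) / (1 - exp(-v*t_1)). Under that assumption, log f(t_1)
plus the log of that product should differ from -v(2*t_1 + t_2 + ... + t_s')
by a constant that does not depend on the node times. Function names are
hypothetical.

```python
import math
import random


def log_f(t1, s, v):
    """log of f(t_1) = exp(-2 v t_1) * (1 - exp(-v t_1))**(s - 2)."""
    return -2.0 * v * t1 + (s - 2) * math.log1p(-math.exp(-v * t1))


def log_conditional(ts, t1, v):
    """ASSUMED log density of t_2..t_s' given t_1: iid on (0, t_1),
    each with density v exp(-v t) / (1 - exp(-v t_1))."""
    return sum(
        math.log(v) - v * t - math.log1p(-math.exp(-v * t1)) for t in ts
    )


def log_target(t1, ts, v):
    """log of exp(-v (2 t_1 + t_2 + ... + t_s'))."""
    return -v * (2.0 * t1 + sum(ts))


def offsets(s, v, trials=5, seed=0):
    """log f + log conditional - log target, for random node-time vectors."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        t1 = rng.uniform(0.1, 5.0)
        ts = [rng.uniform(0.0, t1) for _ in range(s - 2)]
        out.append(
            log_f(t1, s, v) + log_conditional(ts, t1, v) - log_target(t1, ts, v)
        )
    return out


# Every entry should equal (s - 2) * log(v), up to rounding.
print(offsets(s=10, v=0.7))
```

The (1 - exp(-v*t_1))^(s-2) factor in f is exactly what cancels the s-2
denominators, which is why the result looks like Nee's equation (3).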
For the birth-death model, I *think* that, given s, v and u, t_1 has density
proportional to
(1-E)^(s-2) * (1 - (u/v)*E)^(-s-1) * E^2
where E = exp(t*(u-v)), with a more complicated expression if sampling is
also modelled. With u = 0 this reduces to f(t) for the Yule model. Multiplying
it by Yang & Rannala (1997), equation (3), should give an expression for
Pr(t_1,...,t_s' | S, s, v, u).
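A Python sketch of the birth-death expression for t_1 (no sampling),
normalised numerically since I haven't worked out the constant. The function
names are hypothetical and it assumes 0 <= u < v. Setting u = 0 should give
back the Yule shape f(t), and the numerical mean should then match
1/(2v) + ... + 1/(sv).

```python
import math


def bd_root_shape(t, s, v, u):
    """Unnormalised density of t_1 under birth-death (no sampling):
    (1-E)^(s-2) * (1 - (u/v)*E)^(-s-1) * E^2, with E = exp(t*(u-v))."""
    E = math.exp(t * (u - v))
    return (1.0 - E) ** (s - 2) * (1.0 - (u / v) * E) ** (-s - 1) * E ** 2


def bd_root_mean(s, v, u, steps=100_000):
    """Mean of t_1, normalising the shape numerically (midpoint rule)."""
    if not 0.0 <= u < v:
        raise ValueError("need 0 <= u < v")
    upper = 40.0 / (v - u)  # E**2 = exp(-80) here, so the tail is negligible
    h = upper / steps
    weight = moment = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = bd_root_shape(t, s, v, u)
        weight += w
        moment += t * w
    return moment / weight


for u in (0.0, 0.5, 0.9):
    print(f"u = {u}: mean t_1 = {bd_root_mean(s=10, v=1.0, u=u):.3f}")
```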
Regards
Graham