Some confusion about the BVEP paper


刘政

May 4, 2021, 10:01:20 AM
to TVB Users
Hi

    I am reading the BVEP paper. In Section 2.5, Probabilistic model, there is a generative model, shown as follows:
[attached image: BVEP1.jpg — Eq. (4), the generative model]
and in the paper it says that x(t) represents the system's state, i.e., the x_1,i and z_i variables in the 2D Epileptor.

u(t) is the external input, but it is not clear what the external input is, or, to put it another way, exactly which variable u(t) stands for.

Then about θ, the unknown parameters (also the hidden variables of the system?): the paper says θ = (x_t0,1, x_t0,2, …, x_t0,N, η_1, η_2, …, η_N, K, σ1, σ2), where η is the excitability of each brain region, K is the factor that multiplies the connectome, and σ1, σ2 are the variances of the two kinds of noise. As for x_t0,i, does it mean the initial state of region i? If it does, what kind of property is it? The amplitude of the signal of region i?


And I notice that the generative model looks like the equations of the Kalman filter. Am I right?


Thanks a lot
Barry

meysam....@gmail.com

May 4, 2021, 10:34:42 AM
to TVB Users
Hi,

u(t) stands for any arbitrary external input to the system, such as a constant value or a pulse/stimulation, e.g., the constant I_1 in Eq. (3).

Yes, x_t0,i is the initial value for solving the Epileptor equations. In this study, the initial conditions were selected in the interval [-3, 5] for each state variable. For the 2D Epileptor, one can set them around -2.5 and +3.5 for the variables x and z, respectively.

Eq. (4) is a closed-form representation of a state-space model, and it resembles the Kalman filter equations.
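
For concreteness, here is a minimal numerical sketch of that state-space view for a single region, using one common reduced (2D) Epileptor form with the constant input I_1 playing the role of u(t) and the initial conditions mentioned above; the parameter values (eta, tau0, the noise levels) are illustrative assumptions, not the settings used in the paper:

import numpy as np

def epileptor_2d(x, z, eta, I1=3.1, tau0=2857.0):
    # Deterministic drift f(x, u, theta) of a reduced 2D Epileptor for a single,
    # uncoupled region; I1 plays the role of the external input u(t)
    dx = -x**3 - 2.0 * x**2 + 1.0 - z + I1
    dz = (1.0 / tau0) * (4.0 * (x - eta) - z)
    return dx, dz

# Illustrative values (assumptions, not the paper's settings)
eta, dt, n_steps = -1.6, 0.1, 20000
sigma, sigma_obs = 1e-3, 0.1
rng = np.random.default_rng(0)

x, z = -2.5, 3.5                      # initial conditions suggested above
xs = np.empty(n_steps)
for t in range(n_steps):
    fx, fz = epileptor_2d(x, z, eta)
    # Euler-Maruyama step: state(t+dt) = state(t) + dt*f(...) + dynamic noise
    x += dt * fx + sigma * np.sqrt(dt) * rng.standard_normal()
    z += dt * fz + sigma * np.sqrt(dt) * rng.standard_normal()
    xs[t] = x

# Observation equation of the state-space model: y(t) = h(x(t)) + measurement noise
y = xs + sigma_obs * rng.standard_normal(n_steps)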

Best,
Meysam

刘政

May 4, 2021, 11:30:05 PM
to TVB Users
Hi

    Thank you for your response, Meysam. However, I have a question, maybe a pretty dumb one. As we already have the Epileptor model, and the model has equations, why would we still need a generative model? The Epileptor already has the equations to generate the signal of the region it belongs to, and using the gain matrix we can transform it to SEEG and do the comparison.

Thank you
Barry

刘政

May 5, 2021, 12:22:01 AM
to TVB Users
Hi

    And in Eq. (4), what does y(t) mean? I know that x(t) means the x_1,i, which stand for the fast discharges of the seizure, and the z_i, which describe the propagation. And as the paper says, h(·) is known as the lead-field matrix, a.k.a. the gain matrix, so the second equation in Eq. (4) means that with (x_1,1, x_1,2, ..., x_1,N), (z_1, z_2, ..., z_N) and the lead-field matrix, we can calculate the SEEG data and compare it to the observed SEEG data, in order to optimize the parameters, i.e., θ.

   If I am wrong somewhere, please correct me.

Thank you
Barry

meysam....@gmail.com

May 5, 2021, 5:37:25 AM
to TVB Users
Hi,
y(t) stands for the observation we want to fit, and h is the gain matrix used to get y(t) = SEEG(t) = h * x(t).
In this study, to avoid non-identifiability and other inversion problems, we consider h = 1 (identity), so the SEEG is equivalent to the source activity.
For real SEEG data, the gain matrix is a sparse, rectangular, low-rank matrix, which we are still working on in order to infer properly with HMC.
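
As a small illustration of that observation step (the shapes and sparsity below are made up, not the study's actual gain matrix): with the identity gain used in this study the "SEEG" equals the source activity, while for real data a rectangular sensors-by-regions matrix would be applied:

import numpy as np

rng = np.random.default_rng(1)
n_regions, n_sensors, n_times = 84, 60, 500

x = rng.standard_normal((n_regions, n_times))   # toy source activity x(t)

# Case used in this study: identity gain, so y(t) is just x(t) plus measurement noise
y_identity = x + 0.1 * rng.standard_normal(x.shape)

# Real-data case: sparse rectangular gain matrix mapping regions to SEEG contacts
gain = rng.random((n_sensors, n_regions))
gain[gain < 0.9] = 0.0                          # crude sparsity, for illustration only
y_seeg = gain @ x + 0.1 * rng.standard_normal((n_sensors, n_times))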

Best,
Meysam

meysam....@gmail.com

May 5, 2021, 5:41:23 AM
to TVB Users
Hi,
Actually, this is a good question. In this study, the Epileptor integrated with the priors constitutes the generative model, and we only infer the Epileptor model parameters, besides hyper-parameters such as the noise level.
There is another study (Sip et al., PLOS CB 2021), which considers a threshold model rather than the Epileptor to infer the generative model from a large data set.
Best,
Meysam

WOODMAN Michael

May 5, 2021, 9:25:45 AM
to tvb-...@googlegroups.com
hi

just a small clarification: in the Bayesian MCMC community, a generative model connotes a model which maps the statistical modes of the parameter distributions to the modes of the data distributions in a one to one fashion. The Epileptor isn't generative in this sense by itself because you also need the gain matrix for sEEG, etc. Another sense in which the full network model may not be generative is that, lacking priors, the posterior may be multimodal when conditioned on a given dataset, which is a degeneracy that the priors are intended to eliminate.

cheers,
Marmaduke

刘政

May 5, 2021, 11:22:38 PM
to TVB Users
Hi

    I suppose I lack some knowledge of Bayesian inference with generative models, so I am still learning this. Apart from that, I found that in the paper or the supplementary materials you give the ground-truth values of the parameters, like the excitability of the EZ, PZ and HZ, the global coupling, etc. How are the ground-truth values obtained?

Thank you
Barry

刘政

May 6, 2021, 1:07:07 AM
to TVB Users
Hi

    After a little bit of searching, I suppose the approach of BVEP may be the Bayesian state-space model, or a Bayesian state-space approach. Am I right?

    If I'm wrong, please feel free to correct me. I don't want to go in the wrong direction, because I find many concepts in Bayesian approaches that are similar but in some ways different.

Thank you
Barry

WOODMAN Michael

May 6, 2021, 3:10:59 AM
to tvb-...@googlegroups.com
hi


On 6 May 2021, at 07:07, 刘政 <lz02...@gmail.com> wrote:

the approach of BVEP may be the Bayesian state-space model 

Yes, it is a Bayesian nonlinear state-space model.  The state-space model itself is in common with the main TVB library, however the distributional aspects (priors, posteriors, etc) are the Bayesian additions.
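
To make that distinction concrete, here is a rough sketch of what the distributional additions look like on top of a deterministic simulator; the prior choices, parameter names, and the simulate/log_likelihood helpers are placeholders of mine, not the paper's actual specification:

import numpy as np
from scipy import stats

def log_prior(theta):
    # Illustrative priors over (excitability, global coupling, log noise level)
    eta, K, log_sigma = theta
    lp = stats.norm.logpdf(eta, loc=-2.5, scale=1.0)
    lp += stats.norm.logpdf(K, loc=1.0, scale=0.5)
    lp += stats.norm.logpdf(log_sigma, loc=-3.0, scale=1.0)
    return lp

def log_posterior(theta, y_obs, simulate, log_likelihood):
    # Unnormalized log-posterior: prior over parameters + likelihood of the simulated fit
    y_sim = simulate(theta)      # e.g. an Euler-Maruyama run of the Epileptor network
    return log_prior(theta) + log_likelihood(y_obs, y_sim)

Without the priors and the noise model one only has the simulator; with them, the same equations define a probability distribution over data that can be inverted.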

刘政

May 6, 2021, 4:04:35 AM
to TVB Users
Hi

    Sorry, I just figured out why synthetic data are used and where the ground truth comes from. Basically, the ground truth for empirical data is unavailable, so we cannot know whether BVEP performs well or not; synthetic data simulated with TVB give us the ground truth against which to evaluate it.

    In the beginning, I was thinking of using SEEG from patients who have already had surgery, so the outcome of the surgery is known, whether seizure-free or not. If the patient is seizure-free and the regions that the surgeons resected or treated by chemonucleolysis are consistent with the result of BVEP, or if the patient is not seizure-free and the regions are not consistent with the surgical plan (e.g., maybe part of the PZ was resected), this could in some way demonstrate the validity of BVEP.

    I think both synthetic data and SEEG from patients who have had surgery could be used. What's your opinion?

Thank you
Barry

meysam....@gmail.com

May 6, 2021, 4:27:40 AM
to TVB Users
The ground-truth values presented in the appendix are based on the dynamical properties of the system, such as the bifurcation diagram (Proix et al. 2014, 2017).

meysam....@gmail.com

May 6, 2021, 4:38:25 AM
to TVB Users
Hi,
This study is a validation on synthetic data, together with the details of the method. We have a paper under review exploring empirical SEEG data and the agreement/mismatch with the clinical hypothesis for a retrospective cohort.

Best,
Meysam

WOODMAN Michael

May 6, 2021, 5:51:31 AM
to tvb-...@googlegroups.com
Hi

Yes, your description is consistent with the model evaluation done in the various papers with the VEP models. 

cheers,
Marmaduke


刘政

May 6, 2021, 10:24:22 AM
to TVB Users
Hi

    In the BVEP paper, σ and σ' are considered as two hyperparameters. As far as I know, a hyperparameter is something that people set manually. Why are these two called hyperparameters?
    
Thank you
Barry

meysam....@gmail.com

May 6, 2021, 10:28:33 AM
to TVB Users
Since it is challenging to infer them! In this study, we have treated them as unknown.

Viktor Sip

May 6, 2021, 11:10:15 AM
to tvb-...@googlegroups.com

Forgot to reply to list previously:

What you describe is the terminology commonly used with neural networks. In probabilistic modeling, however, hyperparameters typically refer to the top-level parameters in hierarchical models, that is, parameters of the distributions of other parameters. These are part of the model and are inferred too.

Hi
    So it means that, for example, if we have a Gaussian distribution whose mean follows N(a, b), then a and b are hyperparameters.

Thank you 
Barry

Exactly. If you want more, search Stan documentation for hierarchical models, or I like to recommend Richard McElreath's book Statistical Rethinking which also goes into this topic.
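
A tiny generative sketch of that hierarchy, with made-up values: a and b are the hyperparameters, μ is the parameter drawn from them, and the data are drawn given μ.

import numpy as np

rng = np.random.default_rng(2)

a, b = 0.0, 1.0                  # hyperparameters: parameters of the prior on mu
sigma = 0.5                      # observation noise

mu = rng.normal(a, b)            # parameter: mu ~ N(a, b)
y = rng.normal(mu, sigma, 100)   # data: y_i ~ N(mu, sigma)

In a full hierarchical model, a and b would themselves get priors and be inferred along with μ.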

Viktor

-- 
Viktor SIP
INS - Institut de Neurosciences des Systèmes
UMR INSERM 1106, Aix-Marseille Université
vikto...@univ-amu.fr
http://ins.univ-amu.fr/ 

刘政

May 7, 2021, 3:35:56 AM
to TVB Users
Hi

  Thank you so much for your explanation and recommendation.

Thank you
Barry

刘政

May 7, 2021, 6:26:59 AM
to TVB Users
Hi

    After some learning, I would like to describe my understanding of the inference.

    So what we have is: the SEEG data; a generative model based on the Epileptor and the gain matrix to generate SEEG (adding some Gaussian noise, of course); MCMC (NUTS); the priors of the parameters; and the likelihood function.

    As for the likelihood function: since the measurement noise v(t) in Eq. (4) is Gaussian, the likelihood function is something like this (if I'm wrong here, please correct me):
[attached image: the proposed Gaussian likelihood expression over the residuals r_i]
    where r_i is the residual between the observed and generated data, σ' is the standard deviation of v(t), i.e., of the noise in the measurement equation, and N is the length of the observed data.
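
    In code, that likelihood (assuming independent Gaussian measurement noise, which is what is described above; the variable names are mine) would look roughly like:

import numpy as np

def gaussian_log_likelihood(y_obs, y_sim, sigma_obs):
    # Log-likelihood of the observed data given the simulated data,
    # assuming i.i.d. Gaussian measurement noise with standard deviation sigma_obs
    r = y_obs - y_sim                       # residuals r_i
    n = r.size                              # N, the length of the observed data
    return (-0.5 * n * np.log(2.0 * np.pi * sigma_obs**2)
            - 0.5 * np.sum(r**2) / sigma_obs**2)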
    
    We use MCMC because we cannot calculate P(evidence), the denominator in Bayes' rule: the parameter θ has a large dimension, so the integral often cannot be computed. So we ignore P(evidence) and just use the likelihood and our prior to obtain the distribution of θ given the observed data.

    And the basic steps of MCMC are:

    1. We start at an initial sample of θ from the parameter space, call it θ_0; we also have P(θ).

    2. We generate SEEG using θ_0 and the generative model (integration is needed to get x(t); the Euler-Maruyama scheme is used), then calculate the likelihood using the likelihood function above.

    3. We multiply P(θ) and the likelihood. Now we have an unnormalized posterior P_0.

    4. We jump to the next sample of θ, called θ_1, and calculate its posterior P_1.

    5. If P_1 is bigger than P_0, we accept θ_1 and move on; otherwise we reject it and keep θ_0 as the second element of the Markov chain, until the next posterior is bigger than the former one.

    6. After maybe 10^4 steps, the samples of θ will be confined to the true distribution that we really want, like this:
[attached image: sketch of MCMC samples settling into the target distribution]
    In this way, we get the result. That's just my understanding for now. I still need to dig into centered versus non-centered parameterization. So far, I know that parameter-space exploration with the centered parameterization is often not efficient, because the sampler spends too much time exploring highly correlated parameters, which is a waste of time.

    If I'm wrong somewhere, please correct me. I cannot thank you enough.

Thank you
Barry

WOODMAN Michael

May 7, 2021, 7:39:09 AM
to tvb-...@googlegroups.com
Hi

Most of this summary is correct.  Two points, inline below:

On 7 May 2021, at 12:26, 刘政 <lz02...@gmail.com> wrote:

    5. If P_1 is bigger than P_0, we accept θ_1 and move on; otherwise we reject it and keep θ_0 as the second element of the Markov chain, until the next posterior is bigger than the former one.

If P_1 is less than P_0, then θ_1 may still be accepted with some probability (in the classic Metropolis rule, with probability P_1/P_0, so downhill moves are not always rejected). This is a key element that allows MCMC to explore the tails of the distribution and not just the mode. If, at every step, you only go up, it's an optimization, not an MCMC chain.
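
A minimal random-walk Metropolis loop showing that acceptance rule in log space (this is the generic textbook sampler, not the NUTS sampler actually used in BVEP; log_posterior, the step size, and the starting point are placeholders):

import numpy as np

def metropolis(log_posterior, theta0, n_steps=10000, step=0.1, seed=0):
    # Random-walk Metropolis: uphill moves are always accepted; downhill moves
    # are accepted with probability P_1/P_0, which lets the chain visit the tails.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_posterior(proposal)
        if np.log(rng.random()) < logp_new - logp:   # accept with prob min(1, P_1/P_0)
            theta, logp = proposal, logp_new
        samples[i] = theta
    return samples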

 the parameter-space exploration with the centered parameterization is often not efficient, because the sampler spends too much time exploring highly correlated parameters, which is a waste of time.

There is a reason for this: NUTS approximates the curvature of the posterior by either a diagonal or a full dense covariance matrix; in other words, it assumes the posterior is sufficiently ellipsoidal. If this is not the case, NUTS degenerates to standard random-walk MCMC. The non-centered parameterization reshapes the posterior from something curved into something ellipsoidal, so that the NUTS sampler can move over the ellipsoid in an uncorrelated parameter space.

In the case of VI, if a Gaussian variational distribution is assumed, then the inference process reduces to optimizing the fit of an ellipsoid to the true posterior, so non-centering is beneficial there as well. 

The BVEP repository has models in the two forms, so you can compare them. 

cheers,
Marmaduke

刘政

May 7, 2021, 10:26:25 AM
to TVB Users
Hi

   Thank you for your correction and explanation, sir! I'll keep studying the related material!

Thank you
Barry

刘政

May 9, 2021, 6:59:01 AM
to TVB Users
Hi

    I've got some questions regarding the section "Evaluate posterior fit".

    1. I want to make sure that I understand the role of the posterior z-score and the posterior shrinkage.
        The posterior z-score shows us the distance between the estimated mean and the ground truth.
        And the posterior shrinkage, as I understand it, works like this: the bigger the shrinkage, the more confident we are about the result, e.g., the prior is a wide distribution and the posterior is narrower than the prior.
    2. As for the posterior shrinkage, why use one minus the ratio?
    3. How do I read the figure of the posterior z-score against the posterior shrinkage? I know the z-score should be small, which means the estimate is close to the ground truth used to generate the data, and as for the posterior shrinkage, we want σ_post to be smaller than σ_prior, so the ratio should be small and, ideally, s should be big. But how do I read Fig. 3C?

Thank you
Barry

刘政

May 9, 2021, 10:53:12 AM
to TVB Users
Hi
    
    I still don't quite understand why σ and σ' are hyperparameters. In hierarchical models, if the data follow N(μ, σ) and μ follows N(μ_T, σ_T), we call μ_T and σ_T the hyperparameters.
[attached image: Eq. (5) from the BVEP paper]
    But in Eq. (4), or Eq. (5), σ is just the standard deviation of the noise.
    Or is it because, in Eq. (5), the θ in dt·f(x(t), u(t), θ) follows a prior Gaussian distribution, and σ sits at a higher level, since it perturbs the probability of x(t) transitioning to x(t+dt)?
    And as for σ', it determines y(t), so it is at a higher level.

Thank you 
Barry

WOODMAN Michael

May 10, 2021, 5:08:54 AM
to tvb-...@googlegroups.com
Hi

Hyperparameters here means those that are not fit with Stan or PyMC3 directly, but are tuned outside of Stan, because they cannot be inferred easily.

cheers,
Marmaduke

WOODMAN Michael

May 10, 2021, 5:09:40 AM
to tvb-...@googlegroups.com
hi

I would suggest as background reading the following


which should clear up your questions,

cheers,
Marmaduke

meysam....@gmail.com

May 10, 2021, 5:30:26 AM
to TVB Users
Hi,

The posterior z-score quantifies how much the posterior distribution encompasses the ground-truth, while the posterior shrinkage quantifies how much the posterior distribution contracts from the initial prior distribution (see https://arxiv.org/pdf/1803.08393.pdf, page 18).

The concentration of the estimates towards large shrinkage indicates that all the posteriors in the inversion are well identified, while the concentration towards small z-scores indicates that the true values are accurately encompassed by the posteriors. Therefore, plotting the posterior z-scores (vertical axis) against the posterior shrinkage (horizontal axis), a distribution in the bottom right of the plot implies an ideal Bayesian inversion. In this plot, the bottom left indicates a poorly identified posterior, the top left indicates a poorly chosen prior, and the top right indicates overfitting.

Using one minus the ratio in the shrinkage is just a rescaling, so that an ideal Bayesian inversion corresponds to a z-score close to zero and a shrinkage close to 1.
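
In code, the two diagnostics (following the definitions in that reference; the posterior samples, prior standard deviation, and true values below are placeholders you would supply) are simply:

import numpy as np

def posterior_zscore(post_samples, theta_true):
    # Distance of the posterior mean from the ground truth, in posterior std units
    return np.abs(post_samples.mean() - theta_true) / post_samples.std()

def posterior_shrinkage(post_samples, prior_std):
    # 1 - posterior variance / prior variance: close to 1 means the data
    # strongly contracted the posterior relative to the prior
    return 1.0 - post_samples.var() / prior_std**2

# Reading the z-score (vertical) vs shrinkage (horizontal) scatter, one point per parameter:
#   bottom right: small z-score, large shrinkage -> ideal inversion
#   bottom left : small z-score, small shrinkage -> poorly identified posterior
#   top left    : large z-score, small shrinkage -> poorly chosen prior
#   top right   : large z-score, large shrinkage -> overfitting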


Best,
Meysam

刘政

May 11, 2021, 9:09:56 AM
to TVB Users
Hi

   Thank you very much for the explanation and the recommendation. I was reading the non-centered reparameterization part, and I have some confusion about Eq. (6).

[attached image: Eq. (6) from the BVEP paper]
    To transform the centered parameterization into the non-centered one, x(t+dt) should be equal to μ + σ · x*(t+dt), with x*(t+dt) ~ N(0, 1); here x*(t+dt) is the unit-normal variable in Eq. (6).
    And this μ, I think, should be x(t) + dt·f(x(t), u(t), θ). Why is it f(x(t), u(t), θ) + dt·dx in Eq. (6)? And what does dt·dx mean? dx = dt·f(·), so dt·dx equals (dt)²·f(·).

Thank you 
Barry
 

刘政

May 12, 2021, 5:31:13 AM
to TVB Users
Hi

    I think I need to restate my question. I was reading some tutorials about centered and non-centered models.

    here is the centered model's form:
    u ~ N(0, 1)
    σ ~ N(0, 1)
    y ~ N(u, σ)

    The non-centered form becomes:
     u ~ N(0, 1)
     σ ~ N(0, 1)
     y_unit ~ N(0, 1)
     y = u + σ * y_unit 

   So here in BVEP, Eq. (5) is this:
[attached image: Eq. (5) from the BVEP paper]
    Eq. (6) is like this:
[attached image: Eq. (6) from the BVEP paper]
    So in Eq. (5), x(t) + dt·f(x(t), u(t), θ) is the mean of the distribution.
    According to the non-centered form, x(t+dt) = x(t) + dt·f(x(t), u(t), θ) + σ·x*(t+dt), but in the paper it is f(x(t), u(t), θ) + dt·dx, which I don't understand. And what does dt·dx mean? dx is the signal of the region and dt is time, so what does the signal multiplied by time mean?

Thank you 
Barry

WOODMAN Michael

May 12, 2021, 11:03:03 AM
to tvb-...@googlegroups.com
Hi

There appears to be a typo, and Eq. (6) should read

x(t + dt) = x(t) + dt f(x(t), u(t), theta) + sigma x*(t + dt)

where x* denotes the unit-normal variable defined in the first line of that equation; the star is part of the name, not a multiplication.
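
A short sketch of that corrected non-centered update, next to the centered one (f, theta, sigma, and dt below are placeholders):

import numpy as np

rng = np.random.default_rng(3)

def step_centered(x, f, theta, dt, sigma):
    # Centered form: draw x(t+dt) directly around the Euler mean
    mu = x + dt * f(x, theta)
    return rng.normal(mu, sigma)

def step_noncentered(x, f, theta, dt, sigma):
    # Non-centered form: draw a unit-normal auxiliary x*(t+dt) first,
    # then shift and scale: x(t+dt) = x(t) + dt*f(x, theta) + sigma * x_star
    x_star = rng.standard_normal()      # x*(t+dt) ~ N(0, 1)
    return x + dt * f(x, theta) + sigma * x_star

Both steps define the same distribution for x(t+dt); in the inference code the x* variables become the sampled parameters and the state is reconstructed from them, which is what gives NUTS the easier, less correlated geometry.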

Cheers

Marmaduke
