Questions about "flux" in direct.h5 datasets

瀏覽次數:192 次
跳到第一則未讀訊息

Edis Jakupovic

未讀,
2021年12月11日 晚上11:04:502021/12/11
收件者:westpa-users
Hello everyone,

Are the "fluxes" in the direct.h5 datasets (target_flux_evolution, conditional_flux_evolution, total_fluxes, etc.) probability fluxes, or raw transitions/time fluxes?

I'm a little confused, mostly because of this sentence in the Na-Cl tutorial (https://github.com/westpa/westpa/wiki/Na--Cl--Association-with-Gromacs-2018.2): 

"Unlike the target_flux_evolution, the rate_evolution takes into account the amount of probability in the bound and unbound states. Since we have performed a steady state simulation in which all trajectories begin in the unbound state, the unbound → bound rate should be the same as the flux into the bound state."

If we take the flux of probability, J, from state A into state B to be J_AB = k_AB * P_A, and A=unbound and B=bound, then is the above just saying P_A = 1, therefore J_AB = k_AB? Is this just a special case due to the fact that we're only considering 2 states, {P_A, P_B}, and since it's in steady state P_B=0 => the probability of the "bins" is always {1, 0}. Is this always true? For instance, if we considered intermediate distances between bound and unbound where the probabilities are {P_A, P_B, P_C, P_D}, where A, B, and C are unbound states. Then even if we initialized the system with {1/3, 1/3, 1/3, 0}, can we draw the same conclusion since the probability flux into D = (J_AD + J_BD + J_CD) = rate into D?


Another question - 
Regarding the "These are **not** normalized by the population of the initial macrostate." disclaimer for the conditional_fluxes dataset, what does that mean, exactly? And how would you go about normalizing it? The w_kinetics documentation (https://westpa.readthedocs.io/en/latest/documentation/cli/deprecated/w_kinetics.html) says:
"Because state-to-state fluxes stored in this file are not normalized by initial macrostate population, they cannot be used as rates without further processing. The w_direct kinetics command is used to perform this normalization while taking statistical fluctuation and correlation into account."

So, if I generate my my assign.h5 and direct.h5 files with the "w_ipa -ao" command, is this automatically handled?

Sorry for the long-windedness of the question. I appreciate any help!

Thank you,
Edis


Anthony Bogetti

未讀,
2021年12月12日 清晨7:53:002021/12/12
收件者:westpa...@googlegroups.com
Hi Edis,

The total_fluxes and conditional_fluxes datasets (as well as the corresponding evolution counterparts) are in units of flux per tau (the length of each weighted ensemble iteration). total_fluxes account for all fluxes regardless of direction while conditional_fluxes take directionality into account. target_fluxes are for the flux into the recycling states you defined for your simulation. These fluxes are “raw”. It sounds like the rate_evolution dataset normalizes (divides) by the state populations as an extra step. The state populations will not always be 1. For steady state they should be constant, but if you have intermediate states between bound and unbound the population of your inbound state will not always be one. If you had exclusively two states (unbound and bound) then it would be 1 or very close to 1 at steady state. It all depends on how you define your states in west.cfg before running w_ipa.

When w_ipa produces your direct.h5 file, the datasets labeled as ‘fluxes’ are not normalized by the populations and I recommend you do that step yourself. You can find these populations in the assign.h5 dataset called ‘labeled_state_populations’. For every iteration, this gives the populations of each state you defined in the west.cfg analysis section. Added together, the per iteration populations will be out of 1.

Let me know if you have any more questions! I have also recently been working through all this and am aiming to provide more documentation on this in the near future.

Best,
Anthony

--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/e0246c43-289c-4277-bf5a-06644a6010aen%40googlegroups.com.

Edis Jakupovic

未讀,
2021年12月12日 下午4:55:302021/12/12
收件者:westpa-users
Hi Anthony,


Thanks for the reply. I'm beginning to understand more thorougly, but there are still some things I'm confused about. Can I run through an explicit example?

Suppose we run a steady state run with 3 states: A=unbound, B=intermediate, and C=bound.

We initialize the system with a probability distribution {P_A, P_B, P_C} = {1, 0, 0}, where P_A(t) + P_B(t) + P_C(t) = 1 (with P_C(t)=0 since it's in steady state).

Let's say we run it for 10 iterations, with 5 walkers per bin (so nsegs=10 in steady state), and we decide that after 10 iterations that we've reached steady state and we end with a final bin P_i distribution of {2/3, 1/3, 0}.

In this example, we have 6 rates: k_AB, k_BA, k_AC, k_CA, k_BC, k_CB, where these represent the sum of the actual walker probability into a bin / tau, and the flux of probability, J, is J_ij = k_ij * P_i (the sum of walker probabilities times the probability they were in that state to begin with / tau)
--- Quick questions about the above - does the number of transitions take into account EACH progress coordinate calculated in a single iteration? For example, if 5/pcoord_len pcoord values made their way into the next bin, but the final pcoord (the one that WESTPA uses to determine whether to create or destroy trajectories) did not transition, is the # of transitions for this iteration 5/tau, or 0/tau? And about the rate constants, k_ij, this is in itself probability/tau, right? And these probabilities come from the individual walker probabilities?

So for the datasets, I'll refer to "raw flux" flux as J_ij = k_ij since we're not considering the parent bin's probability, and "probability flux" as J_ij = k_ij * P_i, where P_i in this example is {2/3, 1/3, 0} in steady state.

So then, what are the datasets in direct.h5 giving? It sounds like most of them are giving "raw flux". Here's how I see it at the moment:

- conditional_fluxes: sum of probabilities of walkers that enter bin j from bin i in a single iteration
- conditional_flux_evolution: same as above but averaged over sum windows
- total_fluxes: not 100% sure... You mentioned it takes into account all fluxes regardless of direction, the documentation says "Total flux into a given macrostate.", so is the documentation just wrong? Or am I misunderstanding 
- target_flux_evolution: same as above, but averaged over some windows
- rate_evolution: Now we multiply the conditional fluxes by the probability they were in that state to begin with, so this gives the J_ij = k_ij * Pi  (is this correct?) 

Perhaps I should be looking at using w_fluxanl as well? 

Please let me know if I've gotten anything wrong

thanks again!
Edis

Anthony Bogetti

未讀,
2021年12月14日 下午4:26:482021/12/14
收件者:westpa...@googlegroups.com
Hi Edis,

I apologize for the delay; I just wanted to double check and make sure I could correctly answer your questions.

Does the number of transitions take into account EACH progress coordinate calculated in a single iteration?

Yes, the tracing code calculates everything (fluxes, event durations) at sub-tau resolution on a per-walker basis. Even if a walker fails to be recycled because it enters and then leaves a target region, that entrance event will be picked up and included by the tracing code in w_direct.

So then, what are the datasets in direct.h5 giving? It sounds like most of them are giving "raw flux".

That is correct.  All of the fluxes are "raw" (un-normalized) except for that rate_evolution data set you mentioned (which are the fluxes divided by the labeled state populations). Your equation J_ij = k_ij * Pi is correct.  For detailed definitions of each data set in the direct.h5 file, please see the commented code which is correct:
It may also be helpful to view the assign.h5 code comments which contain information about the labeled populations, if you want to manually convert between raw flux and the rate constant:
I don't think there should be any need to use w_fluxanl as w_direct should make all possible flux datasets.

Best,
Anthony

Edis Jakupovic

未讀,
2021年12月21日 凌晨12:01:232021/12/21
收件者:westpa-users
Hey Anthony, 

No need to apologize, I appreciate your time! The code was really helpful to look through as well. 

One more quick question regarding steady state WE - when a walker is recycled, its probability is redistributed into a random bin that corresponds to one of the initial states (or a new trajectory is created from one of the initial states if that bin is unpopulated), right?  Is there a way to fix the recycling to only feed into a particular bin? 

For example, say I have bins {A, B, C, D, E} and I want to calculate the MFPT for a system starting in the state that corresponds to bin A to reach a target state that corresponds to bin E. If I initialize my system with walkers that are solely in bin A, then all of the probability recycled should go back into bin A. However, if I wanted to run a repeat simulation at a different temperature, it would converge much faster if I seed the new run with trajectory points that lie in all bins A, B, C, and D with total bin probabilities that are close to their steady state, converged values. But if I initialize the run with walkers in all bins, the probability being recycled would not only go to bin A, but also randomly to bins B, C, and D, which I think would increase the probability flux going into the target state versus the run where all probability is recycled into bin A, giving a shorter calculated MFPT. Is this analysis correct, or will it not matter which bin the probability is being recycled into, as long as it's going to a state that's not the target state and the system has converged to steady state? I guess the short version of my question is: given two identical runs have reached steady state, will they produce different MFPT's if the bins they recycle probability into are different?

thanks again,
Edis

Daniel Zuckerman

未讀,
2021年12月28日 下午6:00:362021/12/28
收件者:westpa-users、John Russo

Edis, this is a good question, and your intuition is correct.  First, if you initialize a run closer to steady-state, that should speed up convergence of your MFPT estimate.  This is the idea of our haMSM approach.  Second, if you recycle to *all* states except the target, that will speed up the kinetics.  To see this most easily, let's say state D is right next to and surrounds the target whereas A is distant from the target, then recycling some trajectories to D will clearly speed up time to target compared to only recycling to A.


I don't know the details of how WESTPA treats this situation, but I remember discussing the issue with John Russo (a student in my group) and he may have a working solution for you.


--Dan


From: westpa...@googlegroups.com <westpa...@googlegroups.com> on behalf of Edis Jakupovic <ejak...@asu.edu>
Sent: Monday, December 20, 2021 9:01 PM
To: westpa-users
Subject: [EXTERNAL] Re: [westpa-users] Questions about "flux" in direct.h5 datasets
 

Edis Jakupovic

未讀,
2021年12月29日 下午3:29:502021/12/29
收件者:westpa-users
Hi Dan, 

You're right, it's very clear when you think about it like that - thanks!

And yes, if you could get me in touch with John Russo that would be great. My email is ejak...@asu.edu

thanks again,
Edis

rus...@ohsu.edu

未讀,
2021年12月29日 下午4:26:392021/12/29
收件者:westpa-users

Hi Edis,

A little bit of details on how recycling works in WESTPA: WESTPA 1.0 uses the initial states for recycling - each initial state is assigned a probability, and when a walker reaches the target, an initial state is chosen according to their relative probabilities. Then, the walker is reinitiated from that. Note that every initial state is fair game to be used here. 

In WESTPA 2.0, we’ve added “start states” in addition to initial states. When initializing a new WE run, walkers are seeded using both start states and initial states. However, *only* initial states are used for recycling — walkers will not be recycled into start states. So, say you define initial states A and B, and start states C and D. This allows you to initialize your WE run withwalkers in  bins A B C D, but then only recycle into A and B. 

Happy to correspond more with you if you have further questions! Hope this helps. 

Edis Jakupovic

未讀,
2021年12月29日 下午4:29:022021/12/29
收件者:westpa-users
Ahh ok, that's exactly what I need to know, thank you so much! No more questions from me at the moment
回覆所有人
回覆作者
轉寄
0 則新訊息