Reed Sorensen
May 31, 2017
Unformatted text is from the reviewer. Bolded text is our response.
Response to Reviewer #1: Wojciech Chrosny
The paper presents a good overview of available microsimulation frameworks. The newly developed microsimulation framework (CEAM) seems to be well designed to utilize the available data from the Global Burden of Disease (GBD) study and generate results that can be used to compare different treatment strategies. Perhaps in future papers the team can discuss results of testing CEAM on different disease scenarios and try to compare results with other modelling tools and techniques (e.g. microsimulations in Discrete Event Simulation vs. Discrete Time Markov). It has been my experience that developing the same model in different tools can uncover subtle mistakes in one or more approaches, resulting in a more robust implementation.
We thank the reviewer for this positive assessment of our work, and appreciate the suggestion. We agree that these are valuable directions for future work.
A very specific improvement suggestion with regard to the calculation of effective probability (Figure 3, line 13). The existing line
effective_probability = 1 - np.exp(-effective_rate)
seems to imply a time unit of 1. However, Figure 2 references a t_timestep parameter. The time step should most likely be factored into the effective_probability calculation to avoid the mistakes that will result when the assumption of unit time = 1 does not hold.
We made this change in the updated version: the revised code multiplies the rate by the length of the time step before converting it to a probability.
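For clarity, here is a minimal sketch of the revised conversion, assuming the rate and the time step are expressed in the same time unit (the function name is ours, for illustration only):

import numpy as np

def rate_to_probability(effective_rate, t_timestep):
    # Probability of at least one event during a time step of length
    # t_timestep, given a continuous rate per unit of time.
    return 1 - np.exp(-effective_rate * t_timestep)

# With the original assumption of a unit time step, the result is unchanged:
# rate_to_probability(effective_rate, 1.0) == 1 - np.exp(-effective_rate)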
Lastly, having detailed knowledge of the TreeAge Pro software framework (as director of software engineering at TreeAge Software, Inc.), I want to clarify that TreeAge Pro is available on Unix platforms and that it does support distributed parallel computation for microsimulation models.
We made this change in the updated version.
Response to Reviewer #2
Thank you for giving me the opportunity to review this interesting paper presenting a new microsimulation modelling framework for CEA. I have some concerns regarding the search strategy and the assessment of the modelling frameworks; therefore, I will classify it as ‘Borderline’. I would recommend this paper for acceptance, though, provided that the authors address my concerns.
I have no comments on the title, abstract, and introduction.
Methods:
1. Please provide more clarity about your search strategy and the inclusion criteria. Some are heavily subjective (i.e. “active user community”, “quality of documentation”). It would be helpful if you described how you made decisions based on those. Furthermore, I do not understand why some criteria were “preferred”.
We took out references to “preferred” criteria because they had no bearing on whether to reject a framework. We added an explanation about subjective criteria to the text: “Some of these criteria are necessarily subjective. For example, we defined ‘active user community’ as the hypothetical ability to ask an active user for guidance and ‘sufficient documentation’ as the ability to run the model without asking others for assistance.”
2. I agree that computation speed is an important aspect. However, development speed is also important. It would be helpful if you also provided some information on how quickly the ‘hello world’ model could be developed with each approach.
The researchers conducting the review had disparate levels of coding skill, so we could not compare development speed directly. However, this is an important point worth considering in future work.
Results:
1. Why were 8 frameworks excluded from further testing? Please provide the reason(s) for each one. Table 1 does not state the reasons clearly enough. First of all, some cells have an ‘x’ and some do not; for the coloured cells, it is unclear what the difference is between cells marked with ‘x’ and unmarked cells. Second, it is not clear why some packages qualified for further testing. For example, Anylogic and jamsim both have two yellow and two green cells, but one qualified and the other did not.
We took out the ‘x’ marks from the colored cells of the table because they did not add information. You are correct to point out that JAMSIM/simario qualified for the “hello, world” phase, so we updated the table accordingly. We excluded the framework later because JAMSIM requires Java programming, and the simario R package (upon which JAMSIM depends) cites simulation with large populations as an area of future work. In general, we do not provide this level of specificity when explaining reasons for exclusion. Our purpose is not to single out certain frameworks and argue for why each one is deficient. Rather, we aim to describe the general process by which we arrived at an option that met our needs. We have edited our text to acknowledge that our limited time and resources to understand each framework fully could produce imperfect assessments.
2. In the methods, you describe 2 assessment frameworks with specific markers (i.e. computation times, lines of code, debugging time, etc.). I would expect to see how each approach scored based on these assessment frameworks. While I understand the difficulties of a systematic search for this topic, a systematic approach to your assessment of the modelling frameworks is doable.
To address this point, we re-ran the models in a way that emphasizes comparability. We did not include this information in the first draft because there were subtle differences that made comparisons difficult. For example, we ran each model on different computing hardware (e.g. laptops, cluster nodes) and sometimes the input parameters were not identical (e.g. simulation duration). In the first draft, we reported computation speed as orders of magnitude. In the updated version, we include computation speed measured in seconds.
3. Please consider providing the ‘hello world’ code that you’ve used for your assessment.
We included a link to a public GitHub repository in the updated version.
4. The “… the lack of individual-level coupling between the natural history and intervention simulations” is not necessarily bad. It is just a different approach, and some may argue that it captures real-life uncertainty better. Depending on your philosophical views, an action in real life can have unintended and unpredictable consequences that may alter the life course of individuals; therefore your approach may give artificially narrow uncertainty intervals because it keeps the life course of simulants fixed and only considers the intended effects of the intervention. You may consider expanding this part a bit in this paper to consider both approaches.
This is an insightful comment. We submitted another paper to SummerSim 2017 called “Untangling Uncertainty With Common Random Numbers: A Simulation Study” in which we discuss these issues at length.
5. I understand that the CEAM example is a ‘showcase’ for demonstration purposes, so I will not comment on this, as the detailed structure of the microsimulation framework is not presented here. I am waiting to see the technical specification of your model in a separate paper, and I welcome your choice to make this project OSS.
We appreciate the reviewer’s patience and hope to have an additional paper along these lines drafted and available soon.
Discussion:
1. The second paragraph that discusses uncertainty is quite vague. I cannot offer a suggestion for improvement because it does not link to the previous results, and I do not understand its purpose and function.
The paragraph addressed problems that some frameworks had in handling uncertainty. However, the two points we made are distinct and belong in separate paragraphs. Specifically, the part about draws from a parameter’s uncertainty distribution has more to do with ease of integration with GBD results, while the part about reducing stochastic uncertainty has more to do with the tradeoff against computation speed. We reorganized the material to make the purpose clearer.
2. Consider summarising the benefits of your approach and their practical implications here. What does your approach bring to the end-user that was not available so far?
We elaborated on the benefits of CEAM in the last paragraph before section 4.1, “Limitations and directions for further research”.
Response to Reviewer #3: Jacob Barhak
This is a very important paper and therefore needs publication. So the overall decision is Accept.
The importance lies in the fact that the authors did take the effort to use multiple modeling frameworks and evaluate them. I treat this as a review paper rather than a development paper of a new framework. This evaluation is priceless since it essentially provides a one-stop shop for new modelers. I know firsthand that this work was done faithfully, since one of the authors contacted me in the last year and asked questions about MIST, which I maintain. And I see how this effort can be quite extensive when going over several systems – this amount of work is a great contribution that should be acknowledged and reported for a greater benefit.
The paper can be published as is, since the advantage of publishing it outweighs any consideration against publication. Although I am OK with the paper being published as is, I suggest that the authors take the time to improve it. The other two reviewers suggested some corrections, and I think it is prudent that the authors make an effort to address those suggestions. It would be nice if the new version acknowledged the reviewers’ effort with links to the public reviews.
I also have a few comments.
If possible, the authors may wish to reach out to the developers of the software they evaluated and check the correctness of the facts. One reviewer already pointed out some issues specific to TreeAge, and I can point out some details about MIST that the authors may want to double check.
MIST is capable of running over a Linux cluster. See the installation instructions at https://github.com/Jacob-Barhak/MIST
Please also see the instructions for running over the cloud at https://htmlpreview.github.io/?https://github.com/Jacob-Barhak/MIST/blob/master/Documentation/MIST-over-the-Cloud.html
We updated this in the revised version.
If the requirements include a specific Unix/Linux distribution, please be explicit about those needs.
We assumed that a framework claiming Unix compatibility would be able to run on our particular Unix cluster. That assumption turned out to be true for all of the models we tested.
Considering the large variety of options available, it is not a straightforward deduction why the authors decided to create their own framework. The justifications provided in the discussion have to be expanded. Speed is an issue that invites a long discussion and has to do with tradeoffs. A dedicated system will for sure beat any general framework at the price of losing some general functionality – I am curious what balance point was selected for the design of the new system.
In general, we chose not to provide explicit reasons for excluding each framework. Our purpose is not to single out certain frameworks and argue for why each one is deficient. Rather, we aim to describe the general process by which we arrived at an option that met our needs. This approach acknowledges our limited time and resources to understand the frameworks fully, although we made a good faith effort to use each one.
Yet more importantly, what level of correlation between parameters do you require? Can the authors give an example of what you need that cannot be implemented by most systems? Unless I missed something, the code example given seems pretty standard, and I want to see what the authors are trying to do that breaks some systems.
In the Global Burden of Disease study, we represent parameter uncertainty as 1,000 draws from the uncertainty distribution of each parameter. When parameters for multiple conditions are estimated within the same statistical model, the draws are correlated. For example, the three sequelae of ischemic heart disease (myocardial infarction, heart failure and angina pectoris) are modeled together. If we randomly sample independently from the uncertainty distribution around incidence for each sequela (the approach taken by most standalone software options), the resulting uncertainty will be too large, because it does not account for the fact that a draw with relatively high incidence of myocardial infarction will also tend to have a relatively high incidence of heart failure. It is better to run the model separately on each draw and aggregate the results at the end.
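To make the distinction concrete, here is a minimal sketch with fabricated draws and a toy model output; the numbers, variable names, and the run_model function are hypothetical and are meant only to contrast the two sampling strategies, not to reproduce CEAM code.

import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000

# Hypothetical correlated draws, standing in for incidence parameters that
# were estimated jointly (e.g. myocardial infarction and heart failure).
mi_draws = rng.normal(loc=0.010, scale=0.002, size=n_draws)
hf_draws = 0.5 * mi_draws + rng.normal(scale=0.0005, size=n_draws)

def run_model(mi_incidence, hf_incidence):
    # Stand-in for one simulation run; returns a toy summary output that
    # depends on both parameters.
    return mi_incidence - hf_incidence

# Independent resampling (common in standalone tools): the pairing between
# draws is broken, so the joint uncertainty distribution is not preserved.
independent = np.array([
    run_model(rng.choice(mi_draws), rng.choice(hf_draws))
    for _ in range(n_draws)
])

# Per-draw execution: run the model once on each correlated draw, then
# aggregate the per-draw results, preserving the correlation.
per_draw = np.array([run_model(mi, hf) for mi, hf in zip(mi_draws, hf_draws)])

# For this toy output, breaking the pairing widens the uncertainty interval.
print(np.percentile(independent, [2.5, 97.5]))
print(np.percentile(per_draw, [2.5, 97.5]))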
The authors are correct to discuss the limitations of mapping all the modeling frameworks available out there – there is a countless number of systems, and the team has done very much to bring all those together – which is great. I will point towards other options that the authors can choose to look at in the future. I suggest that the authors have a look at the Python library PyMC - it has MCMC code that may be reused. It will also be nice to look at discrete event simulation systems and agent-based simulation systems – there is a countless number of those, and they can be adjusted to perform microsimulation. Finally, looking at the event-driven design in the code, I suggest the development team check SBML – it has new capabilities in the new specifications that may help with microsimulation. The last few recommendations may be beyond the scope of this paper, yet important in a larger context of modeling tools available.
We welcome these suggestions and will consider them in future work.
Hopefully the authors will choose to revise the paper to address the issues, although this is left at the level of suggestion.