MSOM Conference
May 8, 2026, 5:13:10 PM
to msom-confe...@googlegroups.com
08-May-2026
Re: SIG-2026-0123, "Stratify, Don't Personalize: Balancing Implementation Complexity and Outcome Disparity in Medical Care"
SIG Day Decision: Reject
Dear Author (this is to ensure anonymity):
We received many excellent submissions for the Healthcare Operations Management SIG-Day Conference. Unfortunately, we could not accept all of them to be included in the program, and we are sorry to say that your paper was not accepted to the SIG-Day conference.
If you also submitted an extended abstract of your paper to the main MSOM Conference, a decision on that submission will be made separately.
Sincerely,
MSOM Healthcare Operations Management SIG-Day Co-Chairs
---------------------
Referee: 1
Strengths SIG Only: What are the most important strengths of this manuscript?
- The paper addresses an important problem in public health policy.
- The paper describes a flexible methodology for addressing the complexity/efficiency trade-off of personalized monitoring policies.
- The method is illustrated on a case study from the head and neck cancer surveillance programme (in the US, I assume given the references).
Referee: 2
Strengths SIG Only: Below is a summary of the paper's key contributions as I understand them.
One strength of the manuscript is the question it asks. The paper does not simply argue for more personalization; instead, it studies the more practical middle ground between one-size-fits-all care and fully individualized care. That is a useful and relevant framing, especially for an operations audience, because it connects equity goals to implementation burden in a very direct way. Another strength is the way the paper builds the problem. The idea of designing a small number of interpretable strata and then assigning a tailored policy to each group is sensible and easy to understand. The decision-tree structure also helps, because it gives the paper a form of stratification that clinicians could more realistically read and use than a black-box rule. The paper also has a methodological angle: it combines tree-based partitioning with a constrained POMDP framework, which gives the work a clear technical contribution. However, some parts of these sections may later need further justification and clarification.
A further positive aspect is that the paper does not stop at proposing the framework. It compares decision-aware proxies with similarity-based alternatives and shows that the choice of proxy really matters. That comparison is helpful, because it makes the paper more than just a new model; it also gives some insight into what kinds of stratification ideas work better in this setting. Their numerical study is also a strength. The head and neck cancer surveillance application is a meaningful setting, and the results are easy to follow. In particular, the paper does a good job showing that a relatively small number of strata can already achieve substantial disparity reduction, which supports the main practical message of the paper.
Finally, the manuscript provides some practical motivation. The paper keeps returning to the trade-off between equity and complexity, and that gives the work a useful managerial perspective. Even for a reader who may not agree with every modeling choice, the paper’s main message is easy to see and worth discussing.
Referee: 3
Strengths SIG Only: This paper proposes an interpretable optimal decision tree approach to partitioning a population into subgroups each with its own tailored policy. Various proxies are proposed for addressing different notions of disparity between subgroups.
Referee: 4
Strengths SIG Only: Novel methodology that combines MIO and POMDPs to develop personalized policies in healthcare delivery with the goal of minimizing disparities between groups.
The framework proposes to group patients based on covariates using regression trees. The trees are built in a decision-aware fashion, not only based on similarity.
Referee: 1
Limitations: What are the limitations of this manuscript?
- Clarity of the scope: While I appreciate the positioning of the work within the broad “personalized treatment” area, the scope of the study/work is more narrow: the study looks specifically at disease monitoring, which has structural implications for the methodology (e.g., an underlying POMDP) that differentiates the work from other generic “personalized policy” work. Clearer, more accurate definition of the scope would help better appreciate the contribution. It would also help ensure claims are made at a breadth that is appropriate with what is demonstrated.
- Motivation for the algorithm design: The authors consider a tree-based model where each leaf corresponds to a policy, and they identify a clear objective. They choose to compute the partitioning tree "exactly" via an MIO formulation, but with several surrogate/proxy objectives. Alternatively, they could have kept their original objective and constructed the tree greedily via a recursive partitioning approach. As far as I can tell, the latter approach is more popular in the literature. The authors should better support their design choice (with rationale, theory, and/or numerical experiments).
- Guidance on algorithm use: Overall, the authors propose 8 proxy objectives; I think this is too many, and the proposed methodology should be clarified to make it more usable. (a) I understand that decision-aware and similarity-based proxy measures capture different aspects, so I could see why one would optimize a convex combination of one decision-aware and one similarity-based measure; the authors should investigate that possibility. (b) Within each category, however, I would encourage the authors to focus on one measure (and support that choice with evidence).
- Theoretical underpinning: The authors mention the trade-off between the number of strata K and the amount of data available, but they sidestep this issue by assuming the transition kernels are well estimated. It is a shame that the paper does not comment on or address the issue of tuning K to what can be reliably estimated from the data. In particular, several works in this area use statistical tests during partitioning to avoid incorrectly identifying statistically insignificant subgroups.
Referee: 2
Limitations: Below, I provide a few key limitations of the work.
One limitation is that the paper’s main optimization framework is quite indirect. The true problem is a nested and computationally hard one, so the paper ends up relying on proxy objectives rather than directly optimizing the actual disparity measure. The manuscript does show that proxies and realized disparities often move in the same direction, but it also shows that they do not always align well. This means the quality of the final stratification can depend heavily on the choice of surrogate rather than on the original objective itself.
A second limitation is that the paper builds on several strong structural assumptions at once. The framework depends on a particular disease progression model, a constrained POMDP formulation adopted from prior work, and a decision-tree restriction for the stratification step. Each choice is understandable on its own, but together they make the conclusions fairly model-dependent. So, one concern is whether the strong numerical gains reflect a robust insight about stratified care, or whether they are partly driven by the specific modeling architecture chosen here.
A third limitation is the simplification of the clinical heterogeneity. To keep the problem interpretable and computationally manageable, the paper coarsens several clinical variables into binary or grouped categories and then restricts the final partition to shallow trees. This helps implementation, but it may also remove important variation across patients and rule out better partitions that are not tree-based. In that sense, the paper may be achieving tractability partly by narrowing the space of clinically relevant heterogeneity too much.
My last comment is that the evidence remains largely model-based and metric-sensitive. The case study is interesting, but the claims are supported mainly through numerical experiments within the proposed framework rather than through external or practical validation. In addition, the results appear strongest under the weighted-average disparity measure that the paper emphasizes, while performance under other notions of equity, especially worst-case disparity, looks less uniformly strong. That makes the practical appeal of the method depend in part on which disparity metric a decision-maker views as most important.
Referee: 3
Limitations: To me, the major missing piece of this paper is a thorough empirical investigation of the computational aspects of the proposed method. While the paper acknowledges known computational hardness results for various components involved, there aren't numerical experiments that really probe how scalable the method is to the datasets researchers and domain experts might want to apply it to in high-stakes applications. I'd like to see more detail on how long the numerical experiments took in practice (wall-clock time), discussion of solver time vs. problem size, and how the various proxies differ in their impact on computation time.
As a smaller point, one of the key differences the authors highlight between the proposed approach and the original optimal decision tree framework of Bertsimas and Dunn (2017) is the use of binarized covariates. As far as I'm aware, though, this isn't really a new idea; I'm under the impression that binarization was also used by Hu, Rudin, and Seltzer (2019) in their optimal sparse decision trees paper. It would be helpful to clearly delineate which modifications are novel and which have appeared in existing papers.
Referee: 4
Limitations: Decision trees can yield policies that are hard to explain, for instance when splits carve out disjoint regions of the covariate space. A concern is that decision-aware trees may produce such disjoint regions, which would compromise the approach's interpretability.
Referee: 1
Comments to the Author
(There are no comments.)
Referee: 2
Comments to the Author
(There are no comments.)
Referee: 3
Comments to the Author
(There are no comments.)
Referee: 4
Comments to the Author
(There are no comments.)