to msom-confe...@googlegroups.com
08-May-2026
Re: SIG-2026-0157, "Constraint-Aware Self-Improving Large Language Model for Clinical Role Model Generation"
SIG Day Decision: Reject
Dear Author (this is to ensure anonymity):
We received many excellent submissions for the Healthcare Operations Management SIG-Day Conference. Unfortunately, we could not accept all of them to be included in the program, and we are sorry to say that your paper was not accepted to the SIG-Day conference.
If you also submitted an extended abstract of your paper to the main MSOM Conference, a decision on that submission will be made separately.
---------------------
Referee: 1
Strengths SIG Only: See attached PDF
Referee: 2
Strengths SIG Only: A major strength of this paper is that it extends a coherent and impressive research program on trustworthy personalized treatment planning while also advancing a broader agenda at the intersection of AI, optimization, and healthcare decision support. The paper fits naturally within the larger body of work on modeling patient treatment trajectories that are sequential, clinically meaningful, and operationally implementable. Against that backdrop, the present study makes an important step by moving beyond optimization over only observed trajectories and instead generating new candidate trajectories under data scarcity, while preserving the same emphasis on rigor, safety, and deployability through a transparent optimization-based verifier and a self-improving active-learning loop.

The paper is particularly strong in how it combines methodological sophistication with practical relevance, integrating formal guarantees such as regret-style learning results and patient-specific notions like hitting time to safety with careful empirical validation on large clinical datasets and clearly patient-centered objectives based on reliability and effort-to-change. More broadly, its emphasis on trustworthy AI, distributional robustness, and actionable support for underrepresented and data-sparse patient populations makes the contribution feel like a natural and ambitious extension of the trajectory-based personalized medicine program and a meaningful effort to translate advanced analytics into deployable clinical decision tools rather than stopping at technical novelty alone.
Referee: 3
Strengths SIG Only: The paper is technically strong; it applies OR tools in a relevant context. The integration of robust optimization with LLM fine-tuning is technically appealing. The use of real data is good, but I am not sure whether the empirical contextualization reflects what could happen in real clinical practice or how LLMs would actually be used there.
Referee: 1
Limitations: See attached PDF
Referee: 2
Limitations: A major limitation of the paper is that it does not convincingly establish that an LLM is the appropriate methodological engine for the problem being studied. The core task is formulated over structured patient and CRM feature vectors, with reliability and EtC defined on these structured states, and the empirical implementation relies on 20 tabular clinical variables from ACCORD and NHANES that are simply serialized into prose prompts. The fact that the best result is achieved by an 8B Qwen3 model, which is small by current open-model standards, weakens the claim that the task requires substantial general or clinical language understanding. Instead, it suggests that the substantive intelligence resides primarily in the structured verifier and iterative search procedure, with the LLM serving mainly as a lightweight generator of candidate serialized records. The manuscript would therefore be substantially stronger if it compared against structured baselines that operate natively in feature space, such as conditional VAEs, tabular diffusion or GAN models, or direct constrained optimization.

A related limitation is that the empirical validation remains largely offline and proxy based, because the main outcomes are defined through bootstrap random forest risk tools and derived metrics such as reliability probability and EtC, leaving open whether the generated CRMs would improve clinician judgment, patient trust, or real treatment decisions in practice.
Referee: 3
Limitations: The paper lacks a clear, focused research question. It simultaneously proposes to address trust, personalization, fairness, hallucination, AI safety, and distributional shift without a coherent throughline. The exposition is convoluted, making it difficult to identify what problem is actually being solved. If you are studying trust and fairness, I did not see that they were formally measured or operationalized. The methodology leans heavily toward ML/AI systems rather than healthcare operations management. From that standpoint, fit (not the rigor of the research) is an issue for the SIG.