Specifying a multimodal AI orchestrator in TLA+ before implementation


Petru Mirzenco

Apr 19, 2026, 9:42:23 AM
to tlaplus

Hi everyone,


We wanted to share a recent production use case applying TLA+ to the domain of AI orchestration. We built a multimodal AI orchestrator by first specifying the architecture, and then writing the code to align with the spec before deploying to Google Cloud.


We built a system that generates multimodal content based on user context and private uploaded images, and conducts related searches across internal databases and the web. This requires coordinating multiple external AI services (Vision, LLM, Image/Video Generation, and Search). The underlying AI models are inherently non-deterministic and suffer from unpredictable latency spikes. If a video generation node times out, or a search API returns a malformed payload, it can trigger asynchronous race conditions, retry loops, and partially complete states where a user is billed for an asset that never delivers.


Standard integration testing couldn’t adequately cover the state space created by these asynchronous intersections.


We modeled the orchestration kernel in TLA+. We didn’t model the AI model logic itself; instead, we considered the AI agents as non-deterministic nodes that could return any value from a set of defined outcomes (e.g., Success, Timeout).

This let us verify exactly how our orchestrator would handle the AI’s unpredictable behavior before the implementation phase even began.
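To give a flavor of this modeling style (this is a minimal sketch with hypothetical names, not our production spec), an external AI call can be written as an action that non-deterministically picks any outcome the service could return, so TLC explores every case:

```
---------------- MODULE AgentSketch ----------------
VARIABLE taskState

Outcomes == {"Success", "Timeout", "Malformed"}

Init == taskState = "Idle"

\* Dispatch a request to the external AI service.
Invoke ==
    /\ taskState = "Idle"
    /\ taskState' = "Running"

\* The service is a black box: the next state is any
\* element of Outcomes, chosen non-deterministically.
Complete ==
    /\ taskState = "Running"
    /\ \E o \in Outcomes : taskState' = o

Next == Invoke \/ Complete
====================================================
```

Because `Complete` is an existential over `Outcomes`, the model checker branches on Success, Timeout, and Malformed at every call site, which is what forces the orchestrator's handling of each case to be verified.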

We wrote two separate specifications: one for the backend orchestration kernel (~58,000 states explored) and one for the frontend lifecycle (~30,000 states explored), to make sure the user journey stays consistent across environments.


The model checker identified several critical logic flaws that would not have surfaced until production:

  • Retry Race Condition: A video generation task was incorrectly being reset to Idle, which immediately triggered a duplicate retry while the original task was still running.
  • Configuration Boundary Bug: We tested the boundary of a MaxVideoRetries setting. One or two retries were stable, but raising it to three caused a silent failure in the Video API: the UI loaded successfully, but the final video asset was dropped, and the user would have been billed for an undelivered item.
  • Data Isolation Violation: We caught a sequence where the internal storage identifiers could leak into the external AI model context payload. 
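As a rough sketch of that last class of property (variable and constant names here are hypothetical, chosen for illustration), the isolation requirement can be stated as a simple invariant over every payload handed to an external model:

```
\* No internal storage identifier may ever appear in a
\* payload sent to an external AI service.
NoInternalLeak ==
    \A p \in sentPayloads :
        p.refs \cap InternalStorageIds = {}
```

TLC checks such an invariant in every reachable state, so a single interleaving that copies an internal identifier into an outbound payload is enough to produce a counterexample trace.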


After three months deployed on GCP, we've had zero logic failures or state-tracking errors. During one live incident, a misconfigured secondary search service went down; the verified architecture bypassed it and delivered the primary result without system failure.


TLC Checking stats

tlaproj-states-3retries.png


A privacy invariant snippet

tlaproj-privacy-invariant.png



We welcome questions and feedback.


Markus Kuppe

Apr 19, 2026, 7:39:00 PM
to tla...@googlegroups.com
Hi Petru,

thanks for sharing your success story. It will hopefully inspire others to publicly share their experiences as well.

The answer is probably no, but is your spec publicly available?

Thanks,
Markus
> TLC Checking stats<tlaproj-states-3retries.png>
>
>
> A privacy invariant snippet<tlaproj-privacy-invariant.png>

Petru Mirzenco

Apr 19, 2026, 11:10:17 PM
to tlaplus
Hi Markus,

The spec isn't publicly available at the moment. We are considering sharing more modeling details in future posts. 

We also have a technical write-up covering the architecture and the verification approach. Happy to send it directly if you're interested.

Thanks,
Petru 
