Will there be a lesson learned on AWS 2025 East-1 Outage due to DynamoDB with regards to TLA+?

42 views
Skip to first unread message

Chris Ortiz

unread,
Oct 27, 2025, 11:50:04 AM (4 days ago) Oct 27
to tlaplus
Some people say it is a race condition, but as I understand TLA+ is used in DynamoDB and can catch behavior such as this thatcould have been prevented. 

Thanks,
Zitro

Andrew Helwer

unread,
Oct 29, 2025, 2:10:02 PM (2 days ago) Oct 29
to tla...@googlegroups.com
Depends on whether someone who works at Amazon decides to write a TLA+ spec reproducing the bug or not; I don't think anybody outside the company has enough detail to do it. Amazon seems to have largely shifted from TLA+ to P as championed by Ankush Desai, but Ankush recently left Amazon so who knows what the future of Amazon formal methods use looks like. The last TLA+ conference was associated with ETAPS, and one of the industry track ETAPS talks was from an AWS employee where they talked about using a new Rust-based model-checker called Kani.

Andrew

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tlaplus/ac3f4438-8cf7-41e8-9c19-2f42eb01d9fdn%40googlegroups.com.

Felipe Oliveira Carvalho

unread,
Oct 29, 2025, 2:31:53 PM (2 days ago) Oct 29
to tla...@googlegroups.com
"TLA+ is used in DynamoDB" doesn't mean everything is checked with TLA+. And the race condition was in the DNS service, so someone would have to think about the consequences of race conditions in a low-concurrency database of DNS records. You wouldn't even need TLA+ for that. For so long, checking and then updating DNS records non-atomically has been OK that no one thought it would be an issue.

--
Felipe

Markus Kuppe

unread,
Oct 29, 2025, 8:58:53 PM (2 days ago) Oct 29
to tla...@googlegroups.com
Amazon’s 2022 paper, “Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service” [1], provides some insight into which parts of DynamoDB have been specified in TLA+.

In Section 5.4, the authors note:

“The core replication protocol was specified using TLA+ [12, 13]. When new features that affect the replication protocol are added, they are incorporated into the specification and model checked. Model checking has allowed us to catch subtle bugs that could have led to durability and correctness issues before the code went into production.”

Markus

https://www.amazon.science/publications/amazon-dynamodb-a-scalable-predictably-performant-and-fully-managed-nosql-database-service
> To view this discussion visit https://groups.google.com/d/msgid/tlaplus/CAOC8YXarDkG25uPaiBeet%2BK9v_xOFs_AVnSt2W7D%2BH3D80iZdA%40mail.gmail.com.

Reply all
Reply to author
Forward
0 new messages