I was able to reproduce it.
I've also added, just in case, a smaller transition from Prop1 to Prop2.
Also, I added many parentheses - it didn't help. Prop1, Prop1a, and Prop2 succeed, while Prop3 fails.
What's weird is that the coverage statistics are also different, depending on the liveness property checked.
Really weird. Worth opening an issue here:
https://github.com/tlaplus/tlaplusHere are my attempts with various variations, but basically I got the same results as you did:
----------------------------- MODULE testing -----------------------------
VARIABLE s
vars == << s >>
Init == s = "a"
Next == \/ /\ s = "a"
/\ s' = "b"
\/ /\ s = "b"
/\ s' = "c1"
\/ /\ s = "b"
/\ s' = "c2"
\/ /\ s = "c2"
/\ s' = "c1"
\/ /\ s = "c1"
/\ s' = "c2"
\/ /\ s = "c1"
/\ s' = "a"
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)
\* Succeeds
Prop1 == (s = "b") ~> ((s = "a") \/ ([](s \in {"c1", "c2"} /\ <>(s = "c2"))))
\* Should be equivalent to Prop1. Succeeds.
Prop1a == []((s = "b") => (<>((s = "a") \/ [](s \in {"c1", "c2"} /\ <>(s = "c2")))))
\* Should be equivalent to Prop1a. Succeeds.
Prop2 == []((s # "b") \/ <>((s = "a") \/ [](s \in {"c1", "c2"} /\ <>(s = "c2"))))
\* Should be equivalent to Prop1 and Prop2. Fails.
Prop3 == []((s # "b") \/ ~([](~((s = "a") \/ [](s \in {"c1", "c2"} /\ <>(s = "c2"))))))
(*
Error trace for Prop3:
1: Initial predicate, s = "a"
2: Next in TestBug, s = "b"
3: Next in TestBug, s = "c2"
4: Next in TestBug, s = "c1"
4: Back to state (links to s = "c2")
*)
==========================================================================