Current status of TLC multi-core scaling

46 views
Skip to first unread message

Ognjen Maric

unread,
Oct 15, 2021, 11:37:23 AM10/15/21
to tlaplus
I'm using TLC to analyze a spec on a machine with 64 cores, and I'm seeing significantly diminishing returns when adding new cores (a few percent speed-up going from 10 to 60 workers), and I'd like to know what the state of the art options for scaling TLC (obviously, I will profile my model).

Searching the mailing list archives, I found a discussion from a couple of years ago [1] where it was suggested to use OffHeapDiskFPSet, but the state queue implementation was still mentioned as a bottleneck. The 1.6.0 release notes also mentioned ByteArrayQueue, switching to which yielded a 30% speedup with many cores. Is using these two the best I can do today? Is there anything else on the horizon that might improve scaling.

Also, have you considered documenting the options to switch the fingerprint/queue implementations in current-tools.pdf? They were not easy to find otherwise. I'm happy to send a PR...

Best,
Ognjen

[1]: https://groups.google.com/g/tlaplus/c/D96Cop_EcN0/m/fh7pNcswBQAJ from a

Markus Kuppe

unread,
Oct 18, 2021, 4:26:28 PM10/18/21
to tla...@googlegroups.com
Hi Ognjen,

it is great that ByteArrayQueue combined with OffHeapDiskFPSet improves
scaling for you. This combination is the best you can do as of today,
and it would be greatly appreciated if you could create a PR that
updates current-tools.md [1].

As to future work on scaling TLC, a design of a scalable state queue
exist, but I halted working on it in favor of other features. If a
volunteer would like to pick up where I left off, I'm more than happy to
advise.

Thanks,
Markus

[1]
https://github.com/tlaplus/tlaplus/blob/master/general/docs/current-tools.md
Reply all
Reply to author
Forward
0 new messages