I suspect solutions will exhibit the following inherent features/symptoms:
- For comparisons using multiple runs on a Constant Input Group Cig_0 (where Cig_0 reuses the same arguments for every sample execution):
Different implementations on the same arguments
1. A timing delta (that is, a difference in run time attributed to algorithmic improvement) will come with solutions that are inconsistent when measured against the baseline solution. In other words, many varieties of solution will be the general case.
Same implementation on different arguments
2. For comparisons across Cig_0, Cig_1, through Cig_n, timing deltas within a single algorithmic implementation will be susceptible to branch misprediction penalties that have a combinatorial character, depending on the earliest sequential layer depth at which the failure occurred. This property is caused by the architecture of the algorithm. Moreover, the problem is not only an accumulation that can result in an incorrect lookup: given the order of failure positions in a series (or in an individual threading chunk), all work (composition, synchronization, etc.) must be rescheduled and computed anew. I don't know what error detection or error correction scheme is implemented in T-Mac.
Algorithmic approach
3. More generally, for this use case, lookup tables lead to other degeneracies that cannot be cured.
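
To make items 1 and 2 checkable rather than a matter of suspicion, a minimal Python sketch of such a comparison might look like the following. It times two implementations over several runs on each fixed input group and records whether their outputs agree with the baseline. Everything named here (time_runs, compare, dot_baseline, dot_lookup, and the toy input groups) is a hypothetical stand-in for illustration. None of it is T-Mac code, and the toy lookup table only mimics the general idea of replacing multiplications with table reads.

```python
import statistics
import time


def time_runs(func, args, runs=5):
    """Call func(*args) `runs` times; return the durations and the last result."""
    durations = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = func(*args)
        durations.append(time.perf_counter() - start)
    return durations, result


def compare(baseline, candidate, input_groups, runs=5):
    """Time both implementations on each fixed input group and check agreement."""
    report = []
    for name, args in input_groups.items():
        base_t, base_out = time_runs(baseline, args, runs)
        cand_t, cand_out = time_runs(candidate, args, runs)
        report.append({
            "group": name,
            "baseline_median": statistics.median(base_t),
            "candidate_median": statistics.median(cand_t),
            # Spread across runs: a delta smaller than this is likely noise.
            "baseline_spread": max(base_t) - min(base_t),
            "candidate_spread": max(cand_t) - min(cand_t),
            "outputs_match": base_out == cand_out,
        })
    return report


# Hypothetical stand-ins for two implementations of the same computation.
def dot_baseline(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))


def dot_lookup(xs, ws):
    # Toy lookup table: precompute every product that can occur, then only read.
    table = {(x, w): x * w for x in set(xs) for w in set(ws)}
    return sum(table[(x, w)] for x, w in zip(xs, ws))


# Each group reuses the same arguments for every run (assumed toy data).
input_groups = {
    "Cig_0": ([1, 2, 3, 1], [1, -1, 0, 1]),
    "Cig_1": ([3, 3, 3, 3], [-1, -1, 1, 1]),
}

report = compare(dot_baseline, dot_lookup, input_groups)
for row in report:
    print(row)
```

Comparing the medians within one group addresses item 1 (different implementations, same arguments), and comparing one implementation's rows across groups addresses item 2 (same implementation, different arguments). The spread columns are there so that a timing delta is only attributed to the algorithm when it is clearly larger than the run-to-run variation.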
There are better algorithmic tools to treat the principal question. I don't have experience running this algorithm, so maybe you can share where my thinking can be improved.
Akinbo