Hi all,
I have been experimenting with SLATE's PGEQRF routine for a while and see no change in performance or cost when I modify the Slate::Option::Lookahead argument. The statistics I generate suggest that the underlying schedule does not change at all.
For example, on 16 nodes of Stampede2 with 64 processes per node (1024 MPI processes in total), SLATE runs in 5.24 seconds regardless of the value I pass for Slate::Option::Lookahead. I have tried 0, 1, 2, 4, 8, and 16.
Do you know whether the Lookahead feature is disabled (perhaps only on non-GPU machines), or whether there is some other explanation?
Thanks!
Edward Hutter