Hi Rahul,
I would like to revisit this issue: all-reduce supports a tuple of operands (which lets us pass the token in), but reduce-scatter and all-gather do not, so token management is inconsistent across these CC ops. For example, in the torch-xla HLO below, we were able to pass the token as an operand of the all-reduce op:
%all-reduce.82 = (bf16[1,2048,8192]{2,1,0}, bf16[]) all-reduce(bf16[1,2048,8192]{2,1,0} %multiply.74, bf16[] %p7.66), replica_groups={{0,1,2,3,4,5,6,7},{8,9,10,11,12,13,14,15},{16,17,18,19,20,21,22,23},{24,25,26,27,28,29,30,31}}, to_apply=%AddComputation.78

By contrast, the all-gather op takes only the data operand, so there is no way to thread the token through:

%all-gather.135 = bf16[2048,1,8192]{2,1,0} all-gather(bf16[256,1,8192]{2,1,0} %add.130), channel_id=1, replica_groups={{0,1,2,3,4,5,6,7},{8,9,10,11,12,13,14,15},{16,17,18,19,20,21,22,23},{24,25,26,27,28,29,30,31}}, dimensions={0}
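
For reference, here is a minimal sketch of how I would compare the lowered HLO for the two ops. It assumes a working multi-device torch-xla install; also, _get_xla_tensors_hlo is an internal helper, so its availability may vary across versions:

    import torch
    import torch_xla
    import torch_xla.core.xla_model as xm

    # Build a bf16 tensor on the XLA device, mirroring the shapes above.
    device = xm.xla_device()
    t = torch.ones(256, 1, 8192, dtype=torch.bfloat16, device=device)

    # all-reduce: lowers to a tuple-shaped HLO op that can carry the token.
    reduced = xm.all_reduce(xm.REDUCE_SUM, t)

    # all-gather: lowers with only the data operand; no token is threaded through.
    gathered = xm.all_gather(t, dim=0)

    # Dump the HLO for inspection (internal API, may change between releases).
    print(torch_xla._XLAC._get_xla_tensors_hlo([reduced, gathered]))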
Could you please reopen this case?
Jeff