Hello Mahmoud,
Thank you very much for your detailed reply, but let me clarify my concern once more. Perhaps the GTO policies we each have in mind are slightly different. Thanks to your well-written in-code comment, I now see the difference between the warp ID and dynamic_warp_id.
I think the problem is that dynamic_warp_id only changes when a new warp is launched to the scheduler. Let me restate how the current code operates with an example.
@ cycle 0 : 32 warps are launched to a scheduler and get dynamic_warp_id 0 ~ 31.
@ cycle 1~20 : warp0 issues instructions and then stalls.
@ cycle 21 : the scheduler checks the ready state starting from warp0 (** notice it does not start from warp1 **), and warp1 issues instructions.
@ cycle 25 : warp0 becomes ready again. warp1 is still issuing instructions.
@ cycle 30 : warp1 stalls, so warp0 starts to issue instructions again (** warp2 ~ 31 did not get a chance at all **)
As a result, I observe that warps with a large dynamic_warp_id tend not to make progress at all until the warps with a small dynamic_warp_id finish, even though all warps were launched in the same cycle.
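To make the timeline above concrete, here is a toy Python model of the behavior I am describing. It is only a sketch and not GPGPU-sim code: the function name simulate_gto and the parameters burst (instructions issued before a stall) and stall (cycles until the warp is ready again) are made up for illustration, and the values are simplified rather than taken from a real trace. The scheduler stays on the last warp while it is ready and otherwise falls back to the ready warp with the smallest dynamic_warp_id.

```python
def simulate_gto(num_warps=32, burst=10, stall=5, cycles=200):
    """Toy greedy-then-oldest model; returns instructions issued per warp.

    Hypothetical simplification: every warp issues `burst` instructions
    (one per cycle) and then stalls for `stall` cycles. The list index
    plays the role of dynamic_warp_id (all warps launched at cycle 0).
    """
    ready_at = [0] * num_warps          # first cycle each warp is ready again
    left_in_burst = [burst] * num_warps
    issued = [0] * num_warps
    last = None                         # warp that issued in the previous cycle

    for cycle in range(cycles):
        ready = [w for w in range(num_warps) if ready_at[w] <= cycle]
        if not ready:
            last = None
            continue
        # Greedy: keep the last warp while it is ready.
        # Oldest: otherwise take the ready warp with the smallest id.
        w = last if last in ready else min(ready)
        issued[w] += 1
        left_in_burst[w] -= 1
        if left_in_burst[w] == 0:       # burst finished, warp stalls
            ready_at[w] = cycle + 1 + stall
            left_in_burst[w] = burst
        last = w
    return issued


if __name__ == "__main__":
    counts = simulate_gto()
    print("warp0:", counts[0], "warp1:", counts[1],
          "warp2~31 total:", sum(counts[2:]))
```

In this model, whenever the stall is shorter than the other warp's burst, warp0 and warp1 alternate and warp2 ~ 31 never issue, which matches what I see in the timeline.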
Additionally, even if warp32 is launched at cycle 31 (let's assume warp0 has finished), it seems unfair that warp32 has to wait until warp1 ~ 31 finish. If warp1 ~ 31 each issue an instruction and stall during cycles 31 ~ 62, shouldn't it then be warp32's turn to issue an instruction? However, in the current code, if warp1 has become ready again at cycle 63, it becomes warp1's turn. Please let me know if I have misunderstood the current GPGPU-sim code or GTO scheduling.
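For comparison, here is a rough Python sketch of one possible reading of the behavior I expected. Again, this is a toy model and not GPGPU-sim code: simulate_lru_fallback, burst, and stall are hypothetical names, and the fallback rule (pick the ready warp that issued least recently, with never-issued warps first and ties broken by the smaller id) is my own assumption about what "oldest" could mean, not something taken from the simulator.

```python
def simulate_lru_fallback(num_warps=32, burst=10, stall=5, cycles=320):
    """Toy greedy scheduler whose fallback is 'least recently issued'.

    Hypothetical simplification: every warp issues `burst` instructions
    (one per cycle) and then stalls for `stall` cycles. The list index
    plays the role of dynamic_warp_id.
    """
    ready_at = [0] * num_warps          # first cycle each warp is ready again
    left_in_burst = [burst] * num_warps
    last_issue = [-1] * num_warps       # -1 means the warp has never issued
    issued = [0] * num_warps
    last = None                         # warp that issued in the previous cycle

    for cycle in range(cycles):
        ready = [w for w in range(num_warps) if ready_at[w] <= cycle]
        if not ready:
            last = None
            continue
        if last in ready:
            w = last                    # greedy: stay on the same warp
        else:
            # Fallback: the warp that has waited longest since it last issued.
            w = min(ready, key=lambda x: (last_issue[x], x))
        issued[w] += 1
        last_issue[w] = cycle
        left_in_burst[w] -= 1
        if left_in_burst[w] == 0:       # burst finished, warp stalls
            ready_at[w] = cycle + 1 + stall
            left_in_burst[w] = burst
        last = w
    return issued


if __name__ == "__main__":
    print(simulate_lru_fallback())
```

With this fallback rule, a warp that has just stalled does not regain priority merely because its dynamic_warp_id is small, so every warp gets one burst before any warp gets a second one.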
Since you mentioned that the scheduling policy is still under development, I will let you know as soon as I find any useful results from validation against real HW.
Thanks a lot
Jounghoo Lee