Hi all,
I have a few questions about a problem that has puzzled me for a long time.
Firstly, I wanted to increase the network injection rate, so I modified the trace generator so that every trace contains ld/st instructions (t_info->has_st is 1 or t_info->num_ld > 0), and I also modified trace_read_cpu.cc. The modifications are as follows:
I commented out the following code in the function inst_info_s* cpu_decoder_c::convert_pinuop_to_t_uop(void *trace_info, trace_uop_s **trace_uop, int core_id, int sim_thread_id):
(1) the code block which begins at line 444: //Add one more uop when temporary register is required.
(2) the code block which begins at line 490: // Instruction has a branch operation
(3) the code block which begins at line 512: //Fence instruction: m_opcode == MISC && actually_taken == 1
Could you confirm whether these modifications are reasonable?
Secondly, my ultimate goal is to set up an experiment in which an SMT-2/SMT-4 CPU frequently injects packets into the network and, within the same cycle, different hardware threads inject packets with different destination addresses (that is, the packets go to different L3 caches).
I monitored the network injection interface of the CPUs and found that after a CPU injects some packets into the network, it waits many cycles before injecting more. What is the reason for this? Is it msched_large_rate, rob_large_size, or something else? I have already changed the values of msched_large_rate and rob_large_size, but it made no difference. What can I do to solve this?
Because I want the packets to go to different L3 caches, I have collected traces with different ld_vaddr and st_vaddr values. I guess I need to understand the address translation mechanism of MacSim, or is there something else I should look at?
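To make the question concrete, the sketch below shows one common way an address could be mapped to an L3 bank (cache-line interleaving). This is only an assumption for illustration, not MacSim's actual mapping: the names l3_bank_of, kLineSizeBytes and kNumL3Banks and their values are hypothetical, and it assumes the physical address equals the virtual address.

```cpp
#include <cstdint>

// Hypothetical parameters for illustration only; not taken from MacSim.
constexpr std::uint64_t kLineSizeBytes = 64;  // assumed cache line size
constexpr std::uint64_t kNumL3Banks = 4;      // assumed number of L3 caches

// Assumed line-interleaved mapping: consecutive cache lines are
// assigned to consecutive L3 banks, wrapping around at kNumL3Banks.
// No virtual-to-physical translation is modeled here.
constexpr std::uint64_t l3_bank_of(std::uint64_t addr) {
  return (addr / kLineSizeBytes) % kNumL3Banks;
}
```

Under this assumption, addresses 0x1000, 0x1040, 0x1080 and 0x10C0 would map to banks 0, 1, 2 and 3, so ld_vaddr/st_vaddr values that differ by one cache line would target different L3 caches. If MacSim translates or hashes addresses differently, the traces would need to be chosen accordingly.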
My time is limited, so I would greatly appreciate your help and advice. Thank you for your time, and please forgive my broken English.
Best regards,
Applee