Re: [gem5-gpu-dev] virtual networks in gem5-gpu


Jason Lowe-Power

Oct 25, 2016, 6:36:39 PM
to lalh...@uci.edu, gem5-gpu Developers List
Hi Lulwah,

The virtual networks are very brittle in Ruby. It often takes me between hours and days to get them right whenever I make a new protocol or make changes to a protocol. 

Even though the default is 10 virtual networks, a realistic implementation wouldn't use that many. IIRC, VI_hammer just needs 3 (request, response, unblock). However, because of the way SLICC is written, it's much easier to create unique virtual channels on each side of a cache controller (e.g., 3 incoming and 3 more outgoing) even if it wouldn't be implemented this way. Also, we need to have an extra channel for the "trigger queue", which would not be a virtual channel in the routers.
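
To make the counting concrete, something like the sketch below (the actual virtual_network indices live in the VI_hammer *.sm machine declarations; the names and numbers here are made up for illustration):

# Made-up illustration: VI_hammer only needs three logical message
# classes, but the SLICC files give each direction of a controller its
# own virtual_network index, and the trigger queue stays controller-local.
logical_classes = ["request", "response", "unblock"]

slicc_vnets = {
    # outgoing side of a controller (indices are hypothetical)
    "requestToNetwork":  0,
    "responseToNetwork": 1,
    "unblockToNetwork":  2,
    # incoming side of the same controller
    "requestToCache":    3,
    "responseToCache":   4,
    "unblockToCache":    5,
}
# triggerQueue: local to the controller, no virtual_network index needed.

print(len(logical_classes), "message classes ->",
      len(slicc_vnets), "virtual_network indices in the SLICC files")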

Unless I'm remembering my NoC lectures wrong, I don't think it's possible to have a single virtual channel and not have deadlocks, so even if you were able to get the system configured correctly, I don't think it would be able to run.

Jason 

On Mon, Oct 24, 2016 at 1:56 PM <lalh...@uci.edu> wrote:
Hello,

I have noticed that the default number of virtual networks in gem5 is 10 and in gem5-gpu you are using 8 virtual networks.

I want to use one virtual network, so I changed the *.sm files, but I am getting an error and the simulation aborts:
gem5.opt: build/X86_VI_hammer_GPU/mem/ruby/network/MessageBuffer.cc:213: void MessageBuffer::enqueue(MsgPtr, Cycles): Assertion `m_consumer != __null' failed.
Program aborted at cycle 500

I am not sure if I have changed all the *.sm files required or if I am missing some file.
These are the files that I changed:
gem5-gpu/gem5-gpu/src/mem/protocol/
    VI_hammer_GPUL2cache.sm
    VI_hammer_GPUL1cache.sm
    VI_hammer_dir.sm
    VI_hammer_CPUcache.sm
    VI-ce.sm
gem5-gpu/gem5/src/mem/protocol/MOESI_hammer-dma.sm

Do I need to change all the MOESI_hammer*.sm files, or do the CPU and the directory use the protocols defined in VI_hammer_CPUcache.sm and VI_hammer_dir.sm?

Regards,
Lulwah


Lulwah A M J Alhubail

Oct 25, 2016, 7:17:53 PM
to Jason Lowe-Power, gem5-gpu Developers List
Hello Jason,
Thanks for the reply and explanations. 

So in your case, with the simple network, you are saying that the 8 virtual networks are converted into 3 networks depending on their type; hence each controller will have 3 virtual channels, one per virtual network.

The reason that I want one virtual network is that I want to use the Garnet network. When I checked Garnet within gem5, I noticed that it uses virtual networks (10 by default) and a number of virtual channels per virtual network (4 by default), so each router port will have: number of virtual channels = number of virtual networks * virtual channels per virtual network (40 by default). This is huge! Even the minimum number of virtual channels in this case would be 8, by setting the virtual channels per virtual network to 1, since the VI_hammer SLICC files define 8 virtual networks, and that is still a lot. Virtual channels mean buffers, which cost power and area. Hence, I wanted to have one virtual network and vary the number of virtual channels per virtual network.
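
Just to make the arithmetic concrete (the parameter names below are only illustrative, not the exact gem5 option names):

# Back-of-the-envelope buffering per router port in Garnet.
def vcs_per_port(num_vnets, vcs_per_vnet):
    return num_vnets * vcs_per_vnet

print(vcs_per_port(10, 4))  # gem5 defaults: 40 VCs per port
print(vcs_per_port(8, 1))   # VI_hammer's 8 vnets, 1 VC each: still 8
print(vcs_per_port(1, 4))   # what I want: 1 vnet, vary the VCs per vnet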

I am still trying to figure out how it is actually using these virtual networks.

So far I have found that changing the number of virtual channels in the *.sm files of the controllers is not enough; a lot of other files need to be changed, which is time consuming and will most likely cause a lot of errors that are hard to debug.

So I decided to study the code more and see if keeping the virtual networks will give me the same effect I am trying to achieve.

Regards,
Lulwah

Jason Lowe-Power

Oct 26, 2016, 11:36:08 AM
to Lulwah A M J Alhubail, gem5-gpu Developers List
Hi Lulwah,

My overarching point is that you don't have to simulate exactly the same thing that you are trying to evaluate. For instance, the simulator may have 10 virtual channels, but as long as you know that only 3 are used, then you are really simulating a system with only 3. I can't tell you how to run your simulations, but in many cases, especially with Ruby, you need to simulate things that aren't exactly how they are in a "real" system.

I'm not very familiar with Garnet, so I'm not sure what the interplay is between virtual networks in Garnet and virtual channels in the cache controllers. It does seem reasonable that you could have a single network with multiple virtual channels. I would try to stay away from changing the virtual channels in the SLICC code, though, because it's so brittle.

Cheers,
Jason

Joel Hestness

Oct 26, 2016, 12:54:00 PM
to Jason Lowe-Power, Lulwah A M J Alhubail, gem5-gpu Developers List
Hi guys,
  It looks like Lulwah's original error is due to message buffers not getting connected in the Python configuration scripts. In a SLICC cache controller file, if you add a message buffer as a virtual network input or output in the machine declaration (e.g. see first 25 lines of src/mem/protocol/VI_hammer-GPUL2cache.sm), you also need to adjust the configuration scripts to connect that message buffer to the interconnect (e.g. see roughly lines 188-205 of configs/gpu_protocol/VI_hammer_fusion.py). If you don't make this connection, Ruby and Garnet interconnects will try to access the message buffers without having the pointers to them (thus, the abort).
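
The wiring in the config script looks roughly like the sketch below, assuming a gem5 tree where MessageBuffers are SimObjects connected to the network through ports; the buffer and controller names are just illustrative, and the exact syntax depends on which gem5 snapshot gem5-gpu is based on, so check VI_hammer_fusion.py for the real ones:

# Rough sketch of connecting a SLICC controller's message buffers to the
# Ruby network in the Python config. Assumes the usual
#   from m5.objects import MessageBuffer
# and that l2_cntrl / ruby_system already exist; names are hypothetical.
from m5.objects import MessageBuffer

l2_cntrl.requestFromL2 = MessageBuffer()
l2_cntrl.requestFromL2.master = ruby_system.network.slave   # "To network" buffer

l2_cntrl.responseToL2 = MessageBuffer()
l2_cntrl.responseToL2.slave = ruby_system.network.master    # "From network" buffer

# Controller-internal queues (e.g. trigger/mandatory queues) are still
# created here but are NOT connected to the network.
l2_cntrl.triggerQueue = MessageBuffer()

Any network-facing buffer that is declared in the .sm file but never connected like this ends up without a consumer, which is exactly the assert Lulwah is seeing on enqueue.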

  Hope this helps!
  Joel


--
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/

Lulwah A M J Alhubail

Oct 26, 2016, 1:22:02 PM
to Joel Hestness, Jason Lowe-Power, gem5-gpu Developers List
Hello Joel,
I think you are right; there was something wrong with the connection of the message buffers. When I checked, I found a problem with how the consumer of the buffers is set up.

But as I said, there is so much more that needs to be changed, so for now I'll see if I can adapt the current virtual network setup to simulate my intended system; if not, I will spend the time making the changes.

Thanks,
Lulwah