IceNIC: rate limiter and packet arbitration


Varun Gandhi

Sep 14, 2020, 3:11:39 PM
to chip...@googlegroups.com
Hi,

I’m trying to emulate virtual NICs on top of IceNIC. 

1. Can a config have two separate NICs (similar to having 2 BOOM/Rocket cores)?
2. Is there a way to perform some kind of bus arbitration to allocate different bandwidths to the NICs?

I’m trying to understand the rate limiter in IceNIC with the help of the unit test in limiter.scala as reference —
limiter.io.settings.inc := 1.U
limiter.io.settings.period := 3.U
limiter.io.settings.size := 2.U
This implies that, with respect to the bandwidth equation (Y = X * N/D), Y = X * 0.5.

3. What exactly is the role of the size field? How does it relate to beats (beatBytes has been set to 8; assuming the max value is 256?)

Also, NicController (NIC.scala) defines a limiter object but doesn’t initialize the inc, period, and size fields:

val limiter = Module(new RateLimiter(new StreamChannel(NET_IF_WIDTH)))
limiter.io.in <> unlimitedOut
limiter.io.settings := io.rlimit
io.out <> limiter.io.out
Subsequently, in the Loopback object:
netio.rlimit.inc := PlusArg("rlimit-inc", 1)
netio.rlimit.period := PlusArg("rlimit-period", 1)
netio.rlimit.size := PlusArg("rlimit-size", 8)

4. Why exactly have these values been chosen?
5. I couldn’t find anything in TileLink documentation on bus arbitration…
6. Does packet arbitration affect the net bandwidth in any way?

Best,
Varun

Howard Mao

Sep 15, 2020, 5:01:37 PM
to Chipyard
1. This is theoretically possible, but you'll have to create your own top-level trait that instantiates multiple NICs.
2. Do you mean arbitration for the TileLink DMA ports or for the outgoing network interface? 
3. The rate limiter uses a token bucket algorithm. The token count is incremented by "inc" every period+1 cycles until it reaches a saturation point and decremented by 1 every time a flit is sent. The NIC can continue sending flits so long as the token count isn't zero. The "size" parameter just determines the saturation point of the counter. So the rate-limited bandwidth (in flits/cycle) would be inc/(period + 1). 
4. These values are somewhat arbitrary, but they give the loopback NIC a bandwidth of inc/(period + 1) = 1/(1 + 1) = 0.5 flits/cycle.
5. Bus arbitration in TileLink is performed by the TLXbar class. You can change the arbitration policy using the "policy" argument, which defaults to round robin. The only other builtin policies are lowest-index-first and highest-index-first, but you should be able to create your own. Take a look at the definition of the builtin policies in https://github.com/chipsalliance/rocket-chip/blob/master/src/main/scala/tilelink/Arbiter.scala#L12
6. If only one of the interfaces being arbitrated is sending, then packet arbitration shouldn't affect the bandwidth.
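
The token bucket described in answer 3 can be sketched as a small behavioral model. This is plain Scala rather than Chisel, and it is not the IceNIC RateLimiter source: the object and method names are hypothetical, and it assumes the bucket starts empty and that the refill happens before the send check within a cycle.

```scala
// Behavioral model of the token bucket from the answer above (plain Scala, not Chisel).
// Assumptions (not from the IceNIC source): the bucket starts empty, and a refill
// in a given cycle is visible to the send check in that same cycle.
object TokenBucketModel {
  /** Returns, for each cycle, whether a flit was sent in that cycle. */
  def simulate(inc: Int, period: Int, size: Int, cycles: Int,
               wantsToSend: Int => Boolean = _ => true): Vector[Boolean] = {
    var tokens = 0   // current token count, saturates at `size`
    var counter = 0  // cycles elapsed since the last refill
    (0 until cycles).map { cycle =>
      if (counter == period) {
        // Refill once every period + 1 cycles, capped at the saturation point.
        tokens = math.min(tokens + inc, size)
        counter = 0
      } else {
        counter += 1
      }
      // A flit goes out only if the sender has one and the bucket is not empty.
      val send = wantsToSend(cycle) && tokens > 0
      if (send) tokens -= 1
      send
    }.toVector
  }

  def main(args: Array[String]): Unit = {
    // Loopback defaults quoted in the thread: inc = 1, period = 1, size = 8.
    val sent = simulate(inc = 1, period = 1, size = 8, cycles = 1000)
    println(f"flits/cycle = ${sent.count(identity).toDouble / sent.length}%.3f")
  }
}
```

With a sender that always has data, the long-run rate in this model is inc/(period + 1), independent of size. The size parameter only matters after an idle stretch: the bucket fills up to size tokens, which lets the sender burst above the average rate until the bucket drains.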


Varun Gandhi

Sep 24, 2020, 1:14:31 PM
to chip...@googlegroups.com
Thanks Howard!

A few follow-up questions:

1. How does the MMIO node perform arbitration across 2 or more CPUs? Presumably FIRRTL will have to replicate the MMIO node (one for each core). Similarly, the int node will need to be replicated to issue interrupts based on the intmask state of every CPU.

I don’t see any arbitration logic in the IceNicController to address this aspect. Would appreciate some pointers!

Varun

Howard Mao

Sep 27, 2020, 12:29:23 PM
to Chipyard
The arbitration is handled at higher levels. The TileLink system bus arbitrates between the two CPUs for access to the device's MMIO, and the PLIC ensures that an interrupt is only delivered to one core. However, since the MMIO requests are only atomic within a single request, the software must use locks or some other kind of synchronization to ensure that two cores share access to the MMIO safely.
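
To illustrate why the bus-level arbitration does not remove the need for software locks, here is a minimal behavioral model of round-robin arbitration (the default TLXbar policy mentioned earlier in the thread) between requesters such as two cores. It is plain Scala, not the rocket-chip implementation, and the names are hypothetical.

```scala
// Behavioral model of round-robin arbitration (plain Scala, not rocket-chip code).
object RoundRobinModel {
  /** Picks the first requesting index after `last`, wrapping around; None if nobody requests. */
  def grant(requests: Seq[Boolean], last: Int): Option[Int] = {
    val n = requests.length
    (1 to n).map(i => (last + i) % n).find(i => requests(i))
  }

  /** Runs the arbiter over per-cycle request vectors and returns the winner of each cycle. */
  def run(trace: Seq[Seq[Boolean]]): Vector[Option[Int]] = {
    var last = -1 // no previous winner yet, so index 0 is checked first
    trace.toVector.map { reqs =>
      val winner = grant(reqs, last)
      winner.foreach(w => last = w) // priority rotates past the latest winner
      winner
    }
  }

  def main(args: Array[String]): Unit = {
    // Both cores request in every cycle: grants alternate between them.
    println(run(Seq.fill(4)(Seq(true, true))))
  }
}
```

When both requesters are active, the grants interleave one request at a time. A driver sequence that needs several MMIO accesses can therefore be interrupted by the other core between any two of them, which is why the locking has to happen in software.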

Varun Gandhi

Oct 4, 2020, 9:20:41 PM
to chip...@googlegroups.com
Thanks, Howard! I appreciate it.

A few more follow-up questions on the BusTopology in LoopbackNICRocketConfig:

1. The loopback config has coherentBusTopology, which builds on top of the HierarchicalBusTopology in BaseSubSystemConfig and adds a peripheral, control, and front bus for MMIO. But the config then also removes any MMIO ports by adding WithNoMMIOPort, and any slave ports by adding WithNoSlavePorts. What’s the rationale behind this?

Note: Similarly, it also removes any external interrupts with WithNExtTopInterrupts(0). Given that the NIC can trigger two interrupts, how does this work?

2. From what I’ve understood so far, the arbitration in TLXbar is effectively done by tracking the ReadyValidCancel interface of all the masters (sources), with the arbiter keeping track of earlyValids and validQuals.
Again, what’s the rationale behind picking earlyWinner and winnerQual separately, instead of just having a simple ReadyValid interface? Is it a performance optimization, or is it critical to the core logic of the round-robin policy?

3. In terms of figuring out the inter-bus connects — in BusTopology.scala the three optional buses are connected in the following way:

(SBUS, CBUS, TLBusWrapperConnection.crossTo(xTypes.sbusToCbusXType)),
(CBUS, PBUS, TLBusWrapperConnection.crossTo(xTypes.cbusToPbusXType)),
(FBUS, SBUS, TLBusWrapperConnection.crossFrom(xTypes.fbusToSbusXType)))

3.1 What exactly is the role of the control bus as an intermediary between the system bus and the peripheral bus? Also, what is the role of the fbus in MMIO, and in the above config, is it the master in the fbus-to-sbus connection?

3.2 I don’t precisely understand the role of xType. The code suggests that it inserts a TL clock-crossing adapter between the buses, but if we look at the definitions, it seems like only the buffer depth varies between NoCrossing and SynchronousCrossing. What’s the role of the buffer here?

sbusToCbusXType: ClockCrossingType = NoCrossing,            // buffer depth = 0
cbusToPbusXType: ClockCrossingType = SynchronousCrossing(), // buffer depth = 2
fbusToSbusXType: ClockCrossingType = SynchronousCrossing()  // buffer depth = 2

diplomacy/parameters.scala

case object NoCrossing // converts to SynchronousCrossing(BufferParams.none) via implicit def in package

case class SynchronousCrossing(params: BufferParams = BufferParams.default) extends ClockCrossingType

val default = BufferParams(2)
val none    = BufferParams(0)

4. Where exactly does the NIC connect to the peripheral bus and interrupt bus? In NIC.scala we can see:

   control.node := TLAtomicAutomata() := mmionode

But how does this control.node further connect to the PBUS?


Best,
Varun
   

Howard Mao

Oct 9, 2020, 6:48:18 PM
to Chipyard
Hi Varun,

I'll try to answer your questions as well as I can, but there are a few that are outside my expertise.

1. NoMMIOPort, NoSlavePorts, and NExtTopInterrupts refer to MMIO ports, slave ports, and interrupts external to the SoC. By default, there could be AXI interfaces or interrupt lines exposed as top-level ports of the Chisel design, which allows them to be connected to third-party Verilog IP. Disabling these external ports does not affect the internal TileLink interfaces and interrupts from the NIC and other peripherals.
2. I unfortunately don't know much about the internals of the TLXbar, but I assume these are done for ASIC performance reasons.
3.1. The cbus connects directly to the core peripherals (CLINT, PLIC, BootROM, and Debug Module). The pbus connects to lower priority peripherals. The fbus is for DMA peripherals to connect to in order to read and write from the coherent memory system.
3.2. I believe the buffer in the ClockCrossing type refers to the depth of a clock-crossing buffer. I'm not certain about this, however.
4. The connections to the pbus and interrupt bus are in CanHavePeripheryIceNIC. https://github.com/firesim/icenet/blob/master/src/main/scala/NIC.scala#L483
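
As a rough aid for the bus roles in 3.1, the three inter-bus connections quoted in the question can be modeled as a small directed graph. This is a sketch in plain Scala, not rocket-chip code: the names are hypothetical, and it assumes each tuple reads as (master bus, slave bus), which matches the description of the fbus as the entry point for DMA masters.

```scala
// Toy model of the inter-bus connections quoted in the question (not rocket-chip code).
// Assumption: each connection tuple reads as (master bus, slave bus).
object BusTopologyModel {
  val edges: Map[String, Seq[String]] = Map(
    "SBUS" -> Seq("CBUS"), // system bus drives the control bus
    "CBUS" -> Seq("PBUS"), // control bus drives the peripheral bus
    "FBUS" -> Seq("SBUS")  // front bus feeds into the system bus
  )

  /** First path from `from` to `to` along master -> slave edges, if one exists.
    * The graph above is acyclic, so the recursion terminates. */
  def path(from: String, to: String): Option[List[String]] =
    if (from == to) Some(List(to))
    else edges.getOrElse(from, Nil)
      .flatMap(next => path(next, to))
      .headOption
      .map(from :: _)

  def main(args: Array[String]): Unit = {
    println(path("SBUS", "PBUS")) // request from a master on the system bus to a pbus device
    println(path("PBUS", "SBUS")) // no path in the reverse direction
  }
}
```

In this model, a request issued on the system bus reaches a pbus peripheral only by passing through the cbus, while a master on the fbus enters the system bus and can go on from there. Nothing flows from the pbus back up to the sbus.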

Hope this helps.

-- Howie

Varun Gandhi

Oct 22, 2020, 4:56:48 PM
to chip...@googlegroups.com
Thanks Howard, that was really helpful!

A few more follow-ups as I try to figure out IceNIC

1. How does the offset arithmetic work for regmap? e.g., 

0x00 -> Seq(RegField.w(NET_IF_WIDTH, sendReqQueue.io.enq)),
0x08 -> Seq(RegField.w(NET_IF_WIDTH, recvReqQueue.io.enq)),
0x10 -> Seq(RegField.r(1, sendCompRead)),
Given that NET_IF_WIDTH is 64 bits (8 bytes), shouldn’t the offset address for sendCompRead start at 0xF? Or are the offsets completely independent of the width of the registers?

2. Given that the external interrupts in the default IceNIC configs have been set to 0, will the PLIC still be involved in the interrupts sent by the NIC?

3. In a multi-core setting, does the PBUS carry forward a core_id to help the NIC identify which core has issued an IO request? If not, is there a way to do that? I don’t see anything in PeripheryBusParams, SystemBusParams or TLBusParams class to access this kind of meta-data.

4. Similarly, in the case of interrupts sent by IceNIC, i.e.,

interrupts(0) := sendCompValid && intMask(0)
interrupts(1) := recvCompQueue.io.deq.valid && intMask(1)
Does the intNode in IceNIC controller generate and forward a core_id to issue the interrupt to the correct core? If not, where exactly is this meta-data being generated and how is it being sent forward?
 

Best,
Varun


Howard Mao

Oct 25, 2020, 11:44:27 PM
to Chipyard
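1. The offsets are byte offsets. Each 64-bit register occupies 8 bytes (0x00-0x07 and 0x08-0x0F), so the next free byte address is 0x10, not 0xF.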
2. All interrupts except for core-local interrupts (software and timer interrupts) go through the PLIC. This includes both external interrupts and internal interrupts from devices.
3. No, TileLink does not carry a core ID. The information about which Tile sent a request is embedded in the source bits that are used for routing. 
4. No, there is no core ID associated with an interrupt. The interrupt line goes to the PLIC, which chooses which core gets the interrupt based on how the interrupts are assigned in its internal registers.
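
The byte-offset arithmetic from answer 1 can be checked with a short calculation. This is a sketch in plain Scala with hypothetical names. In the actual regmap the offsets are written out by hand, so the helper only shows where back-to-back registers of the quoted widths would land.

```scala
// Worked example of byte-offset arithmetic (plain Scala; helper names are hypothetical).
object RegMapOffsets {
  /** Byte offsets for registers laid out back to back, each rounded up to whole bytes. */
  def byteOffsets(widthsInBits: Seq[Int]): Seq[Int] =
    widthsInBits
      .map(w => (w + 7) / 8) // bits -> bytes, rounding up
      .scanLeft(0)(_ + _)    // running sum gives each register's start address
      .init                  // drop the final sum (the address after the last register)

  def main(args: Array[String]): Unit = {
    // Widths of the three fields quoted in the question: 64, 64, and 1 bits.
    val offsets = byteOffsets(Seq(64, 64, 1))
    println(offsets.map(o => f"0x$o%02X").mkString(", "))
  }
}
```

The second 64-bit register covers bytes 0x08 through 0x0F, so 0x0F is its last byte rather than the start of the next register. The third field begins at 0x10, which is decimal 16.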
