Question about late Instruction prefetch in SniperSim

Hanyun Tao

Nov 1, 2016, 3:58:53 AM

to Sniper simulator

Hi, I'm currently doing research on instruction prefetching, and I'm using Sniper 5.5.

I have read through the source code in cache_cntlr.cc to understand how Sniper models instruction prefetching.

As far as I can tell, an instruction prefetch is issued by calling the doprefetch() function with the instruction address and the current timestamp. What I don't understand is how Sniper models the impact of a late instruction prefetch.

For example, if I call doprefetch() to prefetch the current address at the same timestamp, right before accessing the cache, the cache access is treated as a cache hit. However, the prefetch was not issued early enough, so there should be a late-prefetch penalty for this I-cache access.

My question is: how does Sniper model the late-prefetch penalty for the instruction cache?

Sincerely,

Wim Heirman

Nov 1, 2016, 5:19:23 AM

to snip...@googlegroups.com

Hi Hanyun Tao,

When a prefetch completes, its completion time is entered into m_master->mshr by CacheCntlr::updateCounters. So when an overlapping core access is made, it sees a cache hit, as you say, but its completion time is delayed until the time the prefetch completed. It should therefore see a higher access latency than a regular L1 hit.
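
To make the mechanism concrete, here is a minimal sketch of the idea. It is not Sniper's actual code: ToyICache, fill_done, prefetch() and accessLatency() are made-up names, times are plain cycle counts, and any address without a recorded fill is assumed to be resident in the cache.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>

// Simplified stand-in for a cache controller. NOT Sniper's implementation;
// every name here is hypothetical and times are in cycles.
struct ToyICache {
    uint64_t hit_latency;                              // latency of a regular L1-I hit
    std::unordered_map<uint64_t, uint64_t> fill_done;  // line address -> time its fill completes

    // Issue a prefetch at time `now` whose fill takes `fill_latency` cycles.
    void prefetch(uint64_t addr, uint64_t now, uint64_t fill_latency) {
        fill_done[addr] = now + fill_latency;
    }

    // Latency seen by a demand access at time `now`. The access counts as a
    // hit, but it cannot complete before a pending fill for that line does.
    uint64_t accessLatency(uint64_t addr, uint64_t now) const {
        uint64_t done = now + hit_latency;
        auto it = fill_done.find(addr);
        if (it != fill_done.end())
            done = std::max(done, it->second);
        return done - now;
    }
};
```

With a 4-cycle hit latency and a prefetch issued at cycle 100 that needs 30 cycles, a demand access at cycle 100 sees 30 cycles of latency, one at cycle 120 sees 10 cycles, and one at cycle 200 sees the regular 4 cycles.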

Regards,

Wim

--
--
--
You received this message because you are subscribed to the Google
Groups "Sniper simulator" group.
To post to this group, send email to snip...@googlegroups.com
To unsubscribe from this group, send email to
snipersim+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/snipersim?hl=en

---
You received this message because you are subscribed to the Google Groups "Sniper simulator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snipersim+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hanyun Tao

Nov 2, 2016, 1:16:58 PM

to Sniper simulator

Hi Wim, thank you for the clarification.

So if I understand you correctly, a late prefetch results in an L1 I-cache hit with a higher latency.

I'm currently using the ROB performance model, and when I read the code in rob_timer.cc, I noticed that the I-cache latency is added to the timer only when an instruction is not an L1-I hit:

bool iCacheMiss = (uop.getICacheHitWhere() != HitWhere::L1I);

My concern is: if the mem_result of a late-prefetched instruction cache access is an L1 hit with a high latency, will the performance model still add that latency to its timer?

Sincerely,


Wim Heirman

Nov 3, 2016, 3:35:20 AM

to snip...@googlegroups.com

That looks like a bug. (We used to run mostly HPC codes, where I-cache misses were very rare, so we never really considered this case.) You'll probably want to change this to something that checks the I-cache access latency (in uop.getICacheLatency()) and treats anything longer than a hit as a miss.
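
As a rough sketch of the difference between the two checks (ToyUop, its fields and both helper functions are hypothetical stand-ins, and the HitWhere enum below is a reduced toy version, not Sniper's real type):

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; Sniper's real types differ.
enum class HitWhere { L1I, L2, DRAM };

struct ToyUop {
    HitWhere hit_where;       // where the instruction fetch was satisfied
    uint64_t icache_latency;  // observed I-cache access latency, in cycles
};

// Location-based check: a late prefetch that reports an L1-I hit is not
// counted as a miss, however long it took.
bool isMissByLocation(const ToyUop& uop) {
    return uop.hit_where != HitWhere::L1I;
}

// Latency-based check: anything slower than a plain L1-I hit is treated as a
// miss, which also catches hits that had to wait for a pending prefetch.
bool isMissByLatency(const ToyUop& uop, uint64_t l1i_hit_latency) {
    return uop.icache_latency > l1i_hit_latency;
}
```

For a uop that reports an L1-I hit with a 30-cycle latency, isMissByLocation() returns false while isMissByLatency(uop, 4) returns true, so only the second check would charge the late-prefetch delay.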

-Wim


Hanyun Tao

Nov 3, 2016, 6:39:56 AM

to Sniper simulator

Thank you, Wim!

Before starting on fixing it, could you please clarify one more thing for me?

When I looked into the timestamps of different I-cache accesses, I found that multiple cache accesses may occur at the same timestamp, even when the first one is an I-cache miss.

After some investigation, I learned that this happens when the performance model needs to fill its pre-ROB buffer.

The code below in rob_timer.cc stalls the simulation until there are enough instructions in the pre-ROB buffer.

// If frontend not stalled
if (frontend_stalled_until <= now)
{
   if (rob.size() < m_num_in_rob + 2*dispatchWidth)
   {
      // We don't have enough instructions to dispatch <dispatchWidth> new ones. Ask for more before doing anything this cycle.
      return;
   }

While the simulation is stalled, the timer in the performance model stays unchanged, so multiple instruction accesses occur at the same time.

Do I understand the situation correctly? If this is what is supposed to happen, how should I handle these concurrent I-cache accesses when the first access is an I-cache miss? In that case the next access needs to wait until the first one returns, so the two should not happen at the same time.

Best regards!

Wim Heirman

Nov 3, 2016, 9:52:57 AM

to snip...@googlegroups.com

The I-cache is accessed through core->readInstructionMemory, which is currently done from MicroOpPerformanceModel::handleInstruction. This happens quite a bit before the code in dispatch, and as you say can queue up several instructions, some of them I-cache misses, without allowing the actual timing model (RobTimer) to advance time.

I think it should be safe to move the core->readInstructionMemory call into RobTimer::doDispatch (say around line 460, just before the iCacheMiss test is made). That would make the I-cache access use the timestamp at which the instruction actually entered the ROB, so it is delayed by misses (and pending hits) that happened just before.
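
A toy illustration of why the call site matters for the timestamps (this is a sketch, not Sniper code: both function names are invented, latencies are given up front as plain cycle counts, and dispatch width is ignored so that plain hits do not advance time at all):

```cpp
#include <cstdint>
#include <vector>

// Issue timestamps when all I-cache accesses are made before the timing model
// runs: the clock never advances, so every access sees the same time.
std::vector<uint64_t> issueTimesFetchAhead(const std::vector<uint64_t>& latencies,
                                           uint64_t now) {
    return std::vector<uint64_t>(latencies.size(), now);
}

// Issue timestamps when each access is made at dispatch: an access slower than
// a plain hit stalls the front end and pushes all later accesses back.
std::vector<uint64_t> issueTimesAtDispatch(const std::vector<uint64_t>& latencies,
                                           uint64_t now, uint64_t hit_latency) {
    std::vector<uint64_t> times;
    times.reserve(latencies.size());
    for (uint64_t lat : latencies) {
        times.push_back(now);   // this access is issued at the current time
        if (lat > hit_latency)
            now += lat;         // miss or pending hit: later accesses wait
    }
    return times;
}
```

For latencies {4, 30, 4} starting at cycle 100 with a 4-cycle hit latency, the fetch-ahead version issues all three accesses at cycle 100, while the dispatch-time version issues them at cycles 100, 100 and 130.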

-Wim


Hanyun Tao

Nov 10, 2016, 6:31:52 PM

to Sniper simulator

Hi Wim,

Thank you for your reply! Following your suggestion, I made these modifications to the code:

1. I moved the readInstructionMemory() call to right before the iCacheMiss check and made sure that each instruction accesses the I-cache only once.
2. For each I-cache access, I use the timer "now" in rob_timer instead of the timer in micro_op_performance_model, so the access is delayed by misses and pending hits that happened just before.
3. When dispatching an instruction, I check its I-cache latency. If the latency is greater than that of a normal L1 hit, I treat the instruction as an iCacheMiss and add its latency to the timer.

Do you think these modifications are valid and will fix the issue? Also, should I add the I-cache latency of an instruction that hits in the L1 I-cache to the timer, and if not, why not?

Sincerely,

Wim Heirman

Nov 13, 2016, 4:11:31 AM

to snip...@googlegroups.com

Hi Hanyun,

Yes, those fixes should work.

If you add the L1-I hit latency (4 cycles) to each instruction, you'll only be able to dispatch one instruction every 4 cycles. In reality, the core can usually read more than one instruction from a single I-cache access, and can overlap multiple I-cache accesses, to ensure that the dispatch rate (assuming all L1-I hits) is at least the width of the machine (4 instructions / cycle for Nehalem). In Sniper, we don't model that part of the front-end (for most codes it's not the bottleneck) and assume it is always able to fetch enough instructions, except when there are I-cache misses.
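
The size of that effect can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch with invented function names; it uses the 4-cycle hit latency and 4-wide dispatch quoted above and ignores every other bottleneck.

```cpp
#include <cstdint>

// Cycles to dispatch n instructions if every instruction pays the full L1-I
// hit latency serially (one instruction every hit_latency cycles).
uint64_t cyclesSerialHits(uint64_t n, uint64_t hit_latency) {
    return n * hit_latency;
}

// Cycles to dispatch n instructions if the front end always sustains `width`
// instructions per cycle (rounded up to whole cycles).
uint64_t cyclesFullWidth(uint64_t n, uint64_t width) {
    return (n + width - 1) / width;
}
```

For 1000 instructions, cyclesSerialHits(1000, 4) gives 4000 cycles whereas cyclesFullWidth(1000, 4) gives 250 cycles, a 16x difference in dispatch throughput.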

Regards,

Wim
