Understanding Gemmini’s Uncached TileLink Path and L1/L2 Coherence Implications


Fizza Haq

Nov 18, 2025, 12:06:46 AM
to Chipyard

Hi everyone,

I’m trying to better understand how TileLink behaves in Chipyard, specifically when Gemmini is connected as an uncached client.

From my understanding, Gemmini is typically connected as an uncached TileLink client (via TL-UL), meaning it does not participate in the cache-coherency protocol used by the CPU tiles. However, I’m still confused about the exact data path and coherence implications:

  1. When Gemmini is configured as an uncached TileLink client, does its traffic simply pass through the L2 and then go directly to main memory (i.e., truly uncached)?

    • Does the L2 still temporarily store or track this data in any way?

    • Or does the L2 effectively act as a pass-through router for uncached traffic with no allocation?

  2. When a normal CPU tile (cached TL client) accesses the same address range that Gemmini writes to, how is coherence maintained?

    • Does the L2 act as the coherence manager and send probes to L1?

    • Or, for Gemmini’s uncached writes, is there no coherence guarantee, meaning software must avoid overlapping regions, insert flush/fence instructions, or confine Gemmini’s writes to a dedicated address range?

  3. In general, I’m trying to clarify:
    If Gemmini bypasses L1 (uncached), how exactly does L2 treat those accesses? And for cached CPU traffic, what maintains consistency between L1 and L2 when Gemmini writes to DRAM?

Any clarification on how Gemmini, TileLink, and the L1/L2 coherence model interact, plus pointers to relevant RocketChip/Chipyard documentation or code, would be greatly appreciated.

Hasan Nazim Genc

Jan 26, 2026, 6:42:47 PM
to chip...@googlegroups.com
Hi Fizza,

Sorry for the late reply. This is my understanding of how Gemmini interacts with TileLink and Chipyard's outer memory system, although it is possible I am not 100% correct:

1. In Gemmini's default configuration, the L2 does store data that Gemmini accesses, while the L1 is bypassed. In fact, for this reason, Gemmini's performance may suffer for some workloads if there is no L2. It is possible to configure Chipyard+Gemmini so that the L2 is completely bypassed, but I don't believe the default configurations do that.

2. We assume that the CPU fences before accessing anything that Gemmini wrote to L2/DRAM. I am not sure what happens if the CPU tries to read without fencing.

3. IIUC, the L1 is only read-from and written-to by the CPU, while Gemmini gets to access the L2 directly. After the CPU runs `fence`, it should be able to read anything that Gemmini wrote correctly. We never had any issues having Gemmini read anything the CPU wrote, even without fencing, but I don't know what the exact semantics or corner-cases for CPU-to-Gemmini transfers are.

Regards,
Hasan

