L3 cache mapping on Sandy Bridge CPUs

Mark Seaborn

Apr 29, 2015, 8:02:20 PM

to rowhammer-discuss

I'm investigating whether it is possible to do row hammering and cause bit flips through normal cached memory accesses, without CLFLUSH.

An important step in doing that is being able to pick addresses that map to the same L3 cache set. For testing purposes, we can do that quickly if we know how physical addresses map to L3 cache sets, and if we know which physical addresses we have access to (e.g. via /proc/PID/pagemap on Linux).
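As a minimal sketch (not code from the rowhammer-test repository; the function names are hypothetical), this is one way to look up the physical address behind a virtual address via /proc/self/pagemap. It assumes the documented Linux pagemap layout: one 64-bit entry per virtual page, with the page frame number (PFN) in bits 0-54 and a "present" flag in bit 63. On kernels from 4.0 onward, PFNs are hidden from processes without CAP_SYS_ADMIN, so in practice this needs root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* pagemap entry layout: bits 0-54 = PFN, bit 63 = page present. */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)
#define PAGEMAP_PRESENT (UINT64_C(1) << 63)

/* Combine one pagemap entry with the virtual address it describes.
   Returns 0 if the page is not present or the PFN is hidden (reads as 0). */
uint64_t pagemap_entry_to_phys(uint64_t entry, uintptr_t vaddr,
                               uint64_t page_size) {
  if (!(entry & PAGEMAP_PRESENT)) return 0;
  uint64_t pfn = entry & PAGEMAP_PFN_MASK;
  if (pfn == 0) return 0;
  return pfn * page_size + (vaddr % page_size);
}

/* Look up the physical address backing vaddr in the current process.
   Returns 0 on any failure. */
uint64_t virt_to_phys(const void *vaddr) {
  uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
  uintptr_t v = (uintptr_t)vaddr;
  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0) return 0;
  uint64_t entry = 0;
  /* One 8-byte entry per virtual page, indexed by virtual page number. */
  off_t offset = (off_t)(v / page_size) * (off_t)sizeof(entry);
  ssize_t n = pread(fd, &entry, sizeof(entry), offset);
  close(fd);
  if (n != (ssize_t)sizeof(entry)) return 0;
  return pagemap_entry_to_phys(entry, v, page_size);
}
```

Touching the page before calling virt_to_phys() makes sure it is actually backed by physical memory; otherwise the "present" bit is clear and the sketch returns 0.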

Here is a building block for doing that:

http://lackingrhoticity.blogspot.com/2015/04/l3-cache-mapping-on-sandy-bridge-cpus.html

Cheers,

Mark

Mark Seaborn

Jun 24, 2015, 8:16:25 PM

to rowhammer-discuss

I have an update about that blog post.

At least one aspect of the L3 mapping described in that post was incorrect. The test program didn't detect this because of a bias in the way it gathered the timing data shown in the graphs.

The bias was as follows: on each iteration, the program tests a set of memory locations, but these locations tend to be relatively close to each other in physical address space. That's because Linux mmap() calls tend to return chunks of memory that are partially physically contiguous. My test program mmap()s a 16MB chunk of memory on each iteration (a relatively small size), and it didn't attempt to randomise its choice of addresses within that chunk. It just scanned the chunk linearly to find addresses that it believed map to the same L3 cache set (based on its model of the L3 mapping).

I've committed some changes to fix that. (See "git log 1e393c69c7c6a88db22f9b4ee637b3f6e1cfc8d3..12cc6d4bd227dc5ac8d6aeb96835dea20592d27c" in https://github.com/google/rowhammer-test/commits/master/cache_analysis.)

The fix is to randomise the selection of addresses from the mmap()'d chunk.
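A minimal sketch of that kind of randomisation, assuming 4 KB pages (so a 16MB chunk holds 4096 pages): shuffle the page indices with a Fisher-Yates shuffle and scan the chunk in that order instead of linearly. This is not the code from the commits above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates shuffle of an array of page indices. rand() % i has a slight
   modulo bias, which is tolerable for spreading test addresses around. */
void shuffle_indices(size_t *idx, size_t n) {
  for (size_t i = n; i > 1; i--) {
    size_t j = (size_t)rand() % i; /* 0 <= j <= i - 1 */
    size_t tmp = idx[i - 1];
    idx[i - 1] = idx[j];
    idx[j] = tmp;
  }
}

/* Fill idx with 0..n-1, then shuffle it, giving a random visiting order
   over the pages of the mmap()'d chunk. */
void random_page_order(size_t *idx, size_t n) {
  for (size_t i = 0; i < n; i++) idx[i] = i;
  shuffle_indices(idx, n);
}
```

A caller would seed rand() with srand(), then visit page idx[0], idx[1], ... of the chunk, testing each one against the L3 model, so the selected addresses are no longer drawn mostly from the start of one physically contiguous stretch.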

With randomisation added, it becomes apparent that we need to XOR bit 32 of the physical address into the cache-slice hash function. Without that, the access-time graph no longer shows its sharp increase at N=13.
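To make the change concrete, here is a sketch of a two-slice L3 model in which the slice is the XOR (parity) of a chosen subset of physical address bits, so the fix amounts to adding bit 32 to that subset. The geometry is an assumption, not taken from the post: 64-byte lines and 2048 sets per slice (as in a 3 MB, 12-way L3 split into two slices, which would fit a jump at N=13), giving a set index in bits 6 to 16. The exact hash bits are left as a parameter rather than guessed; a real mask should come from the model in the blog post. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry: 64-byte lines, 2048 sets per slice. */
#define LINE_BITS 6
#define SET_BITS 11

/* XOR of all 64 bits of x, folded down into bit 0. */
static unsigned parity64(uint64_t x) {
  x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
  x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
  return (unsigned)(x & 1);
}

/* Set index within a slice: address bits 6..16 under the assumption above. */
uint64_t l3_set_index(uint64_t paddr) {
  return (paddr >> LINE_BITS) & ((UINT64_C(1) << SET_BITS) - 1);
}

/* Slice number (0 or 1): parity of the address bits selected by hash_mask. */
unsigned l3_slice(uint64_t paddr, uint64_t hash_mask) {
  return parity64(paddr & hash_mask);
}

/* Extend a mask covering only bits 17..31 so that bit 32 is XORed in too. */
uint64_t with_bit32(uint64_t hash_mask) {
  return hash_mask | (UINT64_C(1) << 32);
}

/* Two addresses share an L3 set only if both set index and slice match. */
int same_l3_set(uint64_t a, uint64_t b, uint64_t hash_mask) {
  return l3_set_index(a) == l3_set_index(b) &&
         l3_slice(a, hash_mask) == l3_slice(b, hash_mask);
}
```

With a mask that ignores bit 32, two addresses that differ only in bit 32 are wrongly predicted to collide; once bit 32 is in the mask they land in different slices. On a machine with more than 4GB of RAM, that misprediction only becomes visible when the chosen addresses are spread across physical memory, which is what the randomised selection achieves.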

This is mildly interesting because it differs from the result in the paper I referenced [1]. That paper says "It turned out that only the bits 31 to 17 are considered as input values". However, they tested a 4-core machine, whereas I've only tested 2-core machines.

Cheers,

Mark

[1] "Practical Timing Side Channel Attacks Against Kernel Space ASLR", Ralf Hund, Carsten Willems and Thorsten Holz, http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
