NNTP-Posting-Date: Fri, 13 Jan 2012 12:24:09 -0600
Message-ID: <4F1076C8.5010907@SPAM.comp-arch.net>
Date: Fri, 13 Jan 2012 10:24:08 -0800
From: "Andy (Super) Glew" <a...@SPAM.comp-arch.net>
Reply-To: a...@SPAM.comp-arch.net
Organization: comp-arch.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: AMD Bulldozer optimization guide
References: <jekjk2$1v5$1@USTR-NEWS.TR.UNISYS.COM> <4F0FDFC5.6030406@SPAM.comp-arch.net> <51731555-10dc-4cff-a79f-b0d4f73812d1@h13g2000vbn.googlegroups.com>
In-Reply-To: <51731555-10dc-4cff-a79f-b0d4f73812d1@h13g2000vbn.googlegroups.com>
Lines: 50
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
On 1/13/2012 5:26 AM, Paul A. Clayton wrote:
> On Jan 13, 2:39 am, "Andy (Super) Glew"<a...@SPAM.comp-arch.net>
> [snip]
>> p. 105 prefetching into unmapped pages can result in a significant delay.
>>
>> (Hmm, I think this means that the AMD prefetcher can prefetch across 4KB
>> boundaries. Does Intel do this yet? I.e. it operates on virtual, not
>> physical, addresses.)
>>
>> I suspect this means that invalid pages are NOT placed into the TLB.
>>
>> If it is true that invalid addresses are not placed into the TLB, then
>> every prefetch to the same page may produce a TLB miss.
>>
>> GLEW OPINION: you need to cache invalid TLB entries, to constrain
>> prefetch and other speculation. You may want to limit the number of
>> invalid TLB entries, so as to prevent invalids thrashing out valids.
>> Other schemes for constraining ...
>
> In the case of next page within a cache block of PTEs, the
> TLB entry could hold two bits to indicate if previous and
> subsequent pages are valid (just supporting subsequent
> page information might be adequate).
I think AMD said that the largest stride they predict cannot be more
than a page, so this *might* work - but it would require the prefetcher
to look up both the address to be prefetched and the real fetch that
caused the prefetch.
Methinks it is just simpler to cache invalid TLB entries.
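
A minimal sketch of why caching invalid entries constrains prefetch, written as a toy Python model rather than anything resembling real hardware. All names here (ToyTLB, max_invalid, prefetch_walks) are hypothetical, the 4 KB page size and the FIFO-style eviction are assumptions, and the cap on invalid entries follows the suggestion quoted above about not letting invalids thrash out valids.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KB pages


class ToyTLB:
    """Toy fully associative TLB that can optionally cache invalid translations."""

    def __init__(self, page_table, cache_invalid, max_invalid=2, capacity=8):
        self.page_table = page_table      # vpn -> pfn; a missing vpn is unmapped
        self.cache_invalid = cache_invalid
        self.max_invalid = max_invalid    # cap so invalids cannot thrash out valids
        self.capacity = capacity
        self.entries = OrderedDict()      # vpn -> pfn, or None for a cached invalid
        self.walks = 0                    # number of page-table walks performed

    def _insert(self, vpn, pfn):
        if pfn is None:
            invalid = [v for v, p in self.entries.items() if p is None]
            if len(invalid) >= self.max_invalid:
                del self.entries[invalid[0]]   # drop the least recently used invalid
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # drop the least recently used entry
        self.entries[vpn] = pfn

    def translate(self, vaddr):
        """Return the page frame number, or None if the page is unmapped."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # hit, possibly on a cached invalid
            return self.entries[vpn]
        self.walks += 1                        # miss: walk the page table
        pfn = self.page_table.get(vpn)
        if pfn is not None or self.cache_invalid:
            self._insert(vpn, pfn)
        return pfn


def prefetch_walks(cache_invalid):
    """Issue 16 cache-line prefetches into one unmapped page; count the walks."""
    tlb = ToyTLB({0: 100}, cache_invalid)
    for line in range(16):
        tlb.translate((1 << PAGE_SHIFT) + 64 * line)
    return tlb.walks
```

In this model, prefetch_walks(False) walks the page table on every prefetch into the unmapped page, while prefetch_walks(True) walks once and then hits the cached invalid entry.
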
Heck, if you are loading a cache line of TLB entries from the page
tables - e.g. 8 at a time - and not storing them individually, but
having a larger TLB storage block that holds all 8 - then you are
already storing invalid TLB entries, since there may be only one valid
TLB entry in a block of multiple adjacent TLB entries.
Heck, doing just this would solve many (but not all) problems with
speculative TLB misses to invalid pages. It would automatically limit
the amount of wasted space.
It's just one step more to caching an entire block of invalid TLB
entries. And then reducing the block size to one.
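
The block-granular idea can be sketched the same way. This is a toy model under stated assumptions: 8 PTEs per 64-byte cache line, 4 KB pages, and an unbounded store of blocks (a real TLB would have a fixed number). BlockTLB and ptes_per_block are hypothetical names, not anything from the AMD guide.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages


class BlockTLB:
    """Toy TLB that stores a whole cache line of PTEs per entry."""

    def __init__(self, page_table, ptes_per_block=8):
        self.page_table = page_table          # vpn -> pfn; a missing vpn is unmapped
        self.ptes_per_block = ptes_per_block  # 8 = one cache line of 8-byte PTEs
        self.blocks = {}                      # block number -> list of pfns (None = invalid)
        self.walks = 0

    def translate(self, vaddr):
        """Return the page frame number, or None if the page is unmapped."""
        vpn = vaddr >> PAGE_SHIFT
        block, slot = divmod(vpn, self.ptes_per_block)
        if block not in self.blocks:
            self.walks += 1                   # one walk fills the whole block
            base = block * self.ptes_per_block
            self.blocks[block] = [
                self.page_table.get(base + i) for i in range(self.ptes_per_block)
            ]
        # Invalid neighbours of a valid page are cached for free, and a block
        # may also be invalid in every slot.
        return self.blocks[block][slot]
```

With a page table of {3: 77}, translating page 3 fills a block that also records pages 0-2 and 4-7 as invalid, so a later speculative access to page 4 needs no further walk. Passing ptes_per_block=1 gives the "block size of one" case, which is plain caching of individual invalid entries.
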
Here's something fun: merge adjacent TLB entries in a fully associative
multi-page-size TLB by adjusting a bit-per-bit mask. Really easy for
invalid TLB entries. A bit harder for valid ones, since you would have to
detect that the physical addresses are adjacent. Or, perhaps, different
only in a few lower bits.
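
A minimal sketch of the mask-merging trick for the invalid case only, assuming each entry carries a per-bit "don't care" mask as in a ternary CAM. MaskedEntry and try_merge are hypothetical names. The valid case is not modelled, since it would also need the physical-address check described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskedEntry:
    tag: int    # VPN bits that must match (don't-care bits are held at 0)
    mask: int   # a 1 bit means "don't care" for that VPN bit

    def matches(self, vpn):
        # Compare only the bits that are not masked out.
        return ((vpn ^ self.tag) & ~self.mask) == 0


def try_merge(a, b):
    """Merge two invalid entries that differ in exactly one compared bit.

    Returns the merged entry, or None if the pair cannot be merged.
    """
    if a.mask != b.mask:
        return None                 # different sizes: not handled in this sketch
    diff = (a.tag ^ b.tag) & ~a.mask
    if diff == 0 or diff & (diff - 1):
        return None                 # identical, or differing in more than one bit
    mask = a.mask | diff            # the differing bit becomes don't-care
    return MaskedEntry(a.tag & ~mask, mask)
```

Merging the invalid entries for pages 6 and 7 gives one entry with mask 1 that matches both, and merging that with an entry covering pages 4-5 gives one entry covering pages 4-7. Pages 5 and 6 are adjacent but differ in two bits, so a single mask cannot cover them and try_merge returns None.
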