Message from discussion AMD Bulldozer optimization guide

Received: by 10.68.196.130 with SMTP id im2mr1611613pbc.3.1326479050570;
        Fri, 13 Jan 2012 10:24:10 -0800 (PST)
Path: lh20ni178270pbb.0!nntp.google.com!news2.google.com!news1.google.com!Xl.tags.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local2.nntp.dca.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Fri, 13 Jan 2012 12:24:09 -0600
Message-ID: <4F1076C8.5010907@SPAM.comp-arch.net>
Date: Fri, 13 Jan 2012 10:24:08 -0800
From: "Andy (Super) Glew" <a...@SPAM.comp-arch.net>
Reply-To: a...@SPAM.comp-arch.net
Organization: comp-arch.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: AMD Bulldozer optimization guide
References: <jekjk2$1v5$1@USTR-NEWS.TR.UNISYS.COM> <4F0FDFC5.6030406@SPAM.comp-arch.net> <51731555-10dc-4cff-a79f-b0d4f73812d1@h13g2000vbn.googlegroups.com>
In-Reply-To: <51731555-10dc-4cff-a79f-b0d4f73812d1@h13g2000vbn.googlegroups.com>
Lines: 50
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-6QR7sWjGnq6ufZEPe/DcR4t+HxZzbNbuY43Ko2uzE7CCJ7MEqW0roopUFk/H5PfPqf4vW1J5U5460w1!SbIKt3k22a0cHuZRlNBJ3d8deNMcHmSi5CzqKX4nbWvLuIkGkDMuCGnVBtO2njU=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 3675
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 1/13/2012 5:26 AM, Paul A. Clayton wrote:
> On Jan 13, 2:39 am, "Andy (Super) Glew"<a...@SPAM.comp-arch.net>

> [snip]
>> p. 105 prefetching into unmapped pages can result in a significant delay.
>>
>> (Hmm, I think this means that the AMD prefetcher can prefetch across 4KB
>> boundaries.  Does Intel do this yet? I.e. it operates on virtual, not
>> physical, addresses.)
>>
>> I suspect this means that invalid pages are NOT placed into the TLB.
>>
>> If it is true that invalid addresses are not placed into the TLB, then
>> every prefetch to the same page may produce a TLB miss.
>>
>> GLEW OPINION: you need to cache invalid TLB entries, to constrain
>> prefetch and other speculation.  You may want to limit the number of
>> invalid TLB entries, so as to prevent invalids thrashing out valids.
>> Other schemes for constraining ...
>
> In the case of next page within a cache block of PTEs, the
> TLB entry could have two bits to indicate if previous and
> subsequent pages are valid (just supporting subsequent
> page information might be adequate).

I think AMD said that the largest stride they predict cannot be more 
than a page, so this *might* work - but it would require the prefetcher 
to look up both the address to be prefetched and the real-fetch that 
caused the prefetch.
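
To make the two-bit idea concrete, here is a rough behavioral sketch in Python. Everything in it is an assumption made for illustration: the dict-based entry layout, the names fill_entry and may_prefetch, 4 KB pages, and 8 PTEs per cache line. A neighbour hint is only set when the neighbouring PTE sits in the same PTE cache line, as in the quoted suggestion. Note that the check is made against the TLB entry of the demand access, which is the extra lookup mentioned above.

```python
PAGE_SHIFT = 12      # assumed 4 KB pages
PTES_PER_LINE = 8    # assumed: 8-byte PTEs in a 64-byte cache line


def fill_entry(page_table, vpn):
    """Build a TLB entry for a mapped vpn, with neighbour-valid hints.

    page_table is a dict vpn -> pfn; a missing vpn means unmapped.
    A hint is only set when the neighbour's PTE is in the same PTE
    cache line; otherwise it stays False (conservative).
    """
    def neighbour_valid(other):
        same_line = other // PTES_PER_LINE == vpn // PTES_PER_LINE
        return same_line and other in page_table

    return {
        "vpn": vpn,
        "pfn": page_table[vpn],
        "prev_valid": neighbour_valid(vpn - 1),
        "next_valid": neighbour_valid(vpn + 1),
    }


def may_prefetch(entry, demand_vaddr, prefetch_vaddr):
    """Decide whether to issue a prefetch, using only the demand entry."""
    demand_vpn = demand_vaddr >> PAGE_SHIFT
    if entry["vpn"] != demand_vpn:
        raise ValueError("entry does not belong to the demand access")
    delta = (prefetch_vaddr >> PAGE_SHIFT) - demand_vpn
    if delta == 0:
        return True                  # same page, already known valid
    if delta == 1:
        return entry["next_valid"]
    if delta == -1:
        return entry["prev_valid"]
    return False                     # stride assumed to be at most one page
```

With this arrangement a prefetch that crosses into an unknown or unmapped neighbour is simply dropped, at the cost of the prefetcher carrying the demand address along with the prefetch address.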

Methinks it is just simpler to cache invalid TLB entries.
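
A minimal sketch of what caching invalid entries buys, again as a toy Python model rather than anything AMD or Intel is known to do. The class name ToyTLB, the LRU policy, and the capacity and max_invalid parameters are all made up for the example; the cap on invalid entries follows the suggestion quoted earlier that invalids should not be allowed to thrash out valids.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumed 4 KB pages


class ToyTLB:
    """Toy LRU TLB that can also cache 'invalid' (unmapped) translations."""

    def __init__(self, page_table, capacity=64, max_invalid=4, cache_invalid=True):
        self.page_table = page_table    # dict vpn -> pfn; missing vpn = unmapped
        self.capacity = capacity
        self.max_invalid = max_invalid  # cap on cached invalid entries (>= 1)
        self.cache_invalid = cache_invalid
        self.entries = OrderedDict()    # vpn -> pfn, or None for a cached invalid
        self.walks = 0                  # number of page-table walks performed

    def _evict_oldest(self, invalid_only):
        # Pick the least recently used entry (optionally only among invalids).
        victim = next(
            (vpn for vpn, pfn in self.entries.items()
             if not invalid_only or pfn is None),
            None,
        )
        if victim is not None:
            del self.entries[victim]

    def translate(self, vaddr):
        """Return the pfn for vaddr, or None if the page is unmapped."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # hit, possibly on a cached invalid
            return self.entries[vpn]
        self.walks += 1
        pfn = self.page_table.get(vpn)
        if pfn is None:
            if not self.cache_invalid:
                return None                 # nothing remembered: next access walks again
            invalid_count = sum(1 for p in self.entries.values() if p is None)
            if invalid_count >= self.max_invalid:
                self._evict_oldest(invalid_only=True)
        if len(self.entries) >= self.capacity:
            self._evict_oldest(invalid_only=False)
        self.entries[vpn] = pfn
        return pfn
```

In this model, a run of prefetches into one unmapped page costs a single walk when cache_invalid is set, and one walk per prefetch when it is not.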

Heck, if you are loading a cache line of TLB entries from the page 
tables - e.g. 8 at a time - and not storing them individually, but 
having a larger TLB storage block that holds all 8 - then you are 
already storing invalid TLB entries, since there may be only one valid 
TLB entry in a block of multiple adjacent TLB entries.

Heck, doing just this would solve many (but not all) problems with 
speculative TLB misses to invalid pages.  It would automatically limit 
the amount of wasted space.

It's just one step more to caching an entire block of invalid TLB
entries.  And then reducing the block size to one.
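
The block-of-8 arrangement can be sketched the same way. This is a toy Python model under assumed parameters (4 KB pages, 8 PTEs per block, no capacity limit or replacement); BlockTLB and the store_all_invalid_blocks flag are names invented for the example. With the flag off, invalid entries are only kept when they share a block with at least one valid entry; with it on, an entirely invalid block is kept too, which is the "one step more" case.

```python
PAGE_SHIFT = 12   # assumed 4 KB pages
BLOCK_PAGES = 8   # assumed: one cache line of PTEs covers 8 adjacent pages


class BlockTLB:
    """Toy TLB whose storage unit is a block of 8 adjacent translations."""

    def __init__(self, page_table, store_all_invalid_blocks=False):
        self.page_table = page_table  # dict vpn -> pfn; missing vpn = unmapped
        self.store_all_invalid_blocks = store_all_invalid_blocks
        self.blocks = {}              # block tag -> list of 8 pfns (None = invalid)
        self.walks = 0                # number of page-table walks performed

    def translate(self, vaddr):
        """Return the pfn for vaddr, or None if the page is unmapped."""
        vpn = vaddr >> PAGE_SHIFT
        tag, slot = divmod(vpn, BLOCK_PAGES)
        block = self.blocks.get(tag)
        if block is None:
            # Miss: one walk fetches the whole line of PTEs.
            self.walks += 1
            base = tag * BLOCK_PAGES
            block = [self.page_table.get(base + i) for i in range(BLOCK_PAGES)]
            has_valid = any(pfn is not None for pfn in block)
            if has_valid or self.store_all_invalid_blocks:
                self.blocks[tag] = block
        return block[slot]
```

The wasted space is bounded by construction in the default mode, since invalid slots only exist inside blocks that were going to be stored anyway.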

Here's something fun:  merge adjacent TLB entries in a fully associative 
multipagesize TLB, by adjusting a bit-per-bit mask.  Really easy for
invalid TLB entries. A bit harder for valid, since you would have to 
detect that the physical addresses are adjacent.  Or, perhaps, different 
only in a few lower bits.
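
A sketch of the merging idea, as a toy Python model. The Entry layout, the 36-bit VPN width, and the helper names single and try_merge are assumptions made for the example. Each entry carries a mask with one bit per VPN bit (1 = compare, 0 = don't care). Two entries with the same mask whose tags differ in exactly one bit can be merged by clearing that mask bit. For invalid entries that is all there is to it; for valid entries this sketch also requires the physical frame numbers to differ in that same bit, in the same direction, which is one simple reading of "adjacent".

```python
from dataclasses import dataclass
from typing import Optional

VPN_BITS = 36                      # assumed width of a virtual page number
FULL_MASK = (1 << VPN_BITS) - 1


@dataclass(frozen=True)
class Entry:
    tag: int             # VPN bits to compare (don't-care bits are zero)
    mask: int            # 1 = compare this VPN bit, 0 = don't care
    pfn: Optional[int]   # None = invalid range; don't-care bits are zero

    def matches(self, vpn):
        return (vpn & self.mask) == self.tag

    def translate(self, vpn):
        if self.pfn is None:
            return None
        # Don't-care VPN bits pass straight through to the frame number.
        return self.pfn | (vpn & (FULL_MASK ^ self.mask))


def single(vpn, pfn):
    """Entry covering exactly one page."""
    return Entry(vpn, FULL_MASK, pfn)


def try_merge(a, b):
    """Return a merged entry covering both a and b, or None if not mergeable."""
    if a.mask != b.mask:
        return None
    diff = a.tag ^ b.tag
    if diff == 0 or diff & (diff - 1):
        return None                          # tags must differ in exactly one bit
    if (a.pfn is None) != (b.pfn is None):
        return None                          # cannot mix valid and invalid
    new_mask = a.mask & ~diff
    if a.pfn is None:
        return Entry(a.tag & new_mask, new_mask, None)
    # Valid case: frames must differ in the same bit, in the same direction.
    if (a.pfn ^ b.pfn) != diff or (a.pfn & diff) != (a.tag & diff):
        return None
    return Entry(a.tag & new_mask, new_mask, a.pfn & ~diff)
```

Merging can be applied repeatedly: two merged pairs with equal masks can themselves merge into a four-page entry, and so on. The looser "different only in a few lower bits" variant for valid entries is not modelled here.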