TIA, Phil
---------
For example, if I use conventional wisdom from Pentium and Triton
chipset datasheets, then:
L2 256KB SRAM cache on mobo ==> 64MB max cacheable DRAM
L2 512KB SRAM cache on mobo ==> 128MB max cacheable DRAM
L2 512KB SRAM + tag (HX chipset) ==> 512MB max cacheable DRAM, Mueller
p192
L2 1024KB (1MB) SRAM cache on mobo ==> 256MB max cacheable DRAM,
Mueller p197
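The figures above can be reproduced with a rough model. This is only a
sketch, assuming a direct-mapped cache where the tag selects one of
2**tag_bits cache-sized regions of DRAM; the function name and the
8-bit tag width are my assumptions, and a real chipset may cap the
result lower.

```python
KB = 1024
MB = 1024 * KB


def max_cacheable_bytes(cache_bytes: int, tag_bits: int, ways: int = 1) -> int:
    """Simplified model: each way spans cache_bytes / ways of address
    space, and the tag picks one of 2**tag_bits such regions."""
    if ways < 1 or cache_bytes % ways:
        raise ValueError("cache size must divide evenly into ways")
    return (cache_bytes // ways) * (1 << tag_bits)


if __name__ == "__main__":
    # Assumed 8-bit tag, direct-mapped (ways=1) motherboard cache.
    for size_kb in (256, 512, 1024):
        cacheable = max_cacheable_bytes(size_kb * KB, tag_bits=8)
        print(f"{size_kb}KB cache, 8-bit tag -> {cacheable // MB}MB cacheable")
```

In this model the 64MB, 128MB and 256MB lines all correspond to the
same 8-bit tag, and only a wider tag moves the limit for a given cache
size.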
---------
Now take a more complicated scenario used with the popular Socket 7
mobos:
AMD K6-3/400 has 256K L2 cache on-chip +
VIA Apollo MVP3 chipset + 1MB cache (L3) mobo ==> ???MB max cacheable
DRAM
According to Mueller, a lot of this depends on the cache controller
design and availability of tag SRAM to index / address the main
memory. So then L3 efficiency and max cacheable main memory would
depend upon the northbridge VT82C598AT cache controller.
Since the Windows OS loads from the top of main memory, it appears
that one should not install more DRAM main memory than can be cached
by the L2 or L3 cache, as DRAM is 3-10 times slower than SRAM. Thus
spec sheets saying that the Pentium can address 4GB of RAM, while
true, are of marginal usefulness for modern OSs.
--------
That said, since the Pentium II and III series are based on the
Pentium Pro P6 core design, how is it that:
Pentium II / 400MHz / 512KB can cache 4GB of main memory (enhanced tag
SRAM, Mueller p143-5), while
Pentium II / 266MHz / 512KB can only cache 512MB of main memory? And
Pentium III / 1.4GHz / 512KB can cache ??GB of main memory (spec
24965704.pdf says Advanced Transfer Cache, 256-bit quad word wide,
on-chip speed)
--------
And finally, since the PPro 1MB cache chip was introduced on 8/97 and
the PII/400 eight months later on 4/98, does PPro 1MB have the extra tag
RAM cache designed in (but not described on its spec sheet
24357001.pdf) so that it can cache 1GB of main memory? According to
Mueller p200, this is the max that the 450GX server chipset can cache.
---------
Can anybody shed some light on the advances in cache architecture and
how they relate to the maximum cacheable main memory? Both processor
and chipset spec sheets are very vague about this.
Has anybody already benchmarked this and compared the CPU families?
Ref1: Scott Mueller, Upgrading and Repairing PCs, Que, 1998,
0-7897-1636-4, 1531pgs + 2CD, $54.99
Regards, Phil
---------
Hi, how does the maximum cacheable main memory differ using Pentium
Pro 256K, 512K, and 1MB L2 cache CPUs??
TIA, Phil
---------
Hi, here are the results of continued research and analysis on the
above subject. Two additional excellent resources were found on the
web. The best explanation of cache operations is in the AMD64
Technology manuals (Ref1, Opteron chip); it applies here because that
chip was designed to remain backwards binary compatible with the
current Windows 32-bit OS and programs.
Its explanation shows cache operations in detail: the modern
processor's cache controller looks up cached main memory using tag
addresses that are stored with each cache line. As I see it, this
means that it's the cache controller's job to organize the available
cache memory into an N-way set-associative L2 cache. So this means that
the 256K, 512K, or 1MB rating of the on-chip cache is a nominal value
for the data alone (the actual SRAM is larger than spec, since the
tags are stored as well).
For a little bit of history, this also explains why the Pentium i430HX
chipset (most advanced of the Triton family (Ref2)) required extra L2
tag SRAM chips to be added on the mobo, or on the larger 512KB COAST
module, if the L2 cache was to be increased from an 8-bit tag to an
11-bit tag (Ref2) and thus increase support from 64MB to 512MB of
cacheable main memory. Thus Pentium systems had L2 cacheability
determined by the Systems (mobo) Designer, depending on his choice of
Northbridge chipset. Pentium Pro and later processors do not depend on
the chipset for this; instead Intel designs the L2 cache controller's
functionality into the processor (Ref3).
The L2 cache memory is divided into fixed-size blocks called cache
lines, each of which carries a tag holding part of the main memory
address that it caches (Ref1, p193). These cache lines are described
in Fig 7.3 Cache Organization Example (Ref1, p208), which shows the
tag addresses for the data in that set of cache lines. The number of
tag bits used for addressing determines the maximum cacheable main
memory. The cache controller design divides up the SRAM into "sets",
which are the rows of cache lines, and "ways", which are the columns
that hold the data in each set.
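To make the sets / ways / tag split concrete, here is a minimal
sketch of how a controller could carve a physical address into tag,
set index and byte offset. The class and function names are mine, the
8KB 2-way geometry in the usage note is a made-up example, and all
sizes are assumed to be powers of two.

```python
from typing import NamedTuple, Tuple


class CacheGeometry(NamedTuple):
    size_bytes: int   # data capacity of the cache
    ways: int         # cache lines per set (columns)
    line_bytes: int   # bytes per cache line

    @property
    def sets(self) -> int:
        """Number of sets (rows) in the cache."""
        return self.size_bytes // (self.ways * self.line_bytes)


def split_address(addr: int, geom: CacheGeometry) -> Tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a physical address.
    Assumes line_bytes and sets are powers of two."""
    offset_bits = geom.line_bytes.bit_length() - 1
    index_bits = geom.sets.bit_length() - 1
    offset = addr & (geom.line_bytes - 1)
    index = (addr >> offset_bits) & (geom.sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For example, split_address(0x12345, CacheGeometry(8 * 1024, 2, 32))
looks in set 26 at byte 5 of the line and compares the stored tags
against 0x12. The more tag bits the SRAM stores, the more main memory
those tags can tell apart.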
The L2 cache controller gets the main memory size, L2 cache size, and
Northbridge chipset's memory controller functionality from the BIOS
during boot as defined by the Systems Designer. For example, Intel's
450GX chipset is used in many legacy 4-way Pentium Pro based servers.
Continuing with this example of a Pentium Pro 512KB chip (Ref5), its
4-way set-associative cache has a 32-byte cache line, so each set has
4 cache lines (4 columns of tag+data+other). The 512KB of data is
thus divided up into 4,096 sets of 4 x 32 bytes. With large main
memory installed, such as 1GB which is the largest that the i450GX
server chipset supports as cacheable (Ref5b), a 30-bit address splits
into 5 offset bits, 12 set-index bits, and 13 tag bits. So about 13
bits of tag RAM per cache line are necessary to address it. The tag
is stored next to the line in its own SRAM and does not take away any
of the 32 bytes of data.
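As a cross-check of the numbers for this example, here is a small
sketch that derives the set count and the tag width from the cache
parameters. It assumes the conventional offset / index / tag split of
the address and power-of-two sizes; the function name is mine, and the
actual tag width Intel implemented is not stated in the datasheets I
have.

```python
from typing import Tuple


def cache_layout(cache_bytes: int, ways: int, line_bytes: int,
                 cacheable_bytes: int) -> Tuple[int, int, int, int]:
    """Return (sets, offset_bits, index_bits, tag_bits) for a
    set-associative cache covering cacheable_bytes of main memory."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (sets - 1).bit_length()
    address_bits = (cacheable_bytes - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


if __name__ == "__main__":
    KB, GB = 1 << 10, 1 << 30
    # PPro 512KB, 4-way, 32-byte lines, 1GB cacheable (450GX limit).
    print(cache_layout(512 * KB, 4, 32, 1 * GB))
    # Same cache, full 36-bit (64GB) address space.
    print(cache_layout(512 * KB, 4, 32, 64 * GB))
```

On these assumptions the 512KB part has 4,096 sets and needs 13 tag
bits per line for 1GB, or 19 for the full 64GB.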
----------
After examining the PPro cache architecture in detail, I can
understand why later processors enlarged their cache line width (the
tag is a per-line cost, so wider lines need fewer tags for the same
amount of data) and their n-way set-associative cache architecture.
Additionally, modern L2 cache designs further aggregate the sets into
sectors and use a wider bus to the processor core (Ref 5c-5d). The
table below summarizes the L2 cache technology and shows its evolution
to date.
              Maximum            Cache   L2
              cacheable          line,   cache,  L2 cache,            L2
              main memory        bytes   ways    size                 speed
Pentium (HX)  64MB, 512MB        32      1       256K, 512K           66MHz
Pentium Pro   1GB                32      4       up to 1MB            core
Pentium II    512MB              32      4       512K                 half core
Pentium II    4GB (>=350MHz)     32      4       512K + tag           half core
Pentium III   4GB (<=600MHz)     32      8       512K (discrete)      core
Pentium III   4GB (>=600MHz)     32      8       256K (ATC)           core
Pentium III   64GB (>=1.1GHz)    32      8       512K (ATC)           core
Pentium 4     64?GB              64      8       256K (Willamette)    core
Pentium 4     64?GB              64      8       512K (Northwood)     core
P2 Xeon       4GB                32      8       up to 2MB            half core
P3 Xeon       64GB               32      8       up to 2MB            core
K6-III-P      256MB              32      4       256K, L3=1MB         core
Athlon XP     4GB                64      16      256K/512K (Barton)   core
Athlon MP     4GB                64      16      256K/512K (2.8G+)    core
Opteron       4GB (legacy)       128     16      1M                   core
The AMD K6-III+ data sheet (Ref6) also describes AMD's TriLevel cache,
with 256K L2 on-die and an L3 cache on a 100MHz front-side bus
motherboard. VIA's Apollo MVP3 Northbridge supports 1MB of L3 cache,
which makes a maximum of 256MB main memory cacheable.
My conclusion is that all Pentium Pro processors have an on-chip L2
cache controller that makes the available L2 SRAM memory have fully
cacheable access to all of main DRAM memory within the limits of the
memory controller chipset. The price is the tag RAM, which is overhead
on top of the data SRAM. It behooves the SysE to use the PPro 1MB
chip if possible for better cache hit performance with double the
cache lines available. Also the larger cache can hold larger
contiguous blocks of data. Bigger L2 cache is better. Of course,
increased L2 cache size yields diminishing rates of increased cache
hit performance, yet a PPro 1MB is still especially useful with the
large blocks of data common in the database transactions that servers
do.
Too bad it took 10 years before the documentation caught up with the
technology ;-) Remember that the maximum cacheable main memory can be
much less than the maximum addressable memory for a CPU, i.e. the
incredibly high 4GB of the Pentium and 64GB of the Pentium Pro. It is
also limited downward by the mobo's chipset memory controller. But
most significantly, having more main memory installed than is
cacheable means that your OS will run at main memory speeds, not at
front-side system bus speeds; i.e. for Pentium and Pentium Pro it is
16MHz for FPM DRAM SIMMs and DIMMs vs 66MHz chipset speeds, or one
quarter the speed. What a waste!!
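To put a rough number on that, here is a back-of-envelope sketch. It
assumes that an access to cacheable memory costs one 66MHz bus cycle
and an access to uncached memory costs one 16MHz DRAM cycle, which
ignores misses inside the cacheable region; the function name and the
cached_fraction parameter are my own, purely for illustration.

```python
BUS_NS = 1000.0 / 66    # about 15 ns per 66MHz bus cycle
DRAM_NS = 1000.0 / 16   # 62.5 ns per 16MHz FPM DRAM cycle


def average_access_ns(cached_fraction: float,
                      fast_ns: float = BUS_NS,
                      slow_ns: float = DRAM_NS) -> float:
    """Weighted average access time when cached_fraction of accesses
    fall in cacheable memory and the rest go to uncached DRAM."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return cached_fraction * fast_ns + (1.0 - cached_fraction) * slow_ns


if __name__ == "__main__":
    for fraction in (1.0, 0.75, 0.5, 0.0):
        print(f"{fraction:4.0%} cacheable -> "
              f"{average_access_ns(fraction):5.1f} ns average")
```

Even with these crude assumptions, the average climbs steadily
towards the DRAM figure as more accesses land outside the cacheable
range.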
The Windows OS resides at the top of main memory. So if it is not
cached, the OS will run at slow DRAM speeds. So to any SysE's out
there still running Pentium Pro based servers (IBM Netfinity 7000
(Type 8651), HP NetServer LX Pro, Dell PowerEdge 6100, and Compaq
Proliant 5000) with 4GB of main memory: take out the 3GB that is not
cacheable for much better Win OS performance. Just because they sell
these options doesn’t make it right that you should use them ;-)
This L2 cache discussion also translates upwards to Pentium II,
Pentium III, and Xeon based computers too, as they are still part of
the P6 (Pentium Pro) family, and to the Pentium 4, which has a new
core but is still an IA-32 CPU (Ref3). Also recall that Windows 9X or
NT / 2K 32-bit OS limits you to 4GB of main memory (Ref5e), so the
64GB cacheability of Pentium III is a fantasy. The 36-bit addressing
of Pentium Pro and higher processors gives the 64GB addressability and
cacheability. However, you must wait for MS Win XP-64 or use a 64-bit
OS (Sun Solaris and Linux) for anything larger than 4GB of main
memory.
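The 4GB and 64GB figures are just powers of two of the address width.
A trivial sketch, with a function name of my own choosing:

```python
GB = 1 << 30


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a physical address of the given width."""
    if address_bits < 0:
        raise ValueError("address width cannot be negative")
    return 1 << address_bits


if __name__ == "__main__":
    for bits in (32, 36):
        print(f"{bits}-bit address -> {addressable_bytes(bits) // GB}GB")
```

So 32 address bits give 4GB and the 4 extra bits of 36-bit addressing
multiply that by 16 to 64GB.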
1. "AMD64 Architecture Programmers Manual Vol 2 Systems Programming,"
AMD Doc# 24593.pdf , April 2003, 458pgs; Section 7 Memory System,
p191-239; Section 7.5 Memory caches, p206-15; Section 7.6 Memory Type
Range Registers, p215-27; Section 7.8 Memory-Mapped I/O, p232-9.
2. Intel 430HX PCISet, Intel doc# 290551-002, 29055102.pdf, Apr 1997,
65pgs, Section 4.3 Secondary Cache Interface, p34-7, tag size is 16K
by 11-bits for 512KB cache and 512MB cacheability of main memory.
3. IA-32 Intel Architecture Software Developer's Manual, Vol 3: System
Programming Guide, Intel Doc# 245472-012, 2003, 798 pgs. Chap 10
Memory Cache Control p10-1-43; Table 10-1 Characteristics of the
Caches, TLBs, Store Buffers, and Write Combining Buffer in IA-32
processors, p10-2
4. "AMD K6-3-P Processor Datasheet" AMD Pub 22655.pdf, Sept 1999,
91pgs. Section 2.3 Cache, Instruction Prefetch, and Predecode Bits,
p9-10. This section has a Fig 2 Cache Sector Organization, which shows
the tag SRAM address on the cache data sector using two cache lines.
5. Pentium Pro Family Developer's Manual V3: Operating System Writer's
Guide, Intel doc# 242692, 24269201.pdf , Dec 1995, 458pgs. Chap 11
Memory Cache Control, p11-1-28.
5a. Pentium II Proc at 350, 400, 450MHz Datasheet, Intel doc# 243657,
24365703.pdf , Aug 1998, 84pgs. Cacheable memory up to 4GB is on p7.
Discuss TagRAM in L2 cache. Fig1 shows differences between PPro and
PII. Key difference is TagRAM stepping for T6P-e/A0 , as specified in
update doc# 243337.
5b. Scott Mueller, Upgrading and Repairing PCs, 10th Ed, Que, 1998,
0-7897-1636-4, 1531pgs, $54.99. 450KX/GX chipset data is in Table 4.6,
p200.
5c. Intel Pentium III Processor with 512KB L2 Cache at 1.13GHz to
1.4GHz datasheet, Intel doc# 249657, 24965704.pdf, June 2002, 99pgs.
On p1, a L2 512KB Advanced Transfer Cache (ATC) provides for 64GB of
addressable cacheable memory space.
5d. Intel Pentium III Processor for the SC242 at 450MHz to 1GHz
datasheet, Intel doc# 244452, 24445209.pdf , July 2002, 101pgs.
Cacheable memory space is 4GB. On p9, various L2 cache implementations
are discussed with a) commercial discrete TagRAM and SRAM are
integrated into a Slot 1 package or b) integrated on-die and called
Advanced Transfer Cache (ATC), using a 256-bit wide bus to the
processor core. Also see p11 which has Table1 detailing which version
of processor have what type of L2 cache topology.
5e. Jan Ozer et al., "Towers of Power," PC Magazine, V21N16, Sept 17,
2002, p106-21. Reviews of a dozen P4/2.4 and Dual Xeon 2.4 platforms.
On p118, says that Xeon can only address 4GB main memory maximum with
Win2K OS. Benchmarks show that a dual Xeon has about a 1/3 speed
advantage over a single CPU of similar type. Loses about 2/3rds of
the 2nd processor due to HW and OS overhead.
6. AMD K6-III+ Processor Data Sheet, AMD Pub# 23535, 23535.pdf, May
2000, 311 pgs. Section 12.5 L2 Cache and Tag Array Testing (p251-5).
=============
silico...@yahoo.com (Phil Lee) wrote in message news:<5bf61fc8.03062...@posting.google.com>...