
About TLB in lower-level caches


Auhgnist

Mar 6, 2006, 8:36:36 AM
In a CMP processor, with a unified cache shared between processor
cores, such as the L2 cache, is there a TLB set associated with each
processor core? If so, when a context switch happens on a certain
processor core, should its related TLB set be flushed?

Dale Morris

Mar 7, 2006, 5:42:55 PM
"Auhgnist" <auhg...@gmail.com> wrote in message
news:1141652196.7...@j33g2000cwa.googlegroups.com...

Yes, typically there is a TLB associated with each logical processor,
independent of whether logical processors might share other resources, such
as caches. In the case of multi-threaded cores, the TLB structures are
usually physically shared, but tagged such that entries are associated with
a particular HW thread.

The question of context switch is really independent of CMP or resource
sharing among logical processors. In the old days, TLBs were typically
flushed on context switch, but today most designs include information as
part of the TLB mapping entry that makes it unique within that OS
environment. For example, the mapping might include some sort of process ID
tag. An example would be Region IDs in Itanium processors. With this
information, there's no need to remove TLB entries on context switch.
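
(purely as an illustrative sketch, here's what such a tagged lookup
looks like in C; the field names, widths and sizes are made up, not any
particular processor's format:)

#include <stdbool.h>
#include <stdint.h>

/* illustrative software model of a tagged TLB; not a real design */
struct tlb_entry {
    bool     valid;
    uint16_t asid;   /* process-ID / region-ID style tag */
    uint64_t vpn;    /* virtual page number              */
    uint64_t pfn;    /* physical frame number            */
};

#define TLB_ENTRIES 128
static struct tlb_entry tlb[TLB_ENTRIES];

/* lookup matches on (asid, vpn); entries belonging to other address
 * spaces simply miss, so a context switch only changes current_asid
 * and never has to flush the TLB */
static bool tlb_lookup(uint16_t current_asid, uint64_t vpn, uint64_t *pfn)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == current_asid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;            /* hit */
        }
    }
    return false;                   /* miss: walk the page tables and refill */
}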

- Dale Morris
Itanium processor architect
Hewlett-Packard Co.


Anne & Lynn Wheeler

Mar 7, 2006, 5:23:23 PM

"Dale Morris" <dale....@hp.com> writes:
> The question of context switch is really independent of CMP or
> resource sharing among logical processors. In the old days, TLBs
> were typically flushed on context switch, but today most designs
> include information as part of the TLB mapping entry that makes it
> unique within that OS environment. For example, the mapping might
> include some sort of process ID tag. An example would be Region IDs
> in Itanium processors. With this information, there's no need to
> remove TLB entries on context switch.

virtual memory on 370/165 (early 70s) had a 7-entry sto-stack. tlb
entries had 3-bit sto-identifier ... which mapped to one of the
7-entries in the sto-stack (plus identifier for unused).

virtual address space had a segment table ... the address of which
(aka segment-table-origin) was loaded into segment table register
... and which the hardware used to uniquely identify a virtual address
space.

when the segment table register was changed, the hardware would check
to see if the address (sto) was already in the TLB sto-stack. if not,
one of the sto-stack entries was replaced and all the TLB entries
associated with that sto-stack entry were automatically purged.

The lower-end 370s tended to flush their complete TLB on every segment
register switch.
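
(as a purely illustrative software model of that sto-stack behavior
... the sizes, widths and the round-robin replacement below are guesses
for exposition, not the actual 165/168 logic:)

#include <stdint.h>

#define STO_SLOTS 7                 /* 3-bit ids 1..7; 0 means unused/invalid */
#define TLB_SIZE  128

static uint32_t sto_stack[STO_SLOTS];                 /* segment-table origins */
static struct { uint8_t sto_id; uint32_t vpn, pfn; } tlb[TLB_SIZE];
static int next_victim;                               /* placeholder policy    */

/* called when the segment table register is loaded with a new STO;
 * returns the 3-bit identifier that TLB entries get tagged with */
static uint8_t sto_switch(uint32_t new_sto)
{
    for (int i = 0; i < STO_SLOTS; i++)
        if (sto_stack[i] == new_sto)
            return (uint8_t)(i + 1);          /* already known: purge nothing */

    int victim = next_victim;                 /* pick a sto-stack entry to reuse */
    next_victim = (next_victim + 1) % STO_SLOTS;
    sto_stack[victim] = new_sto;

    for (int j = 0; j < TLB_SIZE; j++)        /* purge every TLB entry tagged   */
        if (tlb[j].sto_id == (uint8_t)(victim + 1))   /* with the reused id     */
            tlb[j].sto_id = 0;

    return (uint8_t)(victim + 1);
}

/* the low-end 370 behavior is the degenerate case: no sto-stack at all,
 * just clear the whole TLB on every segment table register load */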

the 165 implementation carried thru the 370/168 and into the 3033
(which started out being 168 wiring diagram mapped to faster chip
technology). one of the people that went on to later be an itanium
architect ... introduced "dual-address" space for the 3033
(instructions and support for directly accessing data in a different
address space). For some operations, this would double the number of
active virtual address spaces and there were rumors at the time of
significant sto-stack (and therefore tlb-entry) thrashing as a result.

misc. past posts mentioning sto-stack:
http://www.garlic.com/~lynn/94.html#46 Rethinking Virtual Memory
http://www.garlic.com/~lynn/99.html#204 Core (word usage) was anti-equipment etc
http://www.garlic.com/~lynn/2000g.html#10 360/370 instruction cycle time
http://www.garlic.com/~lynn/2001g.html#8 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2002c.html#40 using >=4GB of memory on a 32-bit processor
http://www.garlic.com/~lynn/2002l.html#60 Handling variable page sizes?
http://www.garlic.com/~lynn/2003e.html#0 Resolved: There Are No Programs With >32 Bits of Text
http://www.garlic.com/~lynn/2003g.html#12 Page Table - per OS/Process
http://www.garlic.com/~lynn/2003g.html#13 Page Table - per OS/Process
http://www.garlic.com/~lynn/2003g.html#23 price ov IBM virtual address box??
http://www.garlic.com/~lynn/2003h.html#37 Does PowerPC 970 has Tagged TLBs (Address Space Identifiers)
http://www.garlic.com/~lynn/2003m.html#29 SR 15,15
http://www.garlic.com/~lynn/2005c.html#63 intel's Vanderpool and virtualization in general
http://www.garlic.com/~lynn/2005h.html#11 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005h.html#18 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005r.html#44 What ever happened to Tandem and NonStop OS ?

801 was originally designed to be a much more closed, proprietary,
dedicated environment. instead of segment table and segment table
register, it had inverted tables and segment-identifiers.
http://www.garlic.com/~lynn/subtopic.html#801

romp, in the early 80s, implemented a 12-bit segment identifier. romp
had 32bit virtual addressing ... the top four bits mapping to one of
sixteen segment registers. each segment register could have an
arbitrary 12bit segment identifier value. To do virtual->real mapping,
the hardware indexed a segment register (using the top four bits of
the virtual address), took the 12-bit segment identifier from the
segment register and then used it to do a lookup in the TLB for the
appropriate virtual page entry (i.e. using the 12-bit segment
identifier value, plus the 16-bit virtual page number within a
segment). This was sometimes referred to as a 40-bit virtual address
(the combination of the 12-bit segment identifier value plus the 28bit
virtual address). It was theoretically possible to have the same
12-bit segment identifier appear in each of the 16 segment registers
... which would result in 16 different possible (32bit) virtual
addresses all mapping to the same TLB entry.
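
(again just as an illustrative sketch in C ... 4KB pages are assumed
here purely for the arithmetic; the inverted-table/TLB search itself is
omitted:)

#include <stdint.h>

static uint16_t seg_reg[16];          /* each holds a 12-bit segment identifier */

/* form the "40-bit" lookup key: the 12-bit segment id from the register
 * selected by the top four address bits, plus the virtual page number
 * within the 256MB segment (16 bits with the 4KB pages assumed here) */
static uint64_t romp_tlb_key(uint32_t vaddr)
{
    uint16_t segid = seg_reg[vaddr >> 28] & 0x0FFF;
    uint32_t vpn   = (vaddr & 0x0FFFFFFF) >> 12;
    return ((uint64_t)segid << 16) | vpn;
}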

for rios/power, the segment identifier was doubled from 12bits to
24bits ... and you sometimes saw writeups describing the rios/power
implementation as being 52-bit virtual addressing.

for some drift, the original virtual machine virtualization support in
360 (and later 370) ... created "shadow" segment and page table for a
virtual machine (that translated from the virtual machine's "3rd"
level virtual memory to the hardware's "1st" level real memory) that
followed TLB (which happened to be called associative array on 360,
somewhat more expensive and fully associative) hardware rules. Every
time the virtual machine changed the virtual segment register, the
shadow table entries would be completely reset (i.e. the low-end 370
hardware model). In the mid-70s a performance enhancement was
implemented for virtual machine support that somewhat emulated the 168
sto-stack ... with multiple "shadow" tables for each virtual machine.

misc. past posts mentioning virtual machine shadow table support:
http://www.garlic.com/~lynn/94.html#48 Rethinking Virtual Memory
http://www.garlic.com/~lynn/2002l.html#51 Handling variable page sizes?
http://www.garlic.com/~lynn/2003g.html#18 Multiple layers of virtual address translation
http://www.garlic.com/~lynn/2004c.html#35 Computer-oriented license plates
http://www.garlic.com/~lynn/2004o.html#18 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2005d.html#66 Virtual Machine Hardware
http://www.garlic.com/~lynn/2005d.html#70 Virtual Machine Hardware
http://www.garlic.com/~lynn/2005h.html#18 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005k.html#42 wheeler scheduler and hpo
http://www.garlic.com/~lynn/2005n.html#47 Anyone know whether VM/370 EDGAR is still available anywhere?
http://www.garlic.com/~lynn/2005o.html#8 Non Power of 2 Cache Sizes
http://www.garlic.com/~lynn/2005p.html#45 HASP/ASP JES/JES2/JES3


--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

Andy Glew

Mar 7, 2006, 7:29:39 PM
Anne & Lynn Wheeler <ly...@garlic.com> writes:

> the 165 implementation carried thru the 370/168 and into the 3033
> (which started out being 168 wiring diagram mapped to faster chip
> technology). one of the people that went on to later be an itanium
> architect ...

Who was this? (HM? DA?)

I know most of the itanium architects.

Anne & Lynn Wheeler

Mar 7, 2006, 8:28:39 PM

Andy Glew <first...@employer.domain> writes:
> Who was this? (HM? DA?)
>
> I know most of the itanium architects.

http://www.hpl.hp.com/news/2001/apr-jun/worley.html
http://www.hpl.hp.com/news/2001/apr-jun/itanium.html

and

had to go to the way-back machine for this one
http://web.archive.org/web/20000415125122/http://www.hpl.hp.com/features/bill_worley_interview.html

slightly related recent post
http://www.garlic.com/~lynn/2006.html#39 What happens if CR's are directly changed

for some real topic drift .. a dc3 flight from h*** in the late 60s to visit
him at cornell for a meeting about HASP
http://www.garlic.com/~lynn/2006b.html#27 IBM 610 workstation computer

other drift: after some number of people left for the labs in the
early 80s, i was getting frequent emails asking who else i knew that
might be joining them and/or whether i might also join them.

Anne & Lynn Wheeler

Mar 8, 2006, 3:56:17 PM
Anne & Lynn Wheeler <ly...@garlic.com> writes:
> virtual memory on 370/165 (early 70s) had a 7-entry sto-stack. tlb
> entries had 3-bit sto-identifier ... which mapped to one of the
> 7-entries in the sto-stack (plus identifier for unused).

ref:
http://www.garlic.com/~lynn/2006e.html#0 About TLB in lower-level caches

for some additional drift ... there was actually quite a bit of
work put into the original 370 virtual memory architecture that
allowed for TLB hardware implementations to be

1) simple, clear all entries, whenever switching address space

2) "sto-associative" ... this is from the above description, basically
TLB entries were associated with a "STO", segment table origin
... i.e. the real address of the start of the segment table (which
was a unique definition for a virtual address space).

3) "pto-associative" ... this is page-table origin associative.

370 had a virtual address space segment table with segment entries
that were pointers to the start of individual page tables (containing
page table entries mapping specific virtual page to real page
address). 370 architecture allowed for shared segments (i.e. the same
page table pointed to by different segment tables ... same page-table
appearing in different virtual address spaces).

I don't know of any general TLB hardware implementations that actually
did general pto-associative (i.e. multiple different address spaces
would have shared segment addresses mapping to the same TLB entries).

In effect, the 801 implementation (described in the original post) is
a flavor of pto-associative ... but for an inverted page table
architecture. since you don't have an explicit table, there is no
explicit table start address to uniquely identify each segment
... therefore the definition of an arbitrary 12-bit segment identifier (in
romp) ... and later expanded to 24-bit segment identifier (for
rios/power).

Eric P.

Mar 8, 2006, 8:39:22 PM
Anne & Lynn Wheeler wrote:
>
> <...>

> for some drift, the original virtual machine virtualization support in
> 360 (and later 370) ... created "shadow" segment and page table for a
> virtual machine (that translated from the virtual machine's "3rd"
> level virtual memory to the hardware's "1st" level real memory) that
> followed TLB (which happened to be called associative array on 360,
> somewhat more expensive and fully associative) hardware rules. Every
> time the virtual machine changed the virtual segment register, the
> shadow table entries would be completely reset (i.e. the low-end 370
> hardware model). In the mid-70s a performance enhancement was
> implemented for virtual machine support that somewhat emulated the 168
> sto-stack ... with multiple "shadow" tables for each virtual machine.

What do you mean by "shadow table entries would be completely reset"?
Do you mean the TLB was flushed, or all the mapped physical pages
had to be released (like a working set flush), or something else?

And why was it necessary?

Thanks
Eric

Anne & Lynn Wheeler

Mar 8, 2006, 9:38:47 PM
"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
> What do you mean by "shadow table entries would be completely reset"?
> Do you mean the TLB was flushed, or all the mapped physical pages
> had to be released (like a working set flush), or something else?

shadow tables were page tables that emulated TLB semantics; aka
... shadow tables provided function equivalent to a TLB but for the
operation of the virtual machine. processes that would result in
flushing an entry in the TLB needed to provide the equivalent
operation with regard to the shadow tables .... in this case
invalidating page table entries ... which are the constructs used to
emulate the hardware TLB for the virtual machine. Now, since the page
table entries being invalidated are real page tables (even if they are
"shadow" table constructs for the virtual machine), this will, in
turn, require that you turn around and flush entries in the real TLB.

the original 370 virtual memory architecture included hardware
instructions, IPTE (invalidate page table entry), ISTE (invalidate
segment table entry), and PTLB (purge table look-aside buffer). IPTE
turns on the invalid bit in the PTE entry and would flush any
associated TLB entry. ISTE turns on the invalid bit in the STE entry
and would flush
any associated TLB entries. PTLB flushes all TLB entries. IPTE and
ISTE instructions were dropped from original shipped products.
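
(purely as an illustrative software model of what those operations
mean ... the structures below are made up for exposition, not 370
formats:)

#include <stdint.h>

#define TLB_SIZE 128

struct pte  { uint32_t pfn; int invalid; };
struct tlbe { int valid; uint32_t sto, vpn, pfn; };
static struct tlbe tlb[TLB_SIZE];

/* PTLB: purge the entire table look-aside buffer */
static void ptlb(void)
{
    for (int i = 0; i < TLB_SIZE; i++)
        tlb[i].valid = 0;
}

/* IPTE: mark one page table entry invalid and flush just the TLB
 * entries holding that translation (ISTE would do the same for a
 * segment table entry and all the pages under it) */
static void ipte(struct pte *pte, uint32_t sto, uint32_t vpn)
{
    pte->invalid = 1;
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].sto == sto && tlb[i].vpn == vpn)
            tlb[i].valid = 0;
}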

various web pages (from search engine) with more detailed discussion
about virtual machine implementation and shadow tables
http://www.cap-lore.com/Software/virtualVirtual.html
http://www.cs.duke.edu/~narten/110/nachos/main/node34.html
http://www.virtualization.info/archive/2005_01_01_archive.html
http://lists.xensource.com/archives/html/xen-devel/2004-08/msg00261.html
http://www.princeton.edu/~melinda/25paper.pdf
http://swpat.ffii.org/pikta/txt/ep/0171/475/
http://www.cs.princeton.edu/courses/archive/spring06/cos592/bib/future_trends-rosenblum.pdf
http://www.stanford.edu/class/cs240/readings/waldspurger.pdf
http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWSE05008_WinHEC05.ppt
http://www.stanford.edu/~talg/papers/COMPUTER05/virtual-future-computer05.pdf
http://www.usenix.org/events/osdi02/tech/waldspurger/waldspurger_html/node3.html
http://www.vmware.com/pdf/esx2_performance_implications.pdf
http://www.cs.ubc.ca/~norm/cs538a/ibmrd2706L.pdf
http://www.cs.wisc.edu/~stjones/proj/vm_reading/ibmrd2706L.pdf
http://www.patentstorm.us/patents/4814975.html
http://www.uic.edu/depts/accc/inform/ibmglossary.html

Anne & Lynn Wheeler

Mar 9, 2006, 1:49:35 PM
Anne & Lynn Wheeler <ly...@garlic.com> writes:
> various web pages (from search engine) with more detailed discussion
> about virtual machine implementation and shadow tables
> http://www.cap-lore.com/Software/virtualVirtual.html
> http://www.princeton.edu/~melinda/25paper.pdf

a little more shadow table lore. cp/67 originally only had virtual
machine support for vanilla 360 machines ... i.e. no virtual memory
support (no support for the virtual memory hardware shipped in
360/67s). cp/67 ran on 360/67 and made use of the virtual memory
hardware as part of providing (360) virtual machines ... but didn't
provide a 360/67 virtual machine. as a result, cp/67 couldn't run
"under" cp/67.

one of the people at the grenoble science center got an assignment
at the cambridge science center
http://www.garlic.com/~lynn/subtopic.html#545tech

and while on the assignment, added 360/67 virtual machine support to
cp/67 (including the original shadow table support).

this was not too long before the joint cambridge/endicott ("cp67h" and
"cp67i") project started to provide 370 virtual machine support under
360/67. 370s were originally announced w/o virtual memory support, but
the architecture had been defined. the 370 virtual memory architecture
had some number of differences from that found in 360/67 including
some differences in the definition for the segment and page tables
(used by the hardware).

for the "cp67h" level changes to cp/67 ... there was mode-switch
defining whether 360/67 virtual memory was being emulated or 370
virtual memory was being emulated. the "shadow tables" were the same
format ... since they were being referenced by the real 360/67
hardware ... however, the segment and page tables that were in the
memory of the virtual machine ... which the cp/67 simulation worked
with ... were slightly different (but both formats needed to be
translated to the native hardware format for shadow tables).

minor trivia, some years later the person that had done the
original virtualizing 360/67 and shadow table support found
himself setting up EARN ... reference:
http://www.garlic.com/~lynn/2001h.html#65

misc. past posts mention "h" and "i" work (it was up and running
standard production ... a year before the first engineering 370
machine with virtual memory support was operation ... and well before
virtual memory for 370 was announced, other trivia, the multi-level
source update process as also developed as part of the "h" and "i"
effort)
http://www.garlic.com/~lynn/2002h.html#50 crossreferenced program code listings
http://www.garlic.com/~lynn/2002j.html#0 HONE was .. Hercules and System/390 - do we need it?
http://www.garlic.com/~lynn/2002j.html#70 hone acronym (cross post)
http://www.garlic.com/~lynn/2004b.html#31 determining memory size
http://www.garlic.com/~lynn/2004d.html#74 DASD Architecture of the future
http://www.garlic.com/~lynn/2004h.html#27 Vintage computers are better than modern crap !
http://www.garlic.com/~lynn/2004p.html#50 IBM 3614 and 3624 ATM's
http://www.garlic.com/~lynn/2005c.html#59 intel's Vanderpool and virtualization in general
http://www.garlic.com/~lynn/2005d.html#58 Virtual Machine Hardware
http://www.garlic.com/~lynn/2005g.html#17 DOS/360: Forty years
http://www.garlic.com/~lynn/2005h.html#18 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005i.html#39 Behavior in undefined areas?
http://www.garlic.com/~lynn/2005j.html#50 virtual 360/67 support in cp67
http://www.garlic.com/~lynn/2005p.html#27 What ever happened to Tandem and NonStop OS ?
http://www.garlic.com/~lynn/2005p.html#45 HASP/ASP JES/JES2/JES3
http://www.garlic.com/~lynn/2006.html#38 Is VIO mandatory?

Eric P.

Mar 12, 2006, 1:23:03 PM
Anne & Lynn Wheeler wrote:
>
> "Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
> > What do you mean by "shadow table entries would be completely reset"?
> > Do you mean the TLB was flushed, or all the mapped physical pages
> > had to be released (like a working set flush), or something else?
>
> shadow tables were page tables that emulated TLB semantics; aka
> ... shadow tables provided function equivalent to a TLB but for the
> operation of the virtual machine. processes that would result in
> flushing an entry in the TLB needed to provide the equivalent
> operation with regard to the shadow tables .... in this case
> invalidating page table entries ... which are the constructs used to
> emulate the hardware TLB for the virtual machine. Now, since the page
> table entries being invalidated are real page tables (even if they
> are "shadow" table constructs for the virtual machine), this will, in
> turn, require that you turn around and flush entries in the real TLB.
>
> the original 370 virtual memory architecture included hardware
> instructions, IPTE (invalidate page table entry), ISTE (invalidate
> segment table entry), and PTLB (purge table look-aside buffer). IPTE
> turns on the invalid bit in the PTE entry and would flush any
> associated TLB entry. ISTE turns on the invalid bit in the STE entry
> and would flush
> any associated TLB entries. PTLB flushes all TLB entries. IPTE and
> ISTE instructions were dropped from original shipped products.

Ok, I think I see why the shadow reset+rebuild is necessary.
http://www.cap-lore.com/Software/virtualVirtual.html
states that VM/370 does not trap guest page table accesses.

VMWare uses 2 pages tables, the guest table managed by the guest OS,
and the per-VM shadow table managed by VMWare and accessed by the
cpu MMU hardware. The VMM traps accesses to the guest tables and makes
the necessary changes to the shadow table. If the guest changes the
page table base register CR3, or reads or writes a PTE, the VMM
traps the new value and diddles the shadow table accordingly.

VM/370 has 3 tables, the guest page table, the "real" table
containing the map from guest to host for a single VM,
and the shadow table used by the hardware MMU and which is
built from the contents of the guest and real tables.
When the guest changes its page table, it issues a PTLB which
the VMM traps and rebuilds the shadow table.

So I suppose my question should have been, why the 3 levels and
full shadow rebuilds instead of 2 levels and guest access traps?
Performance, and/or compatibility with different 360/370
architectures are possibilities.

Eric

Anne & Lynn Wheeler

Mar 12, 2006, 5:03:24 PM
"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
> So I suppose my question should have been, why the 3 levels and
> full shadow rebuilds instead of 2 levels and guest access traps?
> Performance, and/or compatibility with different 360/370
> architectures are possibilities.

the virtual machine thinks it is running on the real hardware.

table 1:

the virtual machine operating system is running a virtual address
space ... which requires a mapping from what the operating system
(kernel) running in the virtual machine thinks are the virtual address
space mappings to what it thinks are its real addresses.

table 2:

the hypervisor running on the real hardware has a set of virtual
address tables for each virtual machine ... which gives the mapping
between what the virtual machine operating system thinks are "real
addresses" (but in reality are virtual addresses) and the "real",
"real" addresses of the real hardware.

when the hypervisor (providing virtual machines capability)
running on the real hardware ... dispatches a virtual machine
... the virtual machine may be expecting to run in real
address mode of the virtual machine (in which case, the
hypervisor loads "table 2" into the appropriate hardware
registers).

table 3 (shadow table):

when the hypervisor (providing virtual machines capability) running on
the real hardware ... dispatches a virtual machine ... the virtual
machine may be expecting to run in virtual address mode of the virtual
machine. The virtual machine has specified "table 1" for
running. However, since "table 1" only provides the address mapping
from the virtual address space (maintained by the kernel running in
the virtual machine) to the virtual machine's "real addresses" ...
and since the virtual machine's "real addresses" aren't in fact
the real machine's real addresses ... using "table 1" would produce
invalid results.

as a result, the hypervisor running on the real hardware builds a
shadow of "table 1" ... which goes thru a double mapping. It takes an
entry from "table 1", and translates the virtual machine "real
address", using "table 2" to get the real, real address on the real
machine ... and places that value in the corresponding entry in the
"shadow table" (shadow of table 1).

the rules for maintaining the hardware TLB are followed for
maintaining the shadow of "table 1". This is, in part, because the TLB
architecture is defined as also being a shadow/copy of "table 1" (for
the virtual machine) ... and can result in incorrect and/or
unpredictable results if the values in the TLB aren't kept in sync
with the values in entries from "table 1". The virtual machine
hypervisor takes advantage of the shadow/copy characteristics of the
real hardware TLB ... for maintaining the entries in the "shadow" of
"table 1". Note however, the "shadow table" (with translated
shadow/copy entries from "table 1") has to be maintained consistent
with both the "table 1" entries as well as the "table 2" entries.

the original shadow table implementation had only a single physical
set of storage for a shadow table for each virtual machine. This
corresponded with hardware tlb implementations that only remembered
entries for the current address space ... and didn't have the
capability for remembering entries for multiple different address
spaces. in this implementation, every time there is a virtual address
space change ... all the existing hardware tlb entries have to be
flushed. Similarly, in the single shadow table (per virtual machine)
implementation (where only entries for a single virtual address space
are remembered), every time the virtual machine changes virtual
address space, all entries in the shadow table are flushed (faithfully
emulating what is done by hardware TLBs).

later shadow table implementations provided for having additional
storage allocated per virtual machine for remembering entries from
multiple different virtual address spaces (for a specific virtual
machine). this corresponds to (and emulates) hardware tlbs that
remember entries from multiple different virtual address spaces
(multiple "table 1" per virtual machine). In this scenario ... there
is a fixed, maximum amount of storage for shadow tables ... analogous
to the 370/165 TLB description of being STO-associative and having a
seven-sto stack (it can remember entries for up to seven different
virtual address spaces simultaneously). However, analogous to the
370/165 TLB description ... when a new address space is loaded that
doesn't correspond to one of the (saved) shadow tables ... and all the
existing physical shadow tables are already associated with a specific
"table 1" ... then one of the existing shadow tables has to have all
its entries flushed and that specific shadow table re-allocated to the
new virtual address space for the virtual machine (new "table 1" for
the virtual machine). Again, this emulates the rules for maintaining
hardware TLB entries.

Now, newer hardware with additional capability could provide virtual
machine hardware support for hardware TLB miss ... where the TLB miss
processing and TLB entry load ... recognizes that it is running in
virtual machine mode ... and that the hardware has to go thru a two
level lookup before obtaining the real, real address; aka this takes
the virtual real address entry from "table 1", does a 2nd lookup using
"table 2" ... in order to obtain, the real, real address (as opposed
to the virtual, real address) for loading into the real hardware TLB
entry.

The shadow table description is where the TLB hardware doesn't have
direct hardware support for doing the two-level table lookup (doing
the lookup in the virtual machine's Table1 ... and then doing a 2nd
lookup in the hypervisor's Table2 in order to obtain the real, real
hardware address) ... and the two-level lookup is done in software on
behalf of the hardware TLB ... and then when running that specific
virtual address space ... the hardware really runs using the "shadow
table" (when the hardware TLB implementation doesn't directly support
virtual machine operation mode and performing a two level lookup
automagically in hardware).

There is also a gotcha for any shadow-table bypass and virtual machine
hardware TLB two-level entry-miss & entry-load implementation and that
is recursive implementation. Part of early virtual machine work was
running a test virtual machine hypervisor under a production virtual
machine hypervisor ... where a guest virtual memory operating system
was being tested in the test virtual machine hypervisor. Now you have
three levels of tables (and, in theory, recursive might work to any
level). In the three level scenario:

virtual address space "LEVEL-1" running in virtual machine defined by
virtual address space "TABLE-1" of a operating system running in
a virtual machine

virtual address space "LEVEL-2" which the virtual machine operating
system thinks are real addresses, but is really defined by "TABLE-2" of
a virtual machine hypervisor "A" that thinks it is running on real
hardware

virtual address space "LEVEL-3" which the virtual machine hypervisor
"A" thinks are real address, but is really defined by "TABLE-3" of a
virtual machine hypervisor "B" that may or may not be running on real
hardware.

Now the real TLB hardware (that doesn't have any builtin two-level
lookup virtual machine capability) is expecting real virtual address
space tables to provide mapping from virtual address space to real,
real addresses (virtual page number to real, real page number).

The problem with "TABLE-1" defining virtual address space "LEVEL-1" is
not providing mapping between virtual addresses and the real, real
hardware page numbers ... "TABLE-1" entries provide mapping between
virtual addess space "LEVEL-1" and virtual address space "LEVEL-2"
... the mapping defined by "TABLE-2" (not the real, real hardware
addresses). Furthermore, the problem with "TABLE-2" is that doesn't
actually provide the mapping between virtual address space "LEVEL-2"
and the real, real hardware page numbers ... but it provides the
mapping between virtual address space "LEVEL-2" and the virtual
address space "LEVEL-3" defined by "TABLE-3". If this is truely
recursive ... it isn't until you get to "LEVEL-n" that you really have
a "TABLE-n" that is actually providing any mapping between a virtual
address space and the real, real page numbers.

So to generalize this, the virtual machine hypervisor at LEVEL-n
builds shadow tables (that emulate the semantics of real hardware
TLBs) for virtual machine virtual address tables at LEVEL-(n-1). The
virtual machine hypervisor shadow tables emulate the virtual address
space tables at LEVEL-(n-1) but using what it believes to be real
addresses at LEVEL-n. If the virtual machine hypervisor is really
running in a virtual machine recursive environment ... there could be
several generations of shadow tables ... each converting the virtual,
real page numbers at level (n-1) into page numbers at level (n). It
isn't until you get to the very bottom, real hardware level that you
finally encounter a shadow table that might get loaded into the real,
real hardware registers ... because it actually is the only table that
contains the real, real page numbers.
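
(the same composition, generalized to n levels, again purely as an
illustrative C sketch with a made-up flat-array representation:)

#include <stdint.h>

#define INVALID UINT64_MAX
#define NLEVELS 3   /* e.g. guest OS tables, test hypervisor, production hypervisor */

/* tables[k] maps a page number at level k to a page number at level k+1;
 * only tables[NLEVELS-1] yields real, real page numbers */
static uint64_t resolve(uint64_t *tables[NLEVELS], uint64_t pn)
{
    for (int k = 0; k < NLEVELS; k++) {
        pn = tables[k][pn];
        if (pn == INVALID)
            return INVALID;     /* fault is reflected to the owner of level k */
    }
    return pn;                  /* what a shadow table / real TLB entry holds */
}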

lots of past references to shadow tables:
http://www.garlic.com/~lynn/94.html#48 Rethinking Virtual Memory
http://www.garlic.com/~lynn/2002l.html#51 Handling variable page sizes?
http://www.garlic.com/~lynn/2003g.html#18 Multiple layers of virtual address translation
http://www.garlic.com/~lynn/2004c.html#35 Computer-oriented license plates
http://www.garlic.com/~lynn/2004o.html#18 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2005d.html#66 Virtual Machine Hardware
http://www.garlic.com/~lynn/2005d.html#70 Virtual Machine Hardware
http://www.garlic.com/~lynn/2005h.html#18 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005k.html#42 wheeler scheduler and hpo
http://www.garlic.com/~lynn/2005n.html#47 Anyone know whether VM/370 EDGAR is still available anywhere?
http://www.garlic.com/~lynn/2005o.html#8 Non Power of 2 Cache Sizes
http://www.garlic.com/~lynn/2005p.html#45 HASP/ASP JES/JES2/JES3
http://www.garlic.com/~lynn/2006e.html#0 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#6 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#7 About TLB in lower-level caches

Jan Vorbrüggen

Mar 13, 2006, 2:56:22 AM
> the hypervisor running on the real hardware has a set of virtual
> address tables for each virtual machine ... which gives the mapping
> between what the virtual machine operating system thinks are "real
> addresses" (but in reality are virtual addresses) and the "real",
> "real" addresses of the real hardware.
>
> when the hypervisor (providing virtual machines capability)
> running on the real hardware ... dispatches a virtual machine
> ... the virtual machine may be expecting to run in real
> address mode of the virtual machine (in which case, the
> hypervisor loads "table 2" into the appropriate hardware
> registers).

Doesn't the microcode and the VM implementation of some models have an
optimization for the one possible V=R virtual machine, often running MVS?
That shortcuts most of the necessary translation levels.

Jan

Anne & Lynn Wheeler

Mar 13, 2006, 12:06:31 PM
Jan Vorbrüggen <jvorbrue...@mediasec.de> writes:
> Doesn't the microcode and the VM implementation of some models have
> an optimization for the one possible V=R virtual machine, often
> running MVS? That shortcuts most of the necessary translation
> levels.

Amdahl introduced a hardware hypervisor ... with V=R (and since
virtual=real, there is no two-level lookup necessary)
http://www.garlic.com/~lynn/2006b.html#38 blast from the past ... macrocode

3090 eventually responded with PR/SM ... PR/SM eventually evolved into
the current LPARs (logical partitions; basically a subset of virtual
machine support built into the fabric of the hardware; it isn't V=R,
but supports a small number of LPAR virtual machines) ... where many
mainframes are run with logical partition support as a matter of
standard production operation.

this abstract includes mention that z990 requires LPAR mode:

IBM Techdocs Technote: zSeries Performance: Determining the Logical CP
Requirements for a Partition
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/032f6e163324983085256b79007f5aec/7eda0421b6d7af1586256db4001efbeb?OpenDocument

a few other references from around the web found with search engine

IBM: PR/SM for zSeries awarded certificate by German Fed Office for
Tec Security
http://www-03.ibm.com/servers/eserver/zseries/news/pressreleases/2003/z900securityaward_03-14-03.html
IBM Techdocs Flash: LPAR Management Time Performance Update
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/fe582a1e48331b5585256de50062ae1c/bcbaeece31d0d9ff85256d8c000d4e4f?OpenDocument
IBM Redbooks | Modeling Host Environments using SNAP/SHOT
http://www.redbooks.ibm.com/abstracts/SG245314.html?Open

misc. past postings mentioning LPAR and/or PR/SM:
http://www.garlic.com/~lynn/2000.html#8 Computer of the century
http://www.garlic.com/~lynn/2000b.html#51 VM (not VMS or Virtual Machine, the IBM sort)
http://www.garlic.com/~lynn/2000b.html#61 VM (not VMS or Virtual Machine, the IBM sort)
http://www.garlic.com/~lynn/2000c.html#76 Is a VAX a mainframe?
http://www.garlic.com/~lynn/2000f.html#78 TSS ancient history, was X86 ultimate CISC? designs)
http://www.garlic.com/~lynn/2000g.html#3 virtualizable 360, was TSS ancient history
http://www.garlic.com/~lynn/2001f.html#17 Accounting systems ... still in use? (Do we still share?)
http://www.garlic.com/~lynn/2001h.html#33 D
http://www.garlic.com/~lynn/2002n.html#6 Tweaking old computers?
http://www.garlic.com/~lynn/2002o.html#15 Home mainframes
http://www.garlic.com/~lynn/2002o.html#18 Everything you wanted to know about z900 from IBM
http://www.garlic.com/~lynn/2002p.html#40 Linux paging
http://www.garlic.com/~lynn/2002p.html#44 Linux paging
http://www.garlic.com/~lynn/2002p.html#45 Linux paging
http://www.garlic.com/~lynn/2002p.html#46 Linux paging
http://www.garlic.com/~lynn/2002p.html#48 Linux paging
http://www.garlic.com/~lynn/2003.html#9 Mainframe System Programmer/Administrator market demand?
http://www.garlic.com/~lynn/2003.html#56 Wild hardware idea
http://www.garlic.com/~lynn/2003f.html#56 ECPS:VM DISPx instructions
http://www.garlic.com/~lynn/2003k.html#9 What is timesharing, anyway?
http://www.garlic.com/~lynn/2003n.html#13 CPUs with microcode ?
http://www.garlic.com/~lynn/2003o.html#52 Virtual Machine Concept
http://www.garlic.com/~lynn/2004b.html#58 Oldest running code
http://www.garlic.com/~lynn/2004c.html#4 OS Partitioning and security
http://www.garlic.com/~lynn/2004c.html#5 PSW Sampling
http://www.garlic.com/~lynn/2004e.html#26 The attack of the killer mainframes
http://www.garlic.com/~lynn/2004f.html#47 Infiniband - practicalities for small clusters
http://www.garlic.com/~lynn/2004j.html#45 A quote from Crypto-Gram
http://www.garlic.com/~lynn/2004k.html#37 Wars against bad things
http://www.garlic.com/~lynn/2004m.html#41 EAL5
http://www.garlic.com/~lynn/2004m.html#49 EAL5
http://www.garlic.com/~lynn/2004n.html#10 RISCs too close to hardware?
http://www.garlic.com/~lynn/2004o.html#13 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2004p.html#37 IBM 3614 and 3624 ATM's
http://www.garlic.com/~lynn/2004q.html#18 PR/SM Dynamic Time Slice calculation
http://www.garlic.com/~lynn/2004q.html#76 Athlon cache question
http://www.garlic.com/~lynn/2005.html#6 [Lit.] Buffer overruns
http://www.garlic.com/~lynn/2005b.html#5 Relocating application architecture and compiler support
http://www.garlic.com/~lynn/2005b.html#26 CAS and LL/SC
http://www.garlic.com/~lynn/2005c.html#56 intel's Vanderpool and virtualization in general
http://www.garlic.com/~lynn/2005d.html#59 Misuse of word "microcode"
http://www.garlic.com/~lynn/2005d.html#74 [Lit.] Buffer overruns
http://www.garlic.com/~lynn/2005f.html#45 Moving assembler programs above the line
http://www.garlic.com/~lynn/2005h.html#13 Today's mainframe--anything to new?
http://www.garlic.com/~lynn/2005h.html#19 Blowing My Own Horn
http://www.garlic.com/~lynn/2005h.html#24 Description of a new old-fashioned programming language
http://www.garlic.com/~lynn/2005k.html#43 Determining processor status without IPIs
http://www.garlic.com/~lynn/2005m.html#16 CPU time and system load
http://www.garlic.com/~lynn/2005p.html#29 Documentation for the New Instructions for the z9 Processor
http://www.garlic.com/~lynn/2005u.html#40 POWER6 on zSeries?
http://www.garlic.com/~lynn/2005u.html#48 POWER6 on zSeries?
http://www.garlic.com/~lynn/2006b.html#38 blast from the past ... macrocode
http://www.garlic.com/~lynn/2006c.html#9 Mainframe Jobs Going Away

Eric P.

Mar 13, 2006, 4:26:15 PM
Anne & Lynn Wheeler wrote:
>
> <snip>

> the rules for maintaining the hardware TLB are followed for
> maintaining the shadow of "table 1". This is, in part, because the
> TLB architecture is defined as also being a shadow/copy of "table 1"
> (for the virtual machine) ... and can result in incorrect and/or
> unpredictable results if the values in the TLB aren't kept in sync
> with the values in entries from "table 1". The virtual machine
> hypervisor takes advantage of the shadow/copy characteristics of the
> real hardware TLB ... for maintaining the entries in the "shadow" of
> "table 1". Note however, the "shadow table" (with translated
> shadow/copy entries from "table 1") has to be maintained consistent
> with both the "table 1" entries as well as the "table 2" entries.

Ok, something like...

The shadow table starts out as empty and you treat the ensuing
page faults as though they were TLB miss faults. You use faults
to pull in PTE values from lookups into the guest and real tables.

- If the guest table indicates the pte is invalid,
VMM signals a "virtual" page fault to the guest VM.
- If the guest PTE is valid, VMM takes the "frame number"
in it and indexes into the real table to get a machine frame number.
- If the machine frame number is invalid, signal an actual page fault
to VMM which inswaps the guest VM page.
- If the machine frame number is valid,
deposit it into the shadow table and restart.

Because it treats the page faults as TLB miss faults, the
shadow PTE values are pulled in on demand. There is therefore no
need to put access traps on the guest table to detect changes
as VMWare does.

Correct?

> the original shadow table implementation had only a single physical
> set of storage for a shadow table for each virtual machine. This
> corresponded with hardware tlb implementations that only remembered
> entries for the current address space ... and didn't have the
> capability for remembering entries for multiple different address
> spaces. in this implementation, every time there is a virtual address
> space change ... all the existing hardware tlb entries have to be
> flushed. Similarly, in the single shadow table (per virtual machine)
> implementation (where only entries for a single virtual address space
> are remembered), every time the virtual machine changes virtual
> address space, all entries in the shadow table are flushed (faithfully
> emulating what is done by hardware TLBs).

Does VM limit the number of shadow PTEs that can be valid
at once similar to the real TLB, to say 128?

It could keep a circular FIFO buffer of the last 128 shadow entries
made valid, so that when a VM issues a PTLB there is just a
small number of shadow PTEs to invalidate.

This would make it behave just like a 128 entry software managed TLB.
To perform a PTLB it need only zap the 128 PTEs from the shadow table
listed in the circular buffer.
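
Roughly, in C (purely illustrative; the flat shadow array and the 128
cap are just to sketch the idea):

#include <stdint.h>

#define INVALID   UINT64_MAX
#define MAX_VALID 128            /* cap on simultaneously-valid shadow PTEs */

static uint64_t recent_vpn[MAX_VALID];   /* guest VPNs with a valid shadow PTE */
static int head, count;

/* record a newly-validated shadow PTE; at the cap, invalidate the oldest
 * first, so the shadow behaves like a 128-entry software-managed TLB */
static void note_valid(uint64_t *shadow, uint64_t vpn)
{
    if (count == MAX_VALID)
        shadow[recent_vpn[head]] = INVALID;      /* evict the oldest entry */
    else
        count++;
    recent_vpn[head] = vpn;
    head = (head + 1) % MAX_VALID;
}

/* guest PTLB: only the recorded entries need zapping, not the whole table */
static void emulate_ptlb(uint64_t *shadow)
{
    for (int i = 0; i < count; i++)
        shadow[recent_vpn[(head + MAX_VALID - 1 - i) % MAX_VALID]] = INVALID;
    head = count = 0;
}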

> <snip>

Eric

Anne & Lynn Wheeler

Mar 13, 2006, 5:14:57 PM
"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
> Does VM limit the number of shadow PTEs that can be valid
> at once similar to the real TLB, to say 128?
>
> It could keep a circular FIFO buffer of the last 128 shadow entries
> made valid, so that when a VM issues a PTLB there is just a
> small number of shadow PTEs to invalidate.
>
> This would make it behave just like a 128 entry software managed TLB.
> To perform a PTLB it need only zap the 128 PTEs from the shadow table
> listed in the circular buffer.

the shadow table is called a shadow table because it is an exact set
of hardware tables ... that correspond to the virtual tables ... but
instead of having translation from virtual to virtual real ... takes
any miss for the virtual address, extracts the virtual/real address
from the virtual table ... and uses the virtual machine address tables
to translate from the virtual/real address to the real/real address.

it follows the semantics of TLB but uses real tables for the
implementation. in theory, if there is a valid entry in the virtual
table (located in the virtual tables in the virtual machine) ... and
the corresponding virtual/real page for the virtual machine is in real
storage ... then the real/real page number populates the real page
table.

the architecture was careful about not specifying how TLBs were
implemented ... so that hardware engineers had a great deal of latitude
in how they could do the TLB implementation. As a result, it also
provided sufficient latitude to do shadow tables which were a full
table set.

the original implementation didn't keep track of how many entries in
the shadow table had been made valid ... so that when a purge type
operation was required by the architecture of the hardware ... the
software implementation just did a complete clear of all possible
entries.

previous posts and more detailed references to the implementation:
http://www.garlic.com/~lynn/2006.html#6 UDP and IP Addresses
http://www.garlic.com/~lynn/2006e.html#0 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#1 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#5 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#6 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#7 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#12 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#15 About TLB in lower-level caches

Anne & Lynn Wheeler

Mar 13, 2006, 5:40:23 PM
"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
> Does VM limit the number of shadow PTEs that can be valid
> at once similar to the real TLB, to say 128?
>
> It could keep a circular FIFO buffer of the last 128 shadow entries
> made valid, so that when a VM issues a PTLB there is just a
> small number of shadow PTEs to invalidate.
>
> This would make it behave just like a 128 entry software managed TLB.
> To perform a PTLB it need only zap the 128 PTEs from the shadow table
> listed in the circular buffer.

however, for some drift ... i had done an implementation in
the 70s that did something similar for a different reason.

there was an (internal corporate) activity trace tool called redcap
that used some virtual machine-like technology to capture all I and D
references (including differentiating loads/stores) and spit out an
instruction trace with storage references.

I got to modify that at the science center
http://www.garlic.com/~lynn/subtopic.html#545tech

in support of an internal tool ... that was eventually released in the
mid-70s as a product called vs/repack.

the modification was, instead of spitting out an actual instruction
trace, to create a reference/use storage array representation for the
application ... you could specify 16-byte, 32-byte, 64-byte, etc,
increments ... and it would spit out the array at specified intervals
... say every 1000 or 5000 or 10000 instructions.

the three arrays were for instruction fetch, data fetch, and data
store ... with one bit indicating something had occurred in that
storage line sometime during the specified instruction interval.

the application that eventually became vs/repack would take that
information ... along with a map of the application and attempt to do
semi-automated program reorganization for optimal virtual memory
operation.

one of the things the science center used this analysis for was the
port of apl\360 from a real-storage environment to cms\apl and virtual
memory operation (including requiring a redo of the whole apl storage
management and garbage collection).

it was also used by a number of internal product development
organizations ... not just for transition from real-storage paradigm
to virtual-address-space operation ... but also for generalized
execution characteristic operation.

so it turned out that the full instruction emulation was somewhat
time-consuming and provided much greater detail than was needed for a
lot of application structure re-organization.

so i did a special modification of the kernel virtual address space
manager. a victim virtual address space could be selected and
specified as being monitored and not to ever have more than N PTEs
valid at any moment (akin to your description). Every time the N+1st
PTE fault occurred ... the list of currently valid virtual page
numbers would be spit out ... and the existing N PTEs had their
invalid flag turned on (leaving the real page number in the PTE for
translation purposes, but indicating to the hardware that a page fault
was to occur for that virtual page). Then the TLB was reset ... and
the process would repeat. You sort of got a running indication of the
virtual pages that were members of the active working set. The level
of detail was at a much grosser level than the full instruction
operation provided by REDCAP ... but at much higher performance ...
and frequently good enuf for the task at hand.
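
(roughly, in C terms ... purely illustrative, not the actual kernel
code; the structures and the value of N are made up:)

#include <stdint.h>
#include <stdio.h>

#define N_LIMIT 64                            /* cap on simultaneously-valid PTEs */

struct pte { uint32_t pfn; int hw_valid; };   /* real page number kept intact */

static uint32_t valid_vpn[N_LIMIT];
static int      nvalid;

/* fault handler for the monitored address space: every (N+1)th fault
 * spits out the current set of valid pages, re-invalidates them all
 * (the TLB then gets purged so the invalid flags take effect), and repeats */
static void monitored_fault(struct pte *pt, uint32_t vpn)
{
    if (nvalid == N_LIMIT) {
        for (int i = 0; i < N_LIMIT; i++) {
            printf("%u ", (unsigned)valid_vpn[i]);   /* working-set sample   */
            pt[valid_vpn[i]].hw_valid = 0;           /* invalid flag back on */
        }
        printf("\n");
        nvalid = 0;
        /* ... purge the TLB here ... */
    }
    pt[vpn].hw_valid = 1;                            /* validate the faulting page */
    valid_vpn[nvalid++] = vpn;
}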

a few years later ... i did a similar kernel hack ... but instead of
tracking virtual page use ... tracked all disk file record use. this
was used against a number of different production environments as part
of designing disk and controller caches ... as well as some filesystem
design work. it was one of the first places where we started to see
detailed information about bursty use of file collections (i.e. the
same collection of files/records used once a day, or week, or month or
some other periodic basis).

misc. past posts referencing vs/repack
http://www.garlic.com/~lynn/94.html#7 IBM 7090 (360s, 370s, apl, etc)
http://www.garlic.com/~lynn/99.html#68 The Melissa Virus or War on Microsoft?
http://www.garlic.com/~lynn/2000g.html#30 Could CDR-coding be on the way back?
http://www.garlic.com/~lynn/2001b.html#83 Z/90, S/390, 370/ESA (slightly off topic)
http://www.garlic.com/~lynn/2001c.html#31 database (or b-tree) page sizes
http://www.garlic.com/~lynn/2001c.html#33 database (or b-tree) page sizes
http://www.garlic.com/~lynn/2001i.html#20 Very CISC Instuctions (Was: why the machine word size ...)
http://www.garlic.com/~lynn/2002c.html#28 OS Workloads : Interactive etc
http://www.garlic.com/~lynn/2002c.html#45 cp/67 addenda (cross-post warning)
http://www.garlic.com/~lynn/2002c.html#46 cp/67 addenda (cross-post warning)
http://www.garlic.com/~lynn/2002c.html#49 Swapper was Re: History of Login Names
http://www.garlic.com/~lynn/2002e.html#50 IBM going after Strobe?
http://www.garlic.com/~lynn/2002f.html#50 Blade architectures
http://www.garlic.com/~lynn/2003f.html#15 Alpha performance, why?
http://www.garlic.com/~lynn/2003f.html#21 "Super-Cheap" Supercomputing
http://www.garlic.com/~lynn/2003f.html#53 Alpha performance, why?
http://www.garlic.com/~lynn/2003g.html#15 Disk capacity and backup solutions
http://www.garlic.com/~lynn/2003h.html#8 IBM says AMD dead in 5yrs ... -- Microsoft Monopoly vs. IBM
http://www.garlic.com/~lynn/2003j.html#32 Language semantics wrt exploits
http://www.garlic.com/~lynn/2004.html#14 Holee shit! 30 years ago!
http://www.garlic.com/~lynn/2004c.html#21 PSW Sampling
http://www.garlic.com/~lynn/2004m.html#22 Lock-free algorithms
http://www.garlic.com/~lynn/2004n.html#55 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2004o.html#7 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2004q.html#73 Athlon cache question
http://www.garlic.com/~lynn/2005.html#4 Athlon cache question
http://www.garlic.com/~lynn/2005d.html#41 Thou shalt have no other gods before the ANSI C standard
http://www.garlic.com/~lynn/2005d.html#48 Secure design
http://www.garlic.com/~lynn/2005h.html#15 Exceptions at basic block boundaries
http://www.garlic.com/~lynn/2005j.html#62 More on garbage collection
http://www.garlic.com/~lynn/2005k.html#17 More on garbage collection
http://www.garlic.com/~lynn/2005m.html#28 IBM's mini computers--lack thereof
http://www.garlic.com/~lynn/2005n.html#18 Code density and performance?
http://www.garlic.com/~lynn/2005o.html#5 Code density and performance?
http://www.garlic.com/~lynn/2006b.html#15 {SPAM?} Re: Expanded Storage
http://www.garlic.com/~lynn/2006b.html#23 Seeking Info on XDS Sigma 7 APL

Anne & Lynn Wheeler

Mar 14, 2006, 12:58:48 PM

Anne & Lynn Wheeler <ly...@garlic.com> writes:
> however, for some drift ... i had done an implementation in
> the early 70s that did something similar for a different reason.

ref:
http://www.garlic.com/~lynn/2006e.html#20 About TLB in lower-level caches

part of this was that the science center (4th flr, 545 tech sq)
http://www.garlic.com/~lynn/subtopic.html#545tech

had done a lot of pioneering work in monitoring, profiling, modeling
and tuning ... some minor drift
http://www.garlic.com/~lynn/subtopic.html#bench

that eventually evolved into such things as capacity planning.

that is besides having done the original virtual machine
implementation ... Melinda's historical paper
http://www.princeton.edu/~melinda/25paper.pdf

as well as having done the implementation used for the internal network
http://www.garlic.com/~lynn/subnetwork.html#internalnet

and, later, also for bitnet/earn
http://www.garlic.com/~lynn/subnetwork.html#bitnet

and a lot of online/interactive stuff ... including the invention of
gml (you might think it stands for generalized markup language ... but
it is actually the first initials of the inventors' last names) ...
which later morphed into html, xml, etc
http://www.garlic.com/~lynn/subtopic.html#sgml

the compare&swap instruction was also invented at the science center
("CAS" comes from the inventor's initials):
http://www.garlic.com/~lynn/subtopic.html#smp
