Cache 4

0 views

Skip to first unread message

Basa Benejan

unread,

Aug 3, 2024, 4:10:47 PM8/3/24

to myforbuistic

To be cost-effective, caches must be relatively small. Nevertheless, caches are effective in many areas of computing because typical computer applications access data with a high degree of locality of reference. Such access patterns exhibit temporal locality, where data is requested that has been recently requested, and spatial locality, where data is requested that is stored near data that has already been requested.

In memory design, there is an inherent trade-off between capacity and speed because larger capacity implies larger size and thus greater physical distances for signals to travel causing propagation delays. There is also a tradeoff between high-performance technologies such as SRAM and cheaper, easily mass-produced commodities such as DRAM, flash, or hard disks.

The use of a cache also allows for higher throughput from the underlying resource, by assembling multiple fine-grain transfers into larger, more efficient requests. In the case of DRAM circuits, the additional throughput may be gained by using a wider data bus.

Hardware implements cache as a block of memory for temporary storage of data likely to be used again. Central processing units (CPUs), solid-state drives (SSDs) and hard disk drives (HDDs) frequently include hardware-based cache, while web browsers and web servers commonly rely on software caching.

A cache is made up of a pool of entries. Each entry has associated data, which is a copy of the same data in some backing store. Each entry also has a tag, which specifies the identity of the data in the backing store of which the entry is a copy.

When the cache client (a CPU, web browser, operating system) needs to access data presumed to exist in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired data, the data in the entry is used instead. This situation is known as a cache hit. For example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the content of the web page is the data. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.

The alternative situation, when the cache is checked and found not to contain any entry with the desired tag, is known as a cache miss. This requires a more expensive access of data from the backing store. Once the requested data is retrieved, it is typically copied into the cache, ready for the next access.

During a cache miss, some other previously existing cache entry is typically removed in order to make room for the newly retrieved data. The heuristic used to select the entry to replace is known as the replacement policy. One popular replacement policy, least recently used (LRU), replaces the oldest entry, the entry that was accessed less recently than any other entry. More sophisticated caching algorithms also take into account the frequency of use of entries.

When a system writes data to cache, it must at some point write that data to the backing store as well. The timing of this write is controlled by what is known as the write policy. There are two basic writing approaches:[3]

A write-back cache is more complex to implement since it needs to track which of its locations have been written over and mark them as dirty for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the cache, a process referred to as a lazy write. For this reason, a read miss in a write-back cache will often require two memory backing store accesses to service: one for the write back, and one to retrieve the needed data. Other policies may also trigger data write-back. The client may make many changes to data in the cache, and then explicitly notify the cache to write back the data.

Entities other than the cache may change the data in the backing store, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the client updates the data in the cache, copies of those data in other caches will become stale. Communication protocols between the cache managers that keep the data consistent are associated with cache coherence.

On a cache read miss, caches with a demand paging policy read the minimum amount from the backing store. A typical demand-paging virtual memory implementation reads one page of virtual memory (often 4 KB) from disk into the disk cache in RAM. A typical CPU reads a single L2 cache line of 128 bytes from DRAM into the L2 cache, and a single L1 cache line of 64 bytes from the L2 cache into the L1 cache.

A few operating systems go further with a loader that always pre-loads the entire executable into RAM. A few caches go even further, not only pre-loading an entire file, but also starting to load other related files that may soon be requested, such as the page cache associated with a prefetcher or the web cache associated with link prefetching.

Small memories on or close to the CPU can operate faster than the much larger main memory.[6] Most CPUs since the 1980s have used one or more caches, sometimes in cascaded levels; modern high-end embedded, desktop and server microprocessors may have as many as six types of cache (between levels and functions).[7] Some examples of caches with a specific function are the D-cache, I-cache and the translation lookaside buffer for the memory management unit (MMU).

Earlier graphics processing units (GPUs) often had limited read-only texture caches and used Swizzling (computer graphics) to improve 2D locality of reference. Cache misses would drastically affect performance, e.g. if mipmapping was not used. Caching was important to leverage 32-bit (and wider) transfers for texture data that was often as little as 4 bits per pixel.

As GPUs advanced, supporting general-purpose computing on graphics processing units and compute kernels, they have developed progressively larger and increasingly general caches, including instruction caches for shaders, exhibiting functionality commonly found in CPU caches. These caches have grown to handle synchronization primitives between threads and atomic operations, and interface with a CPU-style MMU.

Digital signal processors have similarly generalized over the years. Earlier designs used scratchpad memory fed by direct memory access, but modern DSPs such as Qualcomm Hexagon often include a very similar set of caches to a CPU (e.g. Modified Harvard architecture with shared L2, split L1 I-cache and D-cache).[8]

A memory management unit (MMU) that fetches page table entries from main memory has a specialized cache, used for recording the results of virtual address to physical address translations. This specialized cache is called a translation lookaside buffer (TLB).[9]

Information-centric networking (ICN) is an approach to evolve the Internet infrastructure away from a host-centric paradigm, based on perpetual connectivity and the end-to-end principle, to a network architecture in which the focal point is identified information (or content or data). Due to the inherent caching capability of the nodes in an ICN, it can be viewed as a loosely connected network of caches, which has unique requirements of caching policies. However, ubiquitous content caching introduces the challenge to content protection against unauthorized access, which requires extra care and solutions.[10]

Unlike proxy servers, in ICN the cache is a network-level solution. Therefore, it has rapidly changing cache states and higher request arrival rates; moreover, smaller cache sizes further impose a different kind of requirements on the content eviction policies. In particular, eviction policies for ICN should be fast and lightweight. Various cache replication and eviction schemes for different ICN architectures and applications have been proposed.

The Time aware Least Recently Used (TLRU)[11] is a variant of LRU designed for the situation where the stored contents in cache have a valid life time. The algorithm is suitable in network cache applications, such as ICN, content delivery networks (CDNs) and distributed networks in general. TLRU introduces a new term: TTU (Time to Use). TTU is a time stamp of a content/page which stipulates the usability time for the content based on the locality of the content and the content publisher announcement. Owing to this locality based time stamp, TTU provides more control to the local administrator to regulate in network storage.

In the TLRU algorithm, when a piece of content arrives, a cache node calculates the local TTU value based on the TTU value assigned by the content publisher. The local TTU value is calculated by using a locally defined function. Once the local TTU value is calculated the replacement of content is performed on a subset of the total content stored in cache node. The TLRU ensures that less popular and small life content should be replaced with the incoming content.

The Least Frequent Recently Used (LFRU)[12] cache replacement scheme combines the benefits of LFU and LRU schemes. LFRU is suitable for 'in network' cache applications, such as ICN, CDNs and distributed networks in general. In LFRU, the cache is divided into two partitions called privileged and unprivileged partitions. The privileged partition can be defined as a protected partition. If content is highly popular, it is pushed into the privileged partition. Replacement of the privileged partition is done as follows: LFRU evicts content from the unprivileged partition, pushes content from privileged partition to unprivileged partition, and finally inserts new content into the privileged partition. In the above procedure the LRU is used for the privileged partition and an approximated LFU (ALFU) scheme is used for the unprivileged partition, hence the abbreviation LFRU. The basic idea is to filter out the locally popular contents with ALFU scheme and push the popular contents to one of the privileged partition.