random page selection - yank any in-store page; not very efficient
optimal page replacement - furthest use in the future
not recently used
modified-reference bits -
periodically clear reference bits to find old, unreferenced pages
eject unreferenced pages before referenced pages, and unmodified pages
before modified pages
hardware implementation - still need to clear reference bits
software - interrupts and protection
fifo - oldest out next; no reference or modify checking
second chance - oldest unreferenced out
clock page replacement - second chance or fifo as a ring
LRU Replacement
The past as prologue.
Page queue ordered by access time.
Counters
Matrix
LRU simulations.
Not Frequently Used (NFU) adds the reference bit to a counter; it never forgets.
Aging shifts the counter right and inserts the reference bit at the left
(high-order) end, so old references fade away.
LRU problems.
There is no intra-interval granularity.
Resolution is finite and small.
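The aging estimator above can be sketched in a few lines of Python; the names (COUNTER_BITS, tick, victim) are illustrative, not from the notes.

```python
# Aging LRU estimator: on each clock tick, every page's counter is shifted
# right one bit and the page's reference bit is ORed into the top bit.
COUNTER_BITS = 8
TOP_BIT = 1 << (COUNTER_BITS - 1)

def tick(counters, ref_bits):
    """One aging interval: shift counters, fold in reference bits, clear them."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (TOP_BIT if ref_bits[page] else 0)
        ref_bits[page] = 0

def victim(counters):
    """Eject the page with the smallest counter (the LRU estimate)."""
    return min(counters, key=counters.get)
```

Note the resolution limit from the notes: two pages referenced in the same interval get the same top bit, and history older than COUNTER_BITS intervals is lost.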
Design Issues
working set model
local vs. global page allocations
page size
the memory-management interface
working set model
demand paging
locality of reference
working set
thrashing
prefetch
using counters to compute the working set
adding the working set to paging algorithms - wsclock
local vs. global page allocations
Fractional vs. weighted total allocations.
Managing by page-fault frequency.
Swapping is still useful to handle global thrashing.
page size
Set by hardware or software - Alpha: 8, 16, 32, or 64 KB pages.
Internal fragmentation and small segments argue for small pages.
Small page tables and paging overhead argue for large pages.
Page table overhead on context switches.
The Memory-Management Interface
Sharing pages.
Pinning pages.
Mapping files.
Distributed shared memory.
Overlay addressing
Big programs use addresses outside the range of physical storage.
Overlay relocation into regions solved the problem for overlays.
A similar technique can solve the problem for virtual memory.
Essentially relocate pages into page frames.
Relative Addressing and Relocation
A program with relative addresses has offsets from a base address.
Relocation involves adding a program base to all the offsets to produce
absolute addresses.
Link-load time relocation is inflexible.
Once a relative program is relocated to absolute addresses, it's
difficult or impossible to relocate again.
Relocation Registers
Hardware can help retain relocation flexibility throughout a process's
life.
Keep the base address in a relocation register and add each offset
to the base address.
The process can be relocated by moving it to new base address B'
and storing B' in the relocation register.
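A relocation register reduces to one addition per access; this tiny sketch (class and method names are illustrative) shows why re-relocation becomes a single store.

```python
# Relocation-register addressing: every relative offset is added to the
# base held in the register at access time.
class RelocationRegister:
    def __init__(self, base):
        self.base = base

    def absolute(self, offset):
        # Performed by hardware on every address issue.
        return self.base + offset

    def relocate(self, new_base):
        # Move the process: just update the register; offsets never change.
        self.base = new_base
```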
Relocating Virtual Addresses
Treat each page as a mini-program.
Addresses in the page are offsets from the start of the page.
Assume each page frame has a relocation register.
Virtual address can be relocated to absolute addresses.
The Virtual Relocation Calculation
The virtual-relocation calculation needs a base address and an offset.
The structure of the virtual-address space gives us both these values.
The base address requires a bit more work than does the offset.
Virtual Memory Structure
The regular page structure imposed on virtual memory induces a
structure on virtual addresses.
If a 2^n-unit virtual-address space is divided into 2^p
pages, p < n, how big is each page?
2^n/2^p = 2^(n-p) units.
Alternatively, if a 2^n-byte virtual-address space has 2^s-byte
pages, s <= n, how many pages are there?
2^n/2^s = 2^(n-s) pages
Virtual Address Structure
An n-bit virtual address can be split into two parts:
The high-order p-bit page index, and
The low-order (n - p)-bit page offset, (n - p = s).
For example, consider a 4-bit virtual-address split into four pages.
n = 4, p = 2, s = 4 - 2 = 2.
page 0: 0000 0001 0010 0011
page 1: 0100 0101 0110 0111
page 2: 1000 1001 1010 1011
page 3: 1100 1101 1110 1111
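The split falls out of two bit operations; a short Python sketch (the function name is illustrative):

```python
# Split an n-bit virtual address into a p-bit page index and an
# s = (n - p)-bit page offset using a shift and a mask.
def split(va, s):
    page = va >> s                 # high-order page index
    offset = va & ((1 << s) - 1)   # low-order page offset
    return page, offset
```

With the 4-bit example above (s = 2), address 1101 splits into page 11 (3) and offset 01 (1).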
The Page-Page Frame Mapping
Given the virtual address i*2^s + f, which page frame
holds page i?
Conceptually, we need an array A of size 2^p.
Page i is in page frame A[i].
If entry A[i] is empty, raise a page fault.
This array is called the page table.
The page table also contains other information for each page:
Read-write-execute permissions.
Access-modify status bits.
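The conceptual lookup can be sketched as follows (a flat page table as a Python list; None marks a non-resident page; names are illustrative):

```python
# Flat page-table translation: index by page, fault on empty entries,
# and rebuild the absolute address from frame and offset.
class PageFault(Exception):
    pass

def translate(page_table, va, s):
    page = va >> s
    offset = va & ((1 << s) - 1)
    frame = page_table[page]
    if frame is None:
        raise PageFault(page)      # page not resident
    return frame * (1 << s) + offset
```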
Page-Table Hardware
Virtual-address relocation has to be done on every address issue.
The memory-management unit (MMU) helps make sure virtual-address relocation is fast.
It holds the page table and does the math.
MMUs can be part of the CPU or off-CPU (but not too far away).
Given their size, they're usually off-CPU.
Page Table Size
Page tables have two unattractive properties:
They're big.
A 32-bit virtual address with 8K (= 2^13)-byte pages has
2^(32-13) = 2^19 = 512K entries.
A page-table entry is at least four bytes, for a total of 2 MB.
They're sparse.
There are few gigabyte programs, even with data.
Even if MMUs were big enough to hold a page table, it would be
wasteful.
Segmented Page Tables
A page table can be broken up into segments.
The MMU contains only the segments for code and data being accessed.
Page faulting becomes more complex, as does indexing.
Hierarchical Page Tables
Rather than a flat array of page-table segments, create a tree of
segments.
This packs page-table segments densely in virtual memory.
The tree depth is the page-table level, usually two or three.
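A two-level lookup can be sketched like this, with each level as a dict so absent segments need no storage (the function name and parameters are illustrative):

```python
# Two-level page-table translation: the page index is itself split into
# an outer index (selecting a segment) and an inner index (within it).
def translate2(outer, va, s, inner_bits):
    page = va >> s
    offset = va & ((1 << s) - 1)
    hi = page >> inner_bits                # outer (top-level) index
    lo = page & ((1 << inner_bits) - 1)    # inner (segment) index
    segment = outer.get(hi)               # missing segment: not allocated
    if segment is None or segment.get(lo) is None:
        raise KeyError("page fault")
    return segment[lo] * (1 << s) + offset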
The Translation Look-Aside Buffer
The MMU can't be too fast, and caching is a good way to speed things
up.
The cache for the MMU is called the Translation Look-Aside Buffer
(TLB).
Inverted Page Tables
Generally, systems have less than a full complement of physical
storage.
32-bit systems rarely have 4 GB of physical storage.
The number of page frames can be smaller than the number of pages in a
program.
This is less true than it used to be, now that larger primary stores
are more cost-effective.
Rather than index the page table by page index, which can be large,
index it by page-frame number, which will usually be smaller.
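Structurally, an inverted table is one entry per frame recording its occupant; a purely illustrative sketch (real implementations hash rather than scan linearly):

```python
# Inverted page table: frames[f] == (pid, page) if frame f holds that
# process's page, None if the frame is free. Translation searches for
# the (pid, page) pair instead of indexing by page.
def translate_inverted(frames, pid, page):
    for f, owner in enumerate(frames):
        if owner == (pid, page):
            return f
    return None  # not resident: page fault
```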
Page Replacement
What happens when a page needs to be brought in from disk and there are
no open page frames?
A resident page needs to be ejected to free up a page frame.
The page-replacement algorithm determines which pages to eject.
Random Page Replacement
Just pick a page frame at random and eject the resident page.
Simple but not too efficient.
It may pick a hot page that will have to be paged in again soon.
Optimal Page Replacement
The best page to eject is the one that will remain untouched for the
longest time.
It won't have to be paged in until sometime far into the future.
Always selecting the longest-untouched page results in optimal
(minimal) paging behavior.
Unfortunately, because it requires knowledge of the future, it's
impossible to implement.
But maybe we could make a good guess about future behavior.
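Though unimplementable online, the optimal choice is easy to simulate offline when the whole reference string is known; a sketch (names are illustrative):

```python
# Belady's optimal choice: among resident pages, eject the one whose
# next use in the reference string lies furthest in the future.
def optimal_victim(resident, refs, now):
    def next_use(p):
        for t in range(now, len(refs)):
            if refs[t] == p:
                return t
        return float("inf")  # never used again: the ideal victim
    return max(resident, key=next_use)
```

Simulating this gives a lower bound against which real algorithms like LRU can be measured.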
Least-Recently Used Replacement
Estimate future behavior from past behavior.
A page that hasn't been accessed probably won't be accessed.
Eject the page that has been unaccessed the longest.
This is known as Least Recently Used (LRU) replacement.
LRU replacement works well, but is expensive to implement.
It requires bits per page frame to keep time and lots of comparisons.
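Exact LRU can be kept with an ordered queue of pages, which shows why it's expensive: every access reorders the queue. A minimal sketch using Python's OrderedDict:

```python
# Exact LRU: each access moves the page to the back of the queue;
# the front of the queue is the least recently used page.
from collections import OrderedDict

class LRU:
    def __init__(self, nframes):
        self.nframes = nframes
        self.q = OrderedDict()

    def access(self, page):
        """Touch a page; return the ejected page, or None."""
        if page in self.q:
            self.q.move_to_end(page)   # hit: refresh recency
            return None
        ejected = None
        if len(self.q) == self.nframes:
            ejected, _ = self.q.popitem(last=False)  # eject LRU page
        self.q[page] = True
        return ejected
```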
Estimating LRU Replacement
There are lots of LRU estimators using fewer clock bits (sometimes only
one) and comparisons.
These are usually still expensive because the clock bits have to be
handled by OS software and not hardware.
The queue is a simple and common LRU estimator.
A page access brings the page to the head of the queue.
Unaccessed pages sink to the end of the queue.
A sampling rate (every nth access) reduces the need to requeue
on every access.
Second-Chance Queueing
Sampling may cause accessed pages to sink to the end of the queue.
Assign each page a second-chance bit, initially 0.
If the end-page second-chance bit is 1, eject it.
If the end-page second-chance bit is 0, requeue it and set the
second-chance bit to 1.
The second-chance bit is cleared whenever the page is accessed.
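Following the bit semantics described here (the bit marks a page that has already received its second chance), eviction can be sketched as:

```python
# Second-chance eviction: oldest page at the left of the queue.
# bits[p] == 1 means page p already had its second chance.
from collections import deque

def second_chance_evict(queue, bits):
    while True:
        page = queue.popleft()
        if bits[page]:
            return page       # already had its chance: eject
        bits[page] = 1        # grant a second chance
        queue.append(page)    # send it back around
```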
Clock replacement
Rather than maintain a queue of pages, form them into a ring and use a
pointer to mark the head of the queue.
A trailing second hand can serve as the second-chance bit scanner,
speeding up the rate at which pages are ejected.
The angle between the head and second scanner is a parameter of the
algorithm.
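A one-hand variant of the clock (simpler than the two-hand scheme above, which adds the trailing scanner) can be sketched as:

```python
# One-hand clock replacement: sweep the ring from the hand, clearing
# reference bits, until a page with a clear bit is found.
def clock_evict(pages, ref, hand):
    """pages: the ring (a list); ref[p]: reference bit; hand: ring index.
    Returns (victim page, new hand position)."""
    while True:
        page = pages[hand]
        if ref[page] == 0:
            return page, (hand + 1) % len(pages)
        ref[page] = 0                  # clear and move on
        hand = (hand + 1) % len(pages)
```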
Paging
If a page selected for ejection has been modified (is a dirty
page), it needs to be written out to disk to save the modifications.
Unmodified pages don't need to be written out; they can just be
overwritten.
Bias page-selection algorithms towards unmodified pages.
Paging Areas
Dirty pages cannot be written back to the files from which they came.
There may be more than one copy of the program running, and so more
than one copy of the modified page.
Dirty pages are written to a special file system called the paging
(or sometimes swapping) area.
The paging area is a special-purpose file system designed to provide
fast page IO.
Regular files can serve as paging areas too, with decreased IO
efficiency.
Page-Frame Allocation
How many page frames should a process get?
Local page-frame allocation partitions the page frames among
processes.
Each process gets a subset of non-shared page frames.
When a process fills its allotment of page frames, paging
begins.
Global page-frame allocation makes all page frames available to
all processes.
Each process gloms on to as many page frames as it can.
Paging begins when all page frames are gone.
Local allocation is fair but potentially wasteful; poor allocation
decisions may lead to excessive paging or unused page frames.
Global allocation achieves better utilization (fewer unused page
frames), but may cause a form of extreme paging known as thrashing.
Static vs. Dynamic Allocation
Static page-frame allocation allocates a fixed and unchanging
number of pages at process start.
Dynamic page-frame allocation allows the number of allocated page
frames to change with process behavior.