Big Programs, Small Storage
- We still need to run programs bigger than primary store.
- Overlays use programmer-defined fragments.
- Virtual memory and paging use system-defined fragments.
- Both the hardware and OS are involved.
- Big programs exceed physical memory.
- They use addresses from N (the physical-store size) up.
- Cut big program into pieces.
- The program still has addresses from N up.
- Solve this problem with reverse relocation.
- Rather than adding a base, subtract it.
- Or just compile the second half with base 0.
- Consider a program that is twice as long as physical memory.
- Cut the program in half.
- Relocate the top-half addresses by -N.
- Now both halves of the program have addresses in the range 0 to
N - 1.
- This works only when each half is independent of the other.
- Addresses are recycled, and so are ambiguous.
- Transitions between halves are done by system call.
- The OS
- writes the old half to disk to save modified data, then
- reads in the new half, resets the process state, and returns.
- This is expensive.
- Overlays generalize the two-part program.
- Each overlay is a program fragment relocated down into some part of
physical storage, known as an overlay region.
- Several overlays may map into the same part of physical storage.
- A program may comprise several overlay regions.
- The programmer defines the overlays and overlay regions,
the compiler creates them,
and the OS implements them.
- virtual addresses - completely decouple physical and process addresses
- virtual memory fragments - pages
- virtual addresses - page #, page offset
- Pages are uniform overlays.
- Memory management unit - virtual to real address mapping.
- Translation look-aside buffer - speeding up the mmu.
- Inverted page tables.
- The process address space is divided into pages.
- Physical memory is divided into page frames.
- Page frames and pages are (usually) the same size.
- Virtual memory is like physical memory, except that virtual memory exists
only conceptually.
- A (conceptual) fixed-length array of units.
- VM units are the same size as PM units.
- N, the virtual-memory size, is usually vastly larger than the
physical-memory size.
- It can also be smaller, or the same size.
- N is usually 2^32 (about 4.3 gig) or 2^64.
- A virtual address is an index in the array model.
- Pages give virtual addresses a structure not found in physical
addresses.
- A virtual address identifies both
- a unit in virtual memory, and
- the page containing the unit.
- Assume a page contains 2^i units.
- The page offset is the low-order i-bit part of the virtual address.
- The page index is the high-order (s - i)-bit part of an s-bit virtual
address.
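The two-part split can be sketched with bit operations; a minimal sketch in Python (the function name and the example address are illustrative):

```python
# Split a virtual address into its page index and page offset,
# assuming a page holds 2**i units.
def split(va, i):
    offset = va & ((1 << i) - 1)   # low-order i bits
    page = va >> i                 # remaining high-order bits
    return page, offset

# With 8k (2**13-byte) pages, address 0x5ABC lies in page 2
# at offset 0x1ABC.
```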
Memory Management Unit
- Virtual addresses have to be mapped to physical addresses.
- Page tables map from virtual address to page frames.
- page table size - the trade-off
- translation speed - as fast as possible
- context switch overhead - as low as possible
- single level page tables
- registers - small and expensive but fast
- main store - big and cheap but slow
- multi-level page tables - 2^19 8k pages in a 32-bit address space
- page table entries - page frame number, present-absent bit,
protection bits, modified-referenced bits, cache bit, pinned bit
Translation Look-Aside Buffer
- Speeding up the mmu.
- caching page table to page frame mappings
- hardware or software
- prefetching - process stack + code
- For example, consider a 50 ns TLB lookup and a 750 ns memory access.
- TLB hit: 50 ns TLB + 750 ns memory access = 800 ns
- TLB miss: 50 ns TLB + 750 ns page-table access + 750 ns memory access = 1550 ns
- An 8-16 entry TLB has an 80-90% hit ratio.
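The hit/miss arithmetic above generalizes to an effective access time once a hit ratio is known; a sketch using the same illustrative timings:

```python
# Effective memory-access time with a TLB, assuming a TLB miss costs
# one extra memory access to read the page table.
def effective_access(tlb, mem, hit_ratio):
    hit = tlb + mem              # translation hits in the TLB
    miss = tlb + mem + mem       # page-table access, then the real access
    return hit_ratio * hit + (1 - hit_ratio) * miss

# With 50 ns TLB, 750 ns memory, 90% hits: 0.9*800 + 0.1*1550 = 875 ns.
```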
Inverted Page Tables
- 64-bit addresses - a full page table needs an entry for each of 2^48 or
more pages
- page frame -> (proc, page no) mappings
- a 2^28-byte (256-mbyte) main store with 16-kbyte page frames has only
2^14 = 16k page frames
- tlb to speed-up mapping
- hashing addresses on tlb failure
- What to do when good addresses go bad?
- page faults
- random page selection - yank any in-store page; not very efficient
- optimal page replacement - furthest use in the future
- not recently used
- modified-reference bits -
- periodically clear reference bits to find old, unreferenced pages
- toss not referenced before referenced pages, and not modified
before modified pages
- hardware implementation - still need to clear reference bits
- software - interrupts and protection
- fifo - oldest out next; no reference or modify checking
- second chance - oldest unreferenced out
- clock page replacement - second chance or fifo as a ring
- The past as prologue.
- Page queue ordered by access time.
- LRU simulations.
- Not Frequently Used (NFU) adds the reference bit to a counter; it doesn't forget.
- Aging shifts the counter right, with the reference bit entering at the left,
so old references fade away.
- LRU problems.
- There is no intra-interval granularity.
- Resolution is finite and small.
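As a concrete illustration of the aging estimator mentioned above, here is a minimal sketch assuming 8-bit counters (not any particular OS's implementation):

```python
# Aging: on each clock tick every counter shifts right one bit and the
# page's reference bit enters at the left.  The page with the smallest
# counter is the LRU estimate.
def tick(counters, referenced):
    for page in counters:
        counters[page] = (counters[page] >> 1) | (0x80 if page in referenced else 0)

counters = {0: 0, 1: 0}
tick(counters, {0})   # only page 0 referenced this tick
tick(counters, {1})   # only page 1 referenced this tick
# counters is now {0: 0x40, 1: 0x80}; page 0 is the better victim
```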
- working set model
- local vs. global page allocations
- page size
- the memory-management interface
working set model
- demand paging
- locality of reference
- working set
- using counters to compute the working set
- adding the working set to paging algorithms - wsclock
local vs. global page allocations
- Fractional vs. weighted total allocations.
- Managing by page-fault frequency.
- Swapping is still useful to handle global thrashing.
page size
- Set by hardware or software - the Alpha supports 8, 16, 32, and 64-kbyte pages.
- Internal fragmentation and small segments argue for small pages.
- Small page tables and low paging overhead argue for large pages.
- Page-table overhead on context switches also argues for large pages.
The Memory-Management Interface
- Sharing pages.
- Pinning pages.
- Mapping files.
- Distributed shared memory.
- Big programs use addresses outside the range of physical storage.
- Overlay relocation into regions solved the problem for overlays.
- A similar technique can solve the problem for virtual memory.
- Essentially relocate pages into page frames.
Relative Addressing and Relocation
- A program with relative addresses has offsets from a base address.
- Relocation involves adding a program base to all the offsets to produce
absolute addresses.
- Link-load time relocation is inflexible.
- Once a relative program is relocated to absolute addresses, it's
difficult or impossible to relocate again.
- Hardware can help retain relocation flexibility throughout a process's
lifetime.
- Keep the base address in a relocation register and add each offset
to the base address.
- The process can be relocated by moving it to new base address B'
and storing B' in the relocation register.
Relocating Virtual Addresses
- Treat each page as a mini-program.
- Addresses in the page are offsets from the start of the page.
- Assume each page frame has a relocation register.
- Virtual addresses can be relocated to absolute addresses.
The Virtual Relocation Calculation
- The virtual-relocation calculation needs a base address and an offset.
- The structure of the virtual-address space gives us both these values.
- The base address requires a bit more work than does the offset.
Virtual Memory Structure
- The regular page structure imposed on virtual memory induces a
structure on virtual addresses.
- If a 2^n-unit virtual-address space is divided into 2^p
pages, p < n, how big is each page?
- Alternatively, if a 2^n-unit virtual-address space has 2^s-byte
pages, s <= n, how many pages are there?
Virtual Address Structure
- An n-bit virtual address can be split into two parts:
- The high-order p-bit page index, and
- The low-order (n - p)-bit page offset, (n - p = s).
- For example, consider a 4-bit virtual-address space split into four pages:
  n = 4, p = 2, s = 4 - 2 = 2.
  page 0  page 1  page 2  page 3
   0000    0100    1000    1100
   0001    0101    1001    1101
   0010    0110    1010    1110
   0011    0111    1011    1111
The Page-Page Frame Mapping
- Given the virtual address i*2^s + f, which page frame
holds page i?
- Conceptually, we need an array A of size 2^p.
- Page i is in page frame A[i].
- If A[i] is empty (page i is not resident), throw a page fault.
- This array is called the page table.
- The page table also contains other information for each page:
- Read-write-execute permissions.
- Access-modify status bits.
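The page-table lookup above can be sketched as follows, assuming 2^s-byte pages and using None to mark an empty entry (all names are illustrative):

```python
# Translate a virtual address through a flat page table.  page_table[i]
# is the frame holding page i, or None if the page is absent.
def translate(page_table, va, s):
    page = va >> s                    # high-order page index
    offset = va & ((1 << s) - 1)      # low-order page offset
    frame = page_table[page]
    if frame is None:
        raise RuntimeError("page fault on page %d" % page)
    return (frame << s) | offset

table = [7, None, 2, None]            # page 0 -> frame 7, page 2 -> frame 2
# translate(table, 0b1001, 2) == 0b1001: page 2 maps to frame 2
# translate(table, 0b0100, 2) raises a page fault for page 1
```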
- Virtual-address relocation has to be done on every address issue.
- The memory-management unit (MMU) helps make sure va relocation is fast.
- It holds the page table and does the math.
- MMUs can be part of the CPU or off-CPU (but not too far away).
- Given their size, they're usually off-CPU.
Page Table Size
- Page tables have two unattractive properties:
- They're big.
- A 32-bit virtual address with 8k (= 2^13)-byte pages has
2^(32 - 13) = 2^19 = 512k entries.
- A page-table entry is at least four bytes, for a total of 2 mbytes.
- They're sparse.
- There are few gigabyte programs, even with data.
- Even if MMUs were big enough to hold a page table, it would be mostly
empty.
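The size figures above are plain arithmetic; a quick check:

```python
# Flat page-table size for a 32-bit address space with 8k pages.
entries = 2 ** (32 - 13)          # one entry per 2**13-byte page
table_bytes = entries * 4         # at least four bytes per entry
# entries == 512k, table_bytes == 2 mbytes
```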
Segmented Page Tables
- A page table can be broken up into segments.
- The MMU contains only the segments for code and data being accessed.
- Page faulting becomes more complex, as does indexing.
Hierarchical Page Tables
- Rather than a flat array of page-table segments, create a tree of
segments.
- This packs page-table segments densely in virtual memory.
- The tree depth is the page-table level, usually two or three.
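A two-level version of the lookup can be sketched like this, assuming the virtual address splits into a top-level index, a second-level index, and an offset (a sketch, not a real MMU):

```python
# Two-level page-table translation: the top table holds second-level
# tables (or None), which hold frame numbers (or None).
def translate2(top, va, mid_bits, off_bits):
    off = va & ((1 << off_bits) - 1)
    mid = (va >> off_bits) & ((1 << mid_bits) - 1)
    hi = va >> (off_bits + mid_bits)
    second = top[hi]
    if second is None or second[mid] is None:
        raise RuntimeError("page fault")   # absent table or absent page
    return (second[mid] << off_bits) | off

# Only the second-level tables for pages in use need to exist.
top = [None, [None, 5]]                    # hi=1, mid=1 -> frame 5
```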
The Translation Look-Aside Buffer
- The MMU can't be too fast, and caching is a good way to speed things
up.
- The cache for the MMU is called the Translation Look-Aside Buffer
(TLB).
Inverted Page Tables
- Generally, systems have less than a full complement of physical
storage.
- 32-bit systems rarely have 4 gig of physical storage.
- The number of page frames can be smaller than the number of pages in a
process's address space.
- This is less true than it used to be, now that larger memories are
cost-effective.
- Rather than index the page table by page index, which can be large,
index it by page-frame number, which will usually be smaller.
- What happens when a page needs to be brought in from disk and there are
no open page frames?
- A resident page needs to be ejected to free up a page frame.
- The page-replacement algorithm determines which pages to eject.
Random Page Replacement
- Just pick a page frame at random and eject the resident page.
- Simple but not too efficient.
- It may pick a hot page that will have to be paged in again soon.
Optimal Page Replacement
- The best page to eject is the one that will remain untouched for the
longest time.
- It won't have to be paged in until sometime far into the future.
- Always selecting the longest-untouched page results in optimal
(minimal) paging behavior.
- Unfortunately, because it requires knowledge of the future, it's
impossible to implement.
- But maybe we could make a good guess about future behavior.
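Given a future reference string, the optimal choice is easy to state; a minimal sketch:

```python
# Optimal (Belady) replacement: eject the resident page whose next use
# lies furthest in the future.
def optimal_victim(resident, future):
    def next_use(page):
        try:
            return future.index(page)
        except ValueError:
            return float("inf")        # never used again: the ideal victim
    return max(resident, key=next_use)

# optimal_victim({1, 2, 3}, [2, 1, 2, 3]) -> 3: page 3's next use is last
```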
Least-Recently Used Replacement
- Estimate future behavior from past behavior.
- A page that hasn't been accessed recently probably won't be accessed
soon.
- Eject the page that has been unaccessed the longest.
- This is known as Least Recently Used (LRU) replacement.
- LRU replacement works well, but is expensive to implement.
- It requires per-page-frame bits to keep time and lots of comparisons.
Estimating LRU Replacement
- There are lots of LRU estimators using fewer clock bits (sometimes only
one) and comparisons.
- These are usually still expensive because the clock bits have to be
handled by OS software and not hardware.
- The queue is a simple and common LRU estimator.
- A page access brings the page to the head of the queue.
- Unaccessed pages sink to the end of the queue.
- A sampling rate (every nth access) reduces the need to requeue
on every access.
- Sampling may cause accessed pages to sink to the end of the queue.
- Assign each page a second-chance bit, initially 0.
- If the end-page second-chance bit is 1, eject it.
- If the end-page second-chance bit is 0, requeue it and set the
second-chance bit to 1.
- The second-chance bit is cleared whenever the page is accessed.
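The second-chance scan can be sketched with a queue (a sketch only; real kernels scan page-frame lists in place):

```python
from collections import deque

# Second chance: pop from the old end of the queue; eject the first page
# whose second-chance bit is already set, requeueing (and marking) the rest.
def evict(queue, chance):
    while True:
        page = queue.pop()             # oldest page
        if chance[page]:
            return page                # survived a full pass unaccessed: eject
        chance[page] = True            # spare it once
        queue.appendleft(page)         # back to the young end

queue = deque([3, 2, 1])               # page 1 is oldest
chance = {1: False, 2: False, 3: True}
# evict(queue, chance) -> 3; pages 1 and 2 get their bits set
```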
- Rather than maintain a queue of pages, form them into a ring and use a
pointer to mark the head of the queue.
- A trailing second hand can serve as the second-chance bit scanner,
speeding up the rate at which pages are ejected.
- The angle between the head and the second scanner is a parameter of the
algorithm.
- If a page selected for ejection has been modified (is a dirty
page), it needs to be written out to disk to save the modifications.
- Unmodified pages don't need to be written out; they can just be
discarded.
- Bias page-selection algorithms towards unmodified pages.
- Dirty pages cannot be written back to the files from which they came.
- There may be more than one copy of the program running, and so more
than one copy of the modified page.
- Dirty pages are written to a special file system called the paging
(or sometimes swapping) area.
- The paging area is a special-purpose file system designed to provide
fast page IO.
- Regular files can also be paging areas, with decreased IO
performance.
- How many page frames should a process get?
- Local page-frame allocation partitions the page frames among
processes.
- Each process gets a subset of non-shared page frames.
- When a process fills its allotment of page frames, paging begins.
- Global page-frame allocation makes all page frames available to
all processes.
- Each process gloms on to as many page frames as it can.
- Paging begins when all page frames are gone.
- Local allocation is fair but potentially wasteful; poor allocation
decisions may lead to excessive paging or unused page frames.
- Global allocation achieves better utilization (fewer unused page
frames), but may cause a form of extreme paging known as thrashing.
Static vs. Dynamic Allocation
- Static page-frame allocation allocates a fixed and unchanging
number of pages at process start.
- Dynamic page-frame allocation allows the number of allocated page
frames to change with process behavior.
This page last modified on 19 November 2004.