Lecture Notes for Operating Systems

Distributed Operating Systems, 14 December 1999


These are provisional lecture notes; expect changes.

  1. distributed operating systems

    1. distributed over what - multiple, networked, independent computer systems

    2. not distributed dbs, like dns, or distributed applications, like atms

    3. like an operating system

      1. manage (distributed) resources

      2. provide a more comfortable environment than bare hardware

      3. location independence (transparency)

    4. not a distributed os - network operating systems

      1. each computer runs a separate os

      2. explicit distinction among machines

      3. explicit resource location

      4. fault tolerance in the large, not in the small

    5. objectives

      1. exploit groups of cheap, powerful processors

      2. provide incremental growth

      3. fault tolerance in both the large and small

    6. challenges

      1. multiplicity of failure modes

      2. communications latency

      3. lack of global state

    7. distributed systems configurations

      1. clustered timesharing - VAXClusters

      2. networked workstations - Berkeley NOW, COW

      3. processor pool - plan 9

  2. design issues - communication, naming and protection, resource management, fault tolerance, services

    1. communication - bridging separate address spaces; message passing, distributed shared memory

      1. message passing

        1. send, receive - synchronous or asynchronous

        2. naming - knowing the sender and receiver

        3. remote procedure calls - client-server computing; corba

        4. rpc error semantics - at least once, at most once, exactly once

        5. argument marshaling - heterogeneous systems and differing data representations
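
A minimal sketch of at-most-once rpc semantics, assuming the client tags each call with a unique request id and resends the same id on a retry. The class name AtMostOnceServer, the deposit handler, and the "req-1" id are all hypothetical, and the network is replaced by a direct method call.

```python
class AtMostOnceServer:
    """Runs each request id at most once; retries get the cached reply."""

    def __init__(self, handler):
        self._handler = handler
        self._replies = {}      # request id -> reply already sent
        self.executions = 0     # how many times the handler really ran

    def handle(self, request_id, *args):
        # A duplicate (a client retry after a lost reply) is answered
        # from the cache instead of running the handler again.
        if request_id in self._replies:
            return self._replies[request_id]
        self.executions += 1
        reply = self._handler(*args)
        self._replies[request_id] = reply
        return reply


# A non-idempotent operation: running it twice would deposit twice.
balance = {"total": 0}


def deposit(amount):
    balance["total"] += amount
    return balance["total"]


server = AtMostOnceServer(deposit)
first = server.handle("req-1", 50)   # original call
retry = server.handle("req-1", 50)   # retry of the same call
```

Without the reply cache the retry would run deposit a second time, which is the at-least-once behavior. A real server would also have to bound the cache and survive its own crashes, which this sketch ignores.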

      2. distributed shared memory (dsm) - one virtual address space

        1. consistency models

        2. consistency protocols - latency; delay

    2. naming

      1. internal and external names

      2. name servers

    3. resource management - the lack of global state is the killer here

      1. hierarchical management - good for anonymous resources

      2. processor scheduling - communication patterns; load balancing

      3. deadlock detection - a waits for a message from b, which waits for a message from c, which waits for a message from a; global snapshot algorithms
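
A minimal sketch of the detection step, assuming some global snapshot algorithm has already collected a consistent set of wait-for edges and that each process waits on at most one other process. The function name find_wait_cycle and the dictionary layout are hypothetical.

```python
def find_wait_cycle(waits_for):
    """Return the processes on a wait-for cycle, or None if there is none.

    waits_for maps a process name to the one process it is blocked on.
    """
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow the chain of waits until it ends or revisits a process.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # node was seen before, so the chain loops back onto itself.
            return path[path.index(node):]
    return None


# a waits for b, b waits for c, c waits for a
deadlocked = find_wait_cycle({"a": "b", "b": "c", "c": "a"})

# a waits for b, b waits for c, c is not waiting
healthy = find_wait_cycle({"a": "b", "b": "c"})
```

The hard part in a distributed system is not the cycle search but getting edges that all describe the same instant; edges gathered at different times can show a cycle that never existed.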

    4. fault tolerance - perseverance in the presence of failure

      1. "a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable" - leslie lamport

      2. reliable and available

      3. on the plus side - look at all the redundant hardware and software

      4. on the minus side - look at all the possible failure modes

      5. low-level fault tolerance - redundancy; checkpoint and rollback

      6. high-level fault tolerance - atomic transactions

      7. fault-tolerant models - fail-stop processors, stable storage
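
A minimal sketch of checkpoint and rollback, and of how it yields all-or-nothing updates in the spirit of an atomic transaction. It assumes an in-memory deep copy stands in for stable storage; the names Checkpointed and apply_all, and the rule that a value may not go negative, are hypothetical.

```python
import copy


class Checkpointed:
    """Holds a state dictionary plus one saved checkpoint of it."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        # Assumption: a real system would write this to stable storage.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def apply_all(proc, updates):
    """Apply every (key, delta) update, or none of them."""
    proc.checkpoint()
    try:
        for key, delta in updates:
            if proc.state[key] + delta < 0:
                raise ValueError("update would make %r negative" % key)
            proc.state[key] += delta
    except ValueError:
        proc.rollback()   # undo the partial work
        return False
    return True


proc = Checkpointed({"a": 10, "b": 0})
ok = apply_all(proc, [("a", -5), ("b", 5)])
failed = apply_all(proc, [("a", -3), ("b", 3), ("a", -10)])
```

The second call fails on its third update, so the two updates before it are rolled back and the state is the one left by the first call.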

    5. services - os operations provided by user-level applications (servers)

      1. file server - nfs, afs

      2. process (cycle) server - godzilla, condor

      3. time service - ntp

      4. authentication service - kerberos

      5. boot service - dhcp

  3. examples

    1. plan 9 from bell labs

      1. redo unix in the light of lots of powerful, cheap computers and networking

      2. features

        1. resource access via a hierarchical file system - processes, windowing system

        2. communication based on the 9P protocol - controls the file system

        3. each user constructs a personal global environment - mounting and unmounting resources

      3. architecture - processor pool

        1. front end is essentially a display terminal; the global environment moves from terminal to terminal

        2. back end is the rest of the networked components

      4. communication

      5. naming - via the file hierarchy; mount, bind, unmount

      6. resource management

      7. fault tolerance

      8. services - look like file directories

        1. three level file server - primary buffers, disk storage, worm back-up

        2. window system - /dev/mouse, /dev/cons, /dev/screen

    2. the linda tuple space - not a dos, but a coordination language

      1. distributed shared memory - the tuple space
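
A minimal single-process sketch of a tuple space, to show the flavor of the coordination model. The operation names out, rdp and inp follow linda (rdp and inp being the non-blocking forms of rd and in), but the class TupleSpace and the use of None as a wildcard are assumptions of this sketch, and nothing here is distributed or thread-safe.

```python
class TupleSpace:
    """A shared bag of tuples matched by pattern, not by address."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        # Deposit a tuple in the space.
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        # A pattern matches a tuple of the same length whose fields are
        # equal wherever the pattern is not the None wildcard.
        for candidate in self._tuples:
            if len(candidate) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, candidate)
            ):
                return candidate
        return None

    def rdp(self, *pattern):
        # Read a matching tuple without removing it; None if no match.
        return self._find(pattern)

    def inp(self, *pattern):
        # Remove and return a matching tuple; None if no match.
        found = self._find(pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("task", 1)
space.out("task", 2)
peeked = space.rdp("task", None)   # leaves the tuple in place
taken = space.inp("task", None)    # removes the first match
```

Producers and consumers never name each other; they only agree on the shape of the tuples, which is what makes the tuple space a form of shared memory.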


This page last modified on 22 December 1999.