Lecture Notes for Operating Systems

Distributed Operating Systems, 17 December 2001


  1. distributed computing

    1. a common computation carried out in several places

    2. motivations

      1. geographic dispersion - organizations and applications

      2. redundancy - for reliability

      3. modular system design - easy upgrades and repairs

      4. unlimited resource growth - for a suitably small interpretation of "unlimited"

    3. technological support

      1. computers - like leaves upon the ground

      2. networking - fast, cheap, and ubiquitous

      3. but concepts and techniques lag far behind the hardware - tcp/ip and rpc are the state of the art

    4. taxonomy - difficult and non-definitive

      1. concentrate on the two main components - computers and networks

      2. computers - autonomy, architecture, geography

      3. networks - bandwidth, geography, function

      4. other characteristics - single point of failure; partitioned operation

  2. distributed operating systems

    1. distributed over what - multiple, networked, independent computer systems

    2. not distributed databases, like dns, or distributed applications, like bank atm networks

    3. like an operating system

      1. manage (distributed) resources

      2. provide a more comfortable environment than bare hardware

      3. location independence (transparency)

    4. not a distributed os - network operating systems

      1. each computer runs a separate os

      2. explicit distinction among machines

      3. explicit resource location

      4. fault tolerance in the large, not in the small

    5. challenges

      1. multiplicity of failure modes

      2. communications latency

      3. lack of global state

    6. distributed systems configurations

      1. clusters - Sun, DEC, MS, beowulf, ...

      2. networked workstations - Berkeley NOW, COW

      3. processor pool - plan 9

    7. distributed os services

      1. network file systems

        1. make disks available over the network

      2. process migration

        1. shift processes around to improve performance - load balance, communication costs
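A process-migration decision can be sketched as a simple load-balancing loop. This is a hypothetical policy, not taken from any particular system: `Node` and `rebalance` are invented names, load is assumed to be just the process count, and communication costs (the other factor listed above) are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    processes: list = field(default_factory=list)  # ids of processes running here

    @property
    def load(self) -> int:
        # Assumption: load is simply the number of processes on the node.
        return len(self.processes)


def rebalance(nodes, threshold=2):
    """Move one process at a time from the most loaded node to the least
    loaded node until their loads differ by less than `threshold`.
    Returns the migrations as (pid, source, destination) tuples."""
    if threshold < 2:
        # With a threshold of 1, two nodes whose loads differ by one would
        # trade a process back and forth forever.
        raise ValueError("threshold must be at least 2")
    migrations = []
    while True:
        busiest = max(nodes, key=lambda n: n.load)
        idlest = min(nodes, key=lambda n: n.load)
        if busiest.load - idlest.load < threshold:
            return migrations
        pid = busiest.processes.pop()
        idlest.processes.append(pid)
        migrations.append((pid, busiest.name, idlest.name))
```

For three nodes holding 5, 0 and 1 processes, `rebalance` makes three moves and leaves each node with 2. Each move strictly shrinks the imbalance, which is why the loop terminates.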

  3. design issues - communication, naming and protection, resource management, fault tolerance, services

    1. communication - bridging separate address spaces; message passing, distributed shared memory

      1. message passing

        1. send, receive - synchronous or asynchronous

        2. naming - knowing the sender and receiver

        3. remote procedure calls - client-server computing; corba

        4. rpc error semantics - at least once, at most once, exactly once

        5. argument marshaling - heterogeneous systems and differing data representations
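Argument marshaling between machines with different data representations can be illustrated by fixing the byte order on the wire. The sketch below uses Python's `struct` module with the `!` (network, big-endian) format prefix; the message layout (procedure number, one integer, one length-prefixed string) and the function names are assumptions for illustration. It resembles Sun's XDR in using big-endian 4-byte integers, but it is not XDR-compatible, since XDR also pads strings to a multiple of 4 bytes.

```python
import struct

# Hypothetical wire format for one call: procedure number (unsigned 32-bit),
# one signed 32-bit integer argument, one length-prefixed UTF-8 string.
# "!" selects network (big-endian) byte order with no padding, so the bytes
# mean the same thing on little-endian and big-endian hosts.
HEADER = struct.Struct("!Ii")   # procedure number, integer argument
LENGTH = struct.Struct("!I")    # string length in bytes


def marshal_call(proc: int, number: int, text: str) -> bytes:
    encoded = text.encode("utf-8")
    return HEADER.pack(proc, number) + LENGTH.pack(len(encoded)) + encoded


def unmarshal_call(data: bytes):
    proc, number = HEADER.unpack_from(data, 0)
    (length,) = LENGTH.unpack_from(data, HEADER.size)
    start = HEADER.size + LENGTH.size
    text = data[start:start + length].decode("utf-8")
    return proc, number, text
```

`marshal_call(7, -2, "hi")` produces `b"\x00\x00\x00\x07\xff\xff\xff\xfe\x00\x00\x00\x02hi"` on any host, and `unmarshal_call` recovers `(7, -2, "hi")`.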

      2. distributed shared memory (dsm) - one virtual address space

        1. consistency models

        2. consistency protocols - stronger consistency costs more latency and message delay

    2. naming

      1. internal and external names

      2. name servers
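The split between external and internal names, with name servers translating one into the other, can be sketched as a chain of lookup tables. Everything here is hypothetical: the `NameServer` class, the `/`-separated name syntax, and the (host, port) pair used as an internal name. Handing the rest of a name to another server is loosely analogous to delegation in dns.

```python
class NameServer:
    """Hypothetical name server: maps the first component of an external
    name either to an internal name or to another NameServer that is
    responsible for the remainder of the name."""

    def __init__(self):
        self.table = {}

    def register(self, name, target):
        self.table[name] = target

    def resolve(self, external_name):
        first, _, rest = external_name.partition("/")
        target = self.table[first]          # KeyError if the name is unknown
        if isinstance(target, NameServer):
            # Delegate the rest of the name to the next server.
            return target.resolve(rest)
        if rest:
            # A leaf entry cannot resolve a longer name.
            raise KeyError(external_name)
        return target
```

With a root server that delegates `cs` to a second server holding `printer`, `root.resolve("cs/printer")` returns whatever internal name the second server registered.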

    3. resource management - the lack of global state is the killer here

      1. hierarchical management - good for anonymous resources

      2. processor scheduling - communication patterns; load balancing

      3. deadlock detection - cycles of waiting, e.g. a waits for a message from b, b waits for a message from c, and c waits for a message from a; global snapshot algorithms
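Once the wait-for relation has been gathered in one place (the hard part in a distributed system, and the reason for global snapshot algorithms), detecting a deadlock like the a, b, c example reduces to finding a cycle in a directed graph. A minimal depth-first-search sketch, assuming the graph arrives as a dict from each process to the processes it waits on:

```python
def find_deadlock(waits_for):
    """Return a list of processes forming a wait-for cycle, or None.
    An edge p -> q means "p is waiting for a message from q"."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {p: WHITE for p in waits_for}
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in waits_for.get(p, ()):
            if color.get(q, WHITE) == GREY:
                # Back edge to a process on the current path: a cycle.
                return path[path.index(q):]
            if color.get(q, WHITE) == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(waits_for):
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

For the example in the notes, `find_deadlock({"a": ["b"], "b": ["c"], "c": ["a"]})` returns `["a", "b", "c"]`; remove any one edge and it returns `None`.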

    4. fault tolerance - perseverance in the presence of failure

      1. "a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable" - leslie lamport

      2. reliable and available

      3. on the plus side - look at all the redundant hardware and software

      4. on the minus side - look at all the possible failure modes

      5. low-level fault tolerance - redundancy; checkpoint and rollback

      6. high-level fault tolerance - atomic transactions

      7. fault-tolerant models - fail-stop processors, stable storage
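The notes do not name a protocol for atomic transactions; two-phase commit is one standard way to make a transaction that spans several machines all-or-nothing, and it is used here only as an illustration. In this sketch the participants are in-process objects with invented names (`Participant`, `two_phase_commit`); a real implementation exchanges messages and records each vote and decision in stable storage so that it survives a crash.

```python
class Participant:
    """Hypothetical transaction participant; can_commit simulates whether
    its local part of the work can be made durable."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect all votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)   # any "no" vote forces a global abort
    # Phase 2: send the single global decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"
```

A single "no" vote aborts the transaction everywhere. A known weakness of the protocol is that participants that have voted yes must wait if the coordinator fails before announcing its decision.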

    5. services - os operations provided by user-level applications (servers)

      1. file server - nfs, afs

      2. process (cycle) server - godzilla, condor

      3. time service - ntp

      4. authentication service - kerberos

      5. boot service - dhcp
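A time service such as ntp estimates a client's clock offset from the four timestamps of one request/response exchange. The calculation is shown below under the usual assumption that the network delay is the same in both directions; the function name is hypothetical.

```python
def clock_offset_and_delay(t1, t2, t3, t4):
    """NTP-style estimate from a single exchange.

    t1: client send time     (client clock)
    t2: server receive time  (server clock)
    t3: server send time     (server clock)
    t4: client receive time  (client clock)
    Assumes the one-way delay is the same in both directions.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the server is ahead of the client
    delay = (t4 - t1) - (t3 - t2)          # round trip minus server processing time
    return offset, delay
```

With a server running 5 units ahead of the client, a one-way delay of 2 and 1 unit of server processing, the timestamps are (100, 107, 108, 105) and the function returns an offset of 5.0 and a round-trip delay of 4.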

  4. examples

    1. plan 9 from bell labs

      1. redo unix in the light of lots of powerful, cheap computers and networking

      2. features

        1. resource access via a hierarchical file system - processes, windowing system

        2. communication based on the 9P protocol - the file-system protocol used to reach every resource

        3. each user constructs a personal global environment - mounting and unmounting resources

      3. architecture - clustered timesharing

        1. front end is essentially a display terminal; the global environment moves from terminal to terminal

        2. back end is the rest of the networked components

      4. communication

      5. naming - via the file hierarchy; mount, bind, unmount

      6. resource management

      7. fault tolerance

      8. services - look like file directories

        1. three-level file server - main-memory buffers, disk storage, worm backup

        2. window system - /dev/mouse, /dev/cons, /dev/screen
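Plan 9's per-process name space can be modelled, very loosely, as a mount table consulted by longest-prefix match. The sketch is hypothetical: trees are plain strings standing in for 9P servers, the paths are invented, and union directories (`bind -a`, `bind -b`) are ignored. The argument order of `bind(new, old)` and `mount(server, old)` follows the Plan 9 commands, where `old` is the existing name being covered.

```python
class NameSpace:
    """Toy per-process name space. Each mount point maps to (tree, subpath):
    the served tree now visible there and the path within that tree."""

    def __init__(self, root):
        self.table = {"/": (root, "")}

    def mount(self, server, old):
        # Attach the whole tree served by `server` at path `old`.
        self.table[old] = (server, "")

    def bind(self, new, old):
        # Make `old` refer to whatever `new` resolves to right now.
        self.table[old] = self.resolve(new)

    def unmount(self, old):
        del self.table[old]

    def resolve(self, path):
        # The longest mount point covering `path` wins; the remainder of
        # the path is appended to that mount point's subpath.
        best = max((m for m in self.table if self._covers(m, path)), key=len)
        tree, base = self.table[best]
        rest = path[len(best):].lstrip("/")
        return tree, "/".join(part for part in (base, rest) if part)

    @staticmethod
    def _covers(mount_point, path):
        return (mount_point == "/" or path == mount_point
                or path.startswith(mount_point + "/"))
```

After `mount("fileserver", "/n/fs")` and `bind("/n/fs/usr/rob/bin", "/bin")`, the name `/bin/ls` resolves to `usr/rob/bin/ls` on the file server; after `unmount("/bin")` it falls back to the root tree. Because each process would hold its own table, two processes can give `/bin` different meanings.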

    2. the linda tuple space - not a dos, but a coordination language

      1. distributed shared memory - the tuple space
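Linda's coordination primitives can be sketched with a lock-protected list and a condition variable. This toy version assumes a single address space with threads rather than a distributed store, omits `eval`, and renames `in` to `in_` because `in` is a Python keyword; the typed-formal matching (a type in the pattern matches any value of that type) is a simplification.

```python
import threading


class TupleSpace:
    """Toy Linda-style tuple space: out() adds a tuple, rd() returns a
    matching tuple, in_() returns and removes one. Both rd() and in_()
    block until a matching tuple exists."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    def out(self, *tup):
        with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()   # wake any blocked readers

    def rd(self, *pattern):
        return self._take(pattern, remove=False)

    def in_(self, *pattern):
        return self._take(pattern, remove=True)

    def _take(self, pattern, remove):
        with self._changed:
            while True:
                for i, tup in enumerate(self._tuples):
                    if self._matches(pattern, tup):
                        if remove:
                            del self._tuples[i]
                        return tup
                self._changed.wait()   # block until another thread calls out()

    @staticmethod
    def _matches(pattern, tup):
        # A type in the pattern is a formal matching any value of that type;
        # any other field must be equal.
        if len(pattern) != len(tup):
            return False
        return all(isinstance(v, p) if isinstance(p, type) else v == p
                   for p, v in zip(pattern, tup))
```

A producer calls `out("job", 42)` and a consumer blocked in `in_("job", int)` wakes up and removes that tuple; neither needs to know the other's identity, which is the point of coordinating through the tuple space.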


This page last modified on 18 December 2001.