Lecture Notes for Operating Systems
Distributed Operating Systems, 16 December 2002
- distributed computing
- a common computation carried out in several places
- motivations
- geographic dispersion - organizations and applications
- redundancy - for reliability
- modular system design - easy upgrades and repairs
- unlimited resource growth - for a suitably small interpretation of
"unlimited"
- technological support
- computers - like leaves upon the ground
- networking - fast, cheap, and ubiquitous
- but concepts and techniques lag far behind the hardware - tcp/ip
and rpc are the state of the art
- taxonomy - difficult and non-definitive
- concentrate on the two main components - computers and networks
- computers - autonomy, architecture, geography
- networks - bandwidth, geography, function
- other characteristics - single point of failure; partitioned
operation
- distributed operating systems
- distributed over what - multiple, networked, independent computer
systems
- not distributed dbs, like dns, or distributed applications, like atms
- like an operating system
- manage (distributed) resources
- provide a more comfortable environment than bare hardware
- location independence (transparency)
- not a distributed os - network operating systems
- each computer runs a separate os
- explicit distinction among machines
- explicit resource location
- fault tolerance in the large, not in the small
- challenges
- multiplicity of failure modes
- communications latency
- lack of global state
- distributed systems configurations
- clusters - Sun, DEC, MS, beowulf, ...
- networked workstations - Berkeley NOW, COW
- processor pool - plan 9
- distributed os services
- network file systems
- make disks available over the network
- process migration
- shift processes around to improve performance - load balance,
communication costs
- design issues - communication, naming and protection, resource
management, fault tolerance, services
- communication - bridging separate address spaces; message passing,
distributed shared memory
- message passing
- send, receive - synchronous or asynchronous
- naming - knowing the sender and receiver
- remote procedure calls - client-server computing; corba
- rpc error semantics - at least once, at most once, exactly once
- argument marshaling - heterogeneous systems and differing data
representations
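
A minimal sketch of two of the rpc points above: marshaling arguments into one agreed byte order, and at-most-once execution by remembering replies. The wire format, the AddServer class, and call_with_retry are hypothetical names made up for illustration, and the "network" is just a direct method call.

```python
import struct

# Hypothetical wire format (an assumption for this sketch): a request id
# followed by two signed 32-bit arguments, all in network (big-endian)
# byte order so hosts with different native representations agree.
REQUEST = struct.Struct("!Iii")
REPLY = struct.Struct("!Ii")


def marshal_request(request_id, a, b):
    """Flatten the call's arguments into bytes for transmission."""
    return REQUEST.pack(request_id, a, b)


class AddServer:
    """Toy server giving at-most-once execution by caching replies."""

    def __init__(self):
        self.replies = {}    # request id -> marshaled reply
        self.executions = 0  # how many times the procedure really ran

    def handle(self, message):
        request_id, a, b = REQUEST.unpack(message)
        if request_id not in self.replies:
            # First time this request id is seen: actually run the call.
            self.executions += 1
            self.replies[request_id] = REPLY.pack(request_id, a + b)
        # A duplicate request gets the cached reply, not a second execution.
        return self.replies[request_id]


def call_with_retry(server, request_id, a, b, lose_first_reply=True):
    """Client stub: resend the same request if the reply seems lost."""
    message = marshal_request(request_id, a, b)
    reply = server.handle(message)
    if lose_first_reply:
        # Pretend the first reply was dropped; retransmit the same request.
        reply = server.handle(message)
    _, result = REPLY.unpack(reply)
    return result
```

Without the reply cache the retransmission would run the procedure twice, which is at-least-once behaviour; exactly-once additionally needs the cache to survive server crashes.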
- distributed shared memory (dsm) - one virtual address space
- consistency models
- consistency protocols - latency; delay
- naming
- internal and external names
- name servers
- resource management - the lack of global state is the killer here
- hierarchical management - good for anonymous resources
- processor scheduling - communication patterns; load balancing
- deadlock detection - a waits for a message from b, which waits for a
message from c, which waits for a message from a; global snapshot
algorithms
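
A rough sketch of the detection step, assuming a consistent snapshot of who-waits-for-whom has already been gathered in one place (collecting that snapshot is the hard part and is not shown). The find_cycle name and the dictionary representation of the wait-for graph are assumptions for illustration.

```python
def find_cycle(waits_for):
    """Return a list of processes forming a wait-for cycle, or None.

    waits_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a cycle, so deadlock.
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in waits_for:
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None
```

For the example in the notes, `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` reports the cycle through a, b, and c.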
- fault tolerance - perseverance in the presence of failure
- "a distributed system is one in which the failure of a computer
you didn't even know existed can render your own computer unusable" -
leslie lamport
- reliable and available
- on the plus side - look at all the redundant hardware and software
- on the minus side - look at all the possible failure modes
- low-level fault tolerance - redundancy; checkpoint and rollback
- high-level fault tolerance - atomic transactions
- fault-tolerant models - fail-stop processors, stable storage
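
Checkpoint and rollback can be sketched on a single machine as below. The Checkpointed class and transfer function are hypothetical, and an in-memory deep copy stands in for stable storage, which a real system would need in order to survive a crash.

```python
import copy


class Checkpointed:
    """Hold some state plus a saved copy to fall back on."""

    def __init__(self, state):
        self.state = state
        # Assumption: this copy stands in for stable storage.
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def transfer(accounts, src, dst, amount):
    """Move amount from src to dst; roll back if any step fails."""
    accounts.checkpoint()
    try:
        accounts.state[src] -= amount
        if accounts.state[src] < 0:
            raise ValueError("insufficient funds")
        accounts.state[dst] += amount
        return True
    except (KeyError, ValueError):
        # Undo the partial update by restoring the last checkpoint.
        accounts.rollback()
        return False
```

The all-or-nothing behaviour of transfer is a small taste of what atomic transactions provide at the higher level.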
- services - os operations provided by user-level applications (servers)
- file server - nfs, afs
- process (cycle) server - godzilla, condor
- time service - ntp
- authentication service - kerberos
- boot service - dhcp
- examples
- plan 9 from bell labs
- redo unix in the light of lots of powerful, cheap computers and
networking
- features
- resource access via a hierarchical file system - processes,
windowing system
- communication based on the 9P protocol - controls the file system
- each user constructs a personal global environment - mounting and
unmounting resources
- architecture - clustered timesharing
- front end is essentially a display terminal; the global
environment moves from terminal to terminal
- back end is the rest of the networked components
- communication
- naming - via the file hierarchy; mount, bind, unmount
- resource management
- fault tolerance
- services - look like file directories
- three level file server - primary buffers, disk storage, worm
back-up
- window system - /dev/mouse, /dev/console, /dev/screen
- the linda tuple space - not a distributed os, but a coordination language
- distributed shared memory - the tuple space
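
A single-process sketch of the tuple-space idea, assuming the non-blocking predicate forms inp and rdp rather than the blocking in and rd, and using None as the wildcard in a pattern. The TupleSpace class is hypothetical; a real Linda implementation shares the space among processes on many machines.

```python
class TupleSpace:
    """Toy tuple space: processes coordinate by adding and removing tuples."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self._tuples.append(tuple(fields))

    def _match(self, pattern):
        # Find the index of the oldest tuple matching the pattern;
        # None in the pattern matches any value in that position.
        for index, candidate in enumerate(self._tuples):
            if len(candidate) == len(pattern) and all(
                want is None or want == have
                for want, have in zip(pattern, candidate)
            ):
                return index
        return None

    def rdp(self, *pattern):
        """Read a matching tuple without removing it, or return None."""
        index = self._match(pattern)
        return None if index is None else self._tuples[index]

    def inp(self, *pattern):
        """Remove and return a matching tuple, or return None."""
        index = self._match(pattern)
        return None if index is None else self._tuples.pop(index)
```

A producer might call `out("job", 1)` while a worker calls `inp("job", None)`; neither needs to know the other's name or location, only the shape of the tuple.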
This page last modified on 17 December 2002.