- parallel, concurrent, distributed computing
- distributed computing - independent computations coordinated to solve a
problem
- independent - no resource sharing; limited communications
- computation - processes, threads
- coordinated - communications, either data or control or both
- why do it
- because we can - cheap hardware and widespread communication at low
cost
- increased reliability - replication, redundancy
- increased autonomy - local control over local resources
- increased performance - unlimited resources, parallelism
- high modularity - upgrades and maintenance
- why not do it
- it's hard - most of the features above are wishful thinking
- failure modes - many and complex; can easily drive programs into
unexpected failures
- variable and unpredictable environment
- detecting problems - down or slow
- compensating for problems - partitioning
- recovering from problems - merging independent executions
- conceptually under-nourished
- rpc is an early 80s idea (see the sketch after this list)
- language support is minimal - type safety, versioning
- os support is minimal - coordination toolkits
- performance - latency and load balancing
- unhelpful standards
- true standards are low level - tcp/ip
- higher-level standards are quasi-standards and, um, pragmatic - corba, xml-soap
- aren't that high level, either - location, rpc
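
To make the rpc complaint concrete, here is a minimal sketch using python's standard-library xmlrpc modules; the procedure name, host, and port are illustrative assumptions, not anything standard. Notice how little the language checks on our behalf: argument types and interface versions are entirely the caller's problem.

    import threading
    from xmlrpc.server import SimpleXMLRPCServer
    from xmlrpc.client import ServerProxy

    def add(x, y):
        # the "remote" procedure; nothing declares or checks the types
        # the caller will send, and there is no interface version
        return x + y

    # server side: register the procedure and serve it in the background
    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # client side: the call looks local but travels over http as xml
    proxy = ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))    # prints 5
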
- distributed-computing patterns
- graph representations - nodes as computation; arcs as communication;
n-to-m patterns
- master-slave - simple communication pattern; one-to-many
- peer-to-peer - highly distributed; many-to-many
- client-server - conceptually simple; one-to-one
- publish-subscribe - simple and dynamic; many-to-many
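
The publish-subscribe pattern is the easiest of these to sketch. The toy broker below lives in a single process to keep it self-contained (a real broker sits on the network, but the shape is the same); the topic name and handlers are made up for illustration.

    from collections import defaultdict

    class Broker:
        # the only shared point of contact: publishers and subscribers
        # know the broker and a topic name, never each other
        def __init__(self):
            self.subscribers = defaultdict(list)    # topic -> callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            # many-to-many: any number of publishers and subscribers
            for callback in self.subscribers[topic]:
                callback(message)

    broker = Broker()
    broker.subscribe("temperature", lambda m: print("logger saw", m))
    broker.subscribe("temperature", lambda m: print("alarm saw", m))
    broker.publish("temperature", 72)
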
- distributed computing examples
- scientific computation - seti@home, beowulf
- many cpus, massive parallelism (see the master-slave sketch after this list)
- data-access computations - dns
- improved reliability, performance
- monitoring and coordination - factory automation
- sensors, actuators, brains
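
A master-slave sketch in the seti@home style, using python's multiprocessing module: the master splits the problem into independent work units, the workers compute without sharing anything, and the master combines the results. The work unit here (summing squares over a chunk) is a stand-in for a real computation.

    from multiprocessing import Pool

    def work_unit(chunk):
        # each slave computes independently - no shared state, and the
        # only communication is the result sent back to the master
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        # the master partitions the problem into independent chunks
        chunks = [range(i, i + 1000) for i in range(0, 10000, 1000)]
        with Pool() as workers:          # one process per cpu by default
            partials = workers.map(work_unit, chunks)
        print(sum(partials))             # the master combines the results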