Concurrent Programming CS 598 Class Notes

Lecture Notes for Concurrent Programming

15 July 2003 - Failure and Cancellation

Failure and Cancellation

Failures
Failure Handling
- Abrupt Termination
- Continuation
- Rollback
- Roll-Forward
- Retry
- Handlers
Cancellation
- Interrupts
- I-O
- Asynchronous Termination
- Multiphase Cancellation

Failures

Our motto: "If something can go wrong, it will."
But isn't synchronization supposed to prevent failures?
Yes, but some conditions can't be handled by synchronization alone:
- Block only if thread B isn't blocked (undetectable).
- Unblock when thread B exits a CS (liveness).
- External, asynchronous signals (interrupts).
- The image has been edge-reduced (expensive and difficult).
Threads may need mechanisms for dealing with failure.

Failure Handling

There are many ways a computation can fail.
- And, consequently, many ways of dealing with failures.
Six approaches to handling failures:
1. Give up.
2. Ignorance.
3. Rollback.
4. Roll-forward.
5. Retry.
6. Hand-off to an expert.

Abrupt Termination

Fail hard, fail fast, and fail loud.
- There should be no ambiguity that a fault occurred (hard).
- The fault should be signaled as close to its source as possible (fast).
- Present as much context as possible to aid analysis (loud).
- This is a valuable software engineering principle, although not without controversy.
Abrupt termination works best for purely internal failures.
- Running out of storage, for example.
- User-generated faults should not explode.
Assertions are good to detect and trigger abrupt termination.
Exceptions can be helpful too, but:
- Maintaining sufficient context near the fault can be hard.
- Exceptions can be intercepted and ignored too.
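A minimal Java sketch of failing hard, fast, and loud. `BoundedStack` is a hypothetical class chosen only for illustration; its checks guard against programming errors (internal faults), not bad user input. Argument and state checks throw immediately with a descriptive message, and an `assert` covers an internal invariant when assertions are enabled with `java -ea`.

```java
// Hypothetical fixed-capacity stack; the checks illustrate hard, fast, loud failure.
class BoundedStack {
    private final int[] items;
    private int size;

    BoundedStack(int capacity) {
        // Fail fast: reject a bad argument where it enters, not later where it is used.
        if (capacity <= 0)
            throw new IllegalArgumentException("capacity must be positive, got " + capacity);
        items = new int[capacity];
    }

    void push(int x) {
        // Fail loud: the message carries the context needed to diagnose the fault.
        if (size == items.length)
            throw new IllegalStateException(
                "push(" + x + ") on full stack, capacity " + items.length);
        items[size++] = x;
        // Internal invariant, checked only when assertions are enabled (java -ea).
        assert size <= items.length : "size " + size + " exceeds capacity";
    }

    int pop() {
        if (size == 0)
            throw new IllegalStateException("pop() on empty stack");
        return items[--size];
    }
}
```

Note that a caller can still catch and discard these exceptions, which is exactly the weakness the notes point out.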

Continuation

When something fails, throw it away and continue.
- Works well for low level, unimportant failures.
  - Can't write a log file? Oh well.
- Works less well for high level, important failures.
  - Can't write your edits back to the file? Oh well.
A useful technique when there's lots of redundant resources.
- Lost a CPU? Oh well, we've got three more.
The faulty object may need to be marked for avoidance or later recovery.
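One way to sketch continuation over redundant resources: try each healthy resource in turn, and when one fails, mark it for avoidance and move on. The `RedundantPool` class is hypothetical, and a plain `IntUnaryOperator` stands in for a CPU, server, or replica. For brevity this version is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical pool of interchangeable workers (stand-ins for CPUs, servers, replicas).
class RedundantPool {
    private final List<IntUnaryOperator> workers;
    private final boolean[] faulty;   // marks workers to avoid (or to recover later)

    RedundantPool(List<IntUnaryOperator> workers) {
        this.workers = new ArrayList<>(workers);
        this.faulty = new boolean[workers.size()];
    }

    // Try each healthy worker in turn; a failure marks that worker and moves on.
    int apply(int input) {
        for (int i = 0; i < workers.size(); i++) {
            if (faulty[i]) continue;            // avoid known-bad workers
            try {
                return workers.get(i).applyAsInt(input);
            } catch (RuntimeException e) {
                faulty[i] = true;               // Lost one? Oh well, mark it and continue.
            }
        }
        throw new IllegalStateException("all workers have failed");
    }

    int healthyCount() {
        int n = 0;
        for (boolean f : faulty) if (!f) n++;
        return n;
    }
}
```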

Continue past cascading faults after the first.

```
try {
  // whatever
} catch (Exception e) {
  try {
    // Handle the current fault,
  } catch (Exception cascade) {
    // continuing past any new faults the handling raises.
  }
}
```

Continuation Example

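Try to open a log file; if that fails, substitute a log object that quietly discards every write.

```
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

interface LogFileInterface {
  void write(String message);
}

// A working log backed by an open file.
class GoodLogFile implements LogFileInterface {
  private final Writer out;
  GoodLogFile(Writer out) { this.out = out; }
  public void write(String message) {
    try {
      out.write(message + "\n");
      out.flush();
    } catch (IOException e) {
      // Can't write the log? Oh well: drop the message and continue.
    }
  }
}

// Used when the log can't be opened: every write is a no-op.
class BadLogFile implements LogFileInterface {
  public void write(String message) { }
}

public class Main {
  public static void main(String[] args) {
    LogFileInterface log;
    try {
      log = new GoodLogFile(new FileWriter("logfile")); // throws if the file can't be opened
    } catch (IOException e) {
      log = new BadLogFile();
    }
    log.write("hello world!");
  }
}
```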

Rollback

Database-style semantics: the record's either completely changed or completely unchanged.
- A failure leaves the record unchanged.
Implemented by save-restore actions on the state being modified.
- Save the old state; restore it on failure, replace it on success.
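A minimal save-restore sketch, assuming the record is a small `Map` of fields that can be copied cheaply. `RollbackRecord` and its methods are hypothetical names. The update runs under the object's lock; if the change throws partway through, the saved copy is put back, so callers see either the whole change or none of it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical record with all-or-nothing updates via save-restore.
class RollbackRecord {
    private Map<String, Integer> fields = new HashMap<>();

    synchronized void update(Consumer<Map<String, Integer>> change) {
        Map<String, Integer> saved = new HashMap<>(fields);  // save the old state
        try {
            change.accept(fields);          // modify in place; may fail partway
        } catch (RuntimeException e) {
            fields = saved;                 // failure: restore the old state
            throw e;
        }
        // Success: the modified state simply replaces the saved copy.
    }

    synchronized Integer get(String key) { return fields.get(key); }
}
```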
High-level state changes may need more machinery.
- Do-undo operations.
Some changes can't be undone easily or at all.
- Undoing a log-file write.
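Do-undo operations can be sketched as an undo log: each step that succeeds records how to reverse itself, and rollback runs those reversals newest first. The `UndoLog` class is a hypothetical illustration. An action with no sensible inverse, such as a log-file write, is exactly the case this scheme cannot cover.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical undo log for multi-step, high-level changes.
class UndoLog {
    private final Deque<Runnable> undos = new ArrayDeque<>();

    // Perform an action and remember its inverse.
    void perform(Runnable doAction, Runnable undoAction) {
        doAction.run();
        undos.push(undoAction);   // recorded only if the action succeeded
    }

    // Undo everything performed so far, most recent first.
    void rollback() {
        while (!undos.isEmpty())
            undos.pop().run();
    }

    // Success: discard the undo information.
    void commit() { undos.clear(); }
}
```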
However, virtual I-O operations can provide undo-ability.
Assuming infrequent conflicts, rollback recovery can replace synchronization.
- Perform the operation without synchronization.
- Check the result and commit the operation or roll back.

Rollback Example

Add items to a shared queue without (or with minimal) synchronization.
- Count the queue elements.
- Add the new element.
- Count the queue elements again.
- If there's exactly one more, then commit; otherwise roll back.
```
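import java.util.ArrayDeque;
import java.util.Queue;

class RollbackQueue {
  // Invariant: q is consistent.
  private Queue<Object> q = new ArrayDeque<>();

  void put(Object o) {
    final int c = q.size();                          // count the elements
    final Queue<Object> copy = new ArrayDeque<>(q);  // save the old state
    q.add(o);                                        // add without synchronization
    if (c + 1 != q.size())                           // not exactly one more?
      q = copy;                                      // roll back
  }
}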
```
Finding effective consistency checkers can be difficult.
- Ensuring progress can also be difficult.
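A sketch of the same commit-or-roll-back idea that is actually safe without locks, swapping the element-count check for a compare-and-set on an immutable snapshot (a different consistency checker from the one above). `OptimisticQueue` is a hypothetical name; `AtomicReference.compareAndSet` is the standard `java.util.concurrent.atomic` call. A failed compare-and-set discards the private copy, which is the rollback, and the put retries. Under heavy contention a thread can keep losing, which is the progress problem just mentioned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free queue: optimistic update, validated commit, rollback on conflict.
class OptimisticQueue<T> {
    private final AtomicReference<List<T>> q =
        new AtomicReference<>(Collections.<T>emptyList());

    void put(T item) {
        while (true) {
            List<T> current = q.get();                 // snapshot, never mutated
            List<T> next = new ArrayList<>(current);   // work on a private copy
            next.add(item);
            // Commit only if no other thread changed q since the snapshot.
            if (q.compareAndSet(current, Collections.unmodifiableList(next)))
                return;
            // Conflict: drop 'next' (roll back) and retry against the new state.
        }
    }

    int size() { return q.get().size(); }
}
```

Copying the whole list on every put is expensive; the sketch trades efficiency for an obviously correct consistency check.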

Roll-Forward

Similar to continuation, but with damage control.
- Use roll-forward when the faulty object can't be ignored.
Helpful when some useful damage control is possible.
- Out of storage? Cut back on data structure size and try again.
```
if (outOfStorage)
  Tracing.logSize /= 2;
```
May switch to a fault mode that avoids provoking further faults.
```
if (mode != SystemState.storageStarved)
  Tracing.log();
```
Combining rollback and roll-forward is often useful.
- Roll-forward uses information saved by rollback.
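A small runnable sketch combining both roll-forward moves: cut back and try again, then fall into a fault mode if nothing fits. `StoragePool` and `TraceLog` are hypothetical stand-ins for the notes' `Tracing` and `SystemState`; the pool simulates running out of storage so the behavior is deterministic rather than relying on a real `OutOfMemoryError`.

```java
// Hypothetical limited resource: allocation fails when the request exceeds what's left.
class StoragePool {
    private int available;
    StoragePool(int available) { this.available = available; }

    int[] allocate(int n) {
        if (n > available)
            throw new IllegalStateException("out of storage");
        available -= n;
        return new int[n];
    }
}

// Hypothetical circular trace log that rolls forward instead of giving up.
class TraceLog {
    private int logSize;
    private int[] buffer;
    private int next = 0;
    private boolean storageStarved = false;   // fault mode flag

    TraceLog(int logSize) { this.logSize = logSize; }

    void init(StoragePool pool) {
        while (logSize > 0) {
            try {
                buffer = pool.allocate(logSize);
                return;
            } catch (IllegalStateException outOfStorage) {
                logSize /= 2;                 // damage control: cut back and try again
            }
        }
        storageStarved = true;                // nothing fits: enter the fault mode
    }

    boolean log(int value) {
        if (storageStarved)
            return false;                     // skip tracing rather than risk new faults
        buffer[next] = value;
        next = (next + 1) % logSize;          // overwrite the oldest entries
        return true;
    }

    int logSize() { return logSize; }
    boolean storageStarved() { return storageStarved; }
}
```

With a pool of 300 slots, a requested size of 1000 is halved to 500 and then 250, which fits.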

Retry

A basic failure handling mechanism: hit the return key a few more times.
Helpful when failures are transient.
Implementation details are crucial.
- Retry counts prevent runaway recovery during failures.
- Random backoff prevents resource hammering.
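A sketch of retry with both safeguards: a bounded attempt count and a randomized exponential backoff between attempts. `Retry.withRetry` and its parameters are hypothetical, and the particular backoff policy (a random delay up to base × 2^attempt milliseconds) is one common choice, not the only one. Interruption is treated as cancellation and never retried.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class Retry {
    private Retry() {}

    // Calls op up to maxAttempts times, sleeping a random, growing delay between tries.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1 || baseDelayMillis < 0)
            throw new IllegalArgumentException("need maxAttempts >= 1 and baseDelayMillis >= 0");
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e;                      // cancellation, not a transient fault: don't retry
            } catch (Exception e) {
                last = e;                     // assume transient; try again
            }
            if (attempt + 1 < maxAttempts) {
                // Random backoff: up to base * 2^attempt ms (exponent capped at 20),
                // so competing clients don't retry in lockstep.
                long bound = baseDelayMillis << Math.min(attempt, 20);
                Thread.sleep(ThreadLocalRandom.current().nextLong(bound + 1));
            }
        }
        throw last;                           // retry count exhausted: report the last fault
    }
}
```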

Handlers

Complex applications may provide their own recovery operations.
Call the application; if it fails, call the error handler.
Handler behavior depends on global state, implementation details, and so on.
Callbacks provide another method for specifying fault handlers.
- But clients may find it difficult to handle faults.
Applications can set up complicated handler arrangements.
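Java threads offer one built-in callback for fault handlers: `Thread.setUncaughtExceptionHandler`, which runs when an exception escapes a thread's `run()` method (`Thread.setDefaultUncaughtExceptionHandler` sets a process-wide fallback). In this sketch the `HandlerDemo` class and its record-the-fault policy are hypothetical; a real handler might log, restart the task, or alert an operator.

```java
import java.util.concurrent.atomic.AtomicReference;

class HandlerDemo {
    // Runs task in a new thread and returns whatever fault escaped it (or null).
    static Throwable runWithHandler(Runnable task) throws InterruptedException {
        AtomicReference<Throwable> fault = new AtomicReference<>();
        Thread worker = new Thread(task, "worker");
        // The handler runs in the dying thread, after the exception escapes run().
        worker.setUncaughtExceptionHandler((t, e) -> fault.set(e));
        worker.start();
        worker.join();       // the handler has finished by the time the thread terminates
        return fault.get();
    }
}
```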

Cancellation

Cancellation is an organized form of failure.
- "Organized" because it can be predicted and provided for.
Cancellation arises under many circumstances:
- User control (cancel buttons).
- Terminating infinite loops.
- Eliminating redundant computations.
- Individual failure in group activity.

Interrupts

Thread interrupts are the principal Java cancellation mechanism.
- Remember that this is a convention: the interrupted thread must check for and honor the interrupt.
Interrupts don't force immediate termination.
- Don't forget to provide enough interruption points.
- Their number and placement trade off responsiveness against control.
Responses to an interrupt can include any of the previous fault handling methods.
Catching InterruptedException clears the interrupt flag, so propagating an interrupt requires setting the flag again.
Don't forget that blocking java.io I-O and waiting to acquire a synchronized lock are not interruptible.
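A sketch of cooperative cancellation with two interruption points: an explicit `isInterrupted()` check between jobs, and the interruptible `BlockingQueue.take()`. The `Worker` class and its fields are hypothetical. When `take()` throws, the handler calls `Thread.currentThread().interrupt()` to re-set the flag it just cleared, so code further up the stack can still see the cancellation request.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical worker that sums jobs until it is interrupted.
class Worker implements Runnable {
    final BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
    volatile int processed = 0;
    volatile boolean interruptFlagKept = false;

    public void run() {
        try {
            // Interruption point 1: explicit check between units of work.
            while (!Thread.currentThread().isInterrupted()) {
                // Interruption point 2: take() throws InterruptedException when interrupted.
                int job = jobs.take();
                processed += job;   // only this thread writes, so += is safe here
            }
        } catch (InterruptedException e) {
            // The exception cleared the flag; set it again to propagate the interrupt.
            Thread.currentThread().interrupt();
        }
        interruptFlagKept = Thread.currentThread().isInterrupted();
    }
}
```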

I-O

I-O time-out is similar to interruption.
Resource revocation is a similar technique.
- Group revocations need to be handled delicately.
Cancelling in the middle of I-O requires difficult recovery.
Hand-craft time-out loops with interrupt checks.
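A sketch of a hand-crafted time-out loop: wait in short slices, and between slices check both the overall deadline and the interrupt status. A `BlockingQueue` stands in for the input source here (an assumption made so the example is self-contained); `poll` is already interruptible, but the explicit check is what matters when the underlying read is not, for example a classic `java.net.Socket` read bounded with `setSoTimeout`. `TimedRead` and the 50 ms slice are hypothetical choices.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedRead {
    private TimedRead() {}

    // Returns the next item, or null if nothing arrives before timeoutMillis elapses.
    static <T> T readWithTimeout(BlockingQueue<T> source, long timeoutMillis)
            throws InterruptedException {
        final long sliceNanos = TimeUnit.MILLISECONDS.toNanos(50); // re-check interval
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (Thread.interrupted())   // cancelled: this clears the flag, so report by throwing
                throw new InterruptedException("read cancelled");
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0)
                return null;            // timed out
            T item = source.poll(Math.min(remaining, sliceNanos), TimeUnit.NANOSECONDS);
            if (item != null)
                return item;
        }
    }
}
```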

Asynchronous Termination

stop() has been deprecated; don't use it.
It can stop a thread while the objects it is updating are in an inconsistent state.
Recovery is difficult because context is unclear and fine-grained.
Handling asynchronous termination is hard, almost impossible.
ThreadDeath exceptions can be caught and ignored.

Multiphase Cancellation

A single cancellation technique may not work.
- Try a sequence of them.
- The Unix kill -HUP, then kill -KILL strategy.
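For a `java.util.concurrent` thread pool, the `ExecutorService` documentation describes a two-phase shutdown along these lines: a polite `shutdown()` first, then `shutdownNow()`, which interrupts running tasks, if the grace period runs out. `Shutdown.shutdownInPhases` is a hypothetical wrapper, and dropping never-started tasks is an assumed policy.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class Shutdown {
    private Shutdown() {}

    // Returns true if the pool terminated; grace is the time allowed per phase.
    static boolean shutdownInPhases(ExecutorService pool, long grace, TimeUnit unit) {
        pool.shutdown();                        // phase 1: stop accepting work, let tasks finish
        try {
            if (pool.awaitTermination(grace, unit))
                return true;
            pool.shutdownNow();                 // phase 2: interrupt running tasks, drop queued ones
            return pool.awaitTermination(grace, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();                 // we were cancelled ourselves: escalate
            Thread.currentThread().interrupt(); // and preserve our own interrupt status
            return false;
        }
    }
}
```

A task that ignores interrupts can still survive phase 2; unlike kill -KILL, Java has no safe last resort for such a task.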

Points to Remember

Failures happen; your code should be prepared.
Failure Handling
Abrupt Termination
Continuation
Rollback
Roll-Forward
Retry
Handlers
Cancellation is a form of deliberate failure.
Interrupts
I-O
Asynchronous Termination
Multiphase Cancellation

This page last modified on 17 July 2003.