- Failures
- Failure Handling
- Abrupt Termination
- Continuation
- Rollback
- Roll-Forward
- Retry
- Handlers
- Cancellation
- Interrupts
- I-O
- Asynchronous Termination
- Multiphase Cancellation
- Our motto: "If something can go wrong, it will."
- But isn't interference control supposed to prevent failures?
- Yes, but:
- Block only if thread B isn't blocked (undetectable).
- Unblock when thread B exits a CS (liveness).
- External, asynchronous signals (interrupts).
- The image has been edge-reduced (expensive and difficult).
- Threads may need mechanisms dealing with failure.
- There is a multiplicity of ways a computation can fail.
- And, consequently, many ways of dealing with failures.
- Six approaches to handling failures:
- Give up.
- Ignorance.
- Rollback.
- Roll-forward.
- Retry.
- Hand-off to an expert.
- Fail hard, fail fast, and fail loud.
- There should be no ambiguity that a fault occurred (hard).
- The fault should be signaled as close to its source as possible
(fast).
- Present as much context as possible to aid analysis (loud).
- This is a valuable software engineering principle, although not
without controversy.
- Abrupt termination works best for purely internal failures.
- Running out of storage, for example.
- User-generated faults should not explode.
- Assertions are good for detecting faults and triggering abrupt termination.
- Exceptions can be helpful too.
- Maintaining sufficient context near the fault can be hard.
- Exceptions can be intercepted and ignored too.
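A minimal sketch of failing hard, fast, and loud, assuming a hypothetical Account class that is not part of the original notes. Arguments are checked where they enter the object (fast), the exception message carries the offending values (loud), and an assertion guards the internal invariant (hard, but only when assertions are enabled with java -ea).

```java
final class Account {
    private long balanceCents;

    Account(long openingCents) {
        // Fail fast: reject a bad value at the point it enters the object.
        if (openingCents < 0) {
            throw new IllegalArgumentException(
                "opening balance must be non-negative, got " + openingCents);
        }
        balanceCents = openingCents;
    }

    void withdraw(long cents) {
        // Fail loud: the message records both the request and the current state.
        if (cents <= 0 || cents > balanceCents) {
            throw new IllegalArgumentException(
                "cannot withdraw " + cents + " from balance " + balanceCents);
        }
        balanceCents -= cents;
        // Fail hard: a violated internal invariant stops the program
        // (checked only when the JVM runs with -ea).
        assert balanceCents >= 0 : "negative balance " + balanceCents;
    }

    long balance() {
        return balanceCents;
    }
}
```

A caller can still catch IllegalArgumentException and ignore it, which is the weakness the last bullet points at.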
- Database-style semantics: the record's either completely changed or
completely unchanged.
- Success changes the record; failure leaves the record unchanged.
- Implemented by save-restore actions on the state being modified.
- Save the old state; restore it on failure, replace it on success.
- High-level state changes may need more machinery.
- Some changes can't be undone easily or at all.
- Undoing a log-file write.
- However, virtual I-O operations can provide undo-ability.
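The save-restore idea can be sketched as follows, assuming a hypothetical Ledger class whose state is a simple in-memory list. The old state is copied before the change; it is restored if the change throws and discarded if the change succeeds. State that lives outside the object, such as a log file, would need more machinery than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class Ledger {
    private List<String> entries = new ArrayList<>();

    // Applies change to the entries; returns true on success,
    // false if the change failed and the old state was restored.
    boolean update(Consumer<List<String>> change) {
        List<String> saved = new ArrayList<>(entries); // save the old state
        try {
            change.accept(entries);
            return true;                               // success: new state stands
        } catch (RuntimeException e) {
            entries = saved;                           // failure: restore old state
            return false;
        }
    }

    // Returns a copy so callers cannot modify the record behind our back.
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```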
- Assuming infrequent conflicts, rollback recovery can replace
synchronization.
- Perform the operation without synchronization.
- Check the result and commit the operation or roll back.
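One way to read the two steps above is as an optimistic update. The sketch below is an assumption about how that might look in Java, using AtomicReference.compareAndSet for the check-and-commit step; the Optimistic class and tryUpdate method are illustrative names. The computation runs without a lock, and the result is discarded if another thread changed the value in the meantime.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class Optimistic {
    // Returns true if the update was committed, false if a conflicting
    // change was detected and the computed result was thrown away.
    static <T> boolean tryUpdate(AtomicReference<T> cell, UnaryOperator<T> op) {
        T seen = cell.get();                   // read without locking
        T next = op.apply(seen);               // compute on the value we saw
        return cell.compareAndSet(seen, next); // commit only if cell is unchanged
    }
}
```

The commit itself is still atomic, so this trades lock-based blocking for a cheap check; it pays off only when conflicts really are infrequent. Note that compareAndSet compares references, not contents.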
- A basic failure handling mechanism: hit the return key a few more
times.
- Helpful when failures are transient.
- Implementation details are crucial.
- Retry counts prevent runaway recovery during failures.
- Random backoff prevents resource hammering.
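A minimal retry sketch, assuming a hypothetical Retry.withRetry helper. The attempt count bounds the recovery effort, and a random sleep between attempts keeps several failing clients from hitting the resource in lockstep. Treating InterruptedException as non-retryable is a design choice made here, not something the notes prescribe.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class Retry {
    // Calls op up to maxAttempts times, sleeping a random time of at most
    // maxBackoffMillis between attempts. Rethrows the last failure.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long maxBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e;          // cancellation is not transient; do not retry
            } catch (Exception e) {
                last = e;         // remember the failure and try again
            }
            if (attempt < maxAttempts && maxBackoffMillis > 0) {
                // Random backoff in the range [0, maxBackoffMillis].
                Thread.sleep(ThreadLocalRandom.current().nextLong(maxBackoffMillis + 1));
            }
        }
        throw last;               // retry count exhausted
    }
}
```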
- Complex applications may provide their own recovery operations.
- Call the application, then, perhaps, call the error handler.
- Handler behavior depends on global state, implementation details, and
so on.
- Callbacks provide another method for specifying fault handlers.
- But clients may find it difficult to handle faults.
- Applications can set up complicated handler arrangements.
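A callback-style handler might look like the sketch below, where Handled.run is a hypothetical helper: the caller passes both the task and a function that decides what to do with a failure. The handler sees only the exception, which illustrates why clients can find it hard to recover sensibly.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class Handled {
    // Runs task; if it fails, hands the exception to the caller-supplied
    // handler, which chooses the value to recover with (or rethrows).
    static <T> T run(Supplier<T> task, Function<RuntimeException, T> onFailure) {
        try {
            return task.get();
        } catch (RuntimeException e) {
            return onFailure.apply(e);
        }
    }
}
```

For example, Handled.<Integer>run(() -> Integer.parseInt(text), e -> -1) substitutes a default when the text is not a number.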
- Cancellation is an organized form of failure.
- "Organized" because it can be predicted and provided for.
- Cancellation arises under many circumstances:
- User control (cancel buttons).
- Terminating infinite loops.
- Eliminating redundant computations.
- Individual failure in group activity.
- Thread interrupts are the principal Java cancellation method.
- Remember this is a convention.
- Interrupts don't force immediate termination.
- Don't forget to provide enough interruption points.
- This helps trade off responsiveness and control.
- Responses to an interrupt can include any of the previous fault
handling methods.
- Propagating interrupts requires resetting the interrupt flag.
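The interrupt convention can be sketched with a hypothetical PollingWorker. The loop test is an explicit interruption point, the blocking sleep() is another, and the catch block sets the flag again because sleep() clears it when it throws InterruptedException.

```java
final class PollingWorker implements Runnable {
    volatile long iterations;      // written only by the worker thread
    volatile boolean sawInterrupt; // set once the worker has noticed the interrupt

    @Override
    public void run() {
        // Interruption point 1: the flag is polled on every iteration.
        while (!Thread.currentThread().isInterrupted()) {
            iterations++;
            try {
                Thread.sleep(1);   // interruption point 2: a blocking call
            } catch (InterruptedException e) {
                // sleep() cleared the flag when it threw; set it again so
                // the loop test (and any caller) still sees the interrupt.
                Thread.currentThread().interrupt();
            }
        }
        sawInterrupt = true;       // clean-up would go here
    }
}
```

Without the interrupt() call in the catch block the loop would keep running, because nothing forces the thread to stop.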
- Don't forget that I-O and mutex waiting are not interruptible.
- I-O time-out is similar to interruption.
- Resource revocation is a similar technique.
- Group revocations need to be handled delicately.
- Cancelling in the middle of I-O requires difficult recovery.
- Hand-craft time-out loops with interrupt checks.
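A hand-crafted time-out loop might look like the following sketch, where PollingReader.readWithTimeout is a hypothetical helper. Instead of blocking in read(), it polls available() and sleeps briefly between polls; the sleep doubles as the interrupt check. It assumes the stream implements available() meaningfully, and it cannot tell end-of-stream from "no data yet", so both show up as -1.

```java
import java.io.IOException;
import java.io.InputStream;

final class PollingReader {
    // Returns the next byte, or -1 if no data arrives within timeoutMillis.
    // Throws InterruptedException if the calling thread is interrupted.
    static int readWithTimeout(InputStream in, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (in.available() == 0) {
            if (System.nanoTime() - deadline >= 0) {
                return -1;         // timed out
            }
            Thread.sleep(10);      // interrupt check: throws if interrupted
        }
        return in.read();          // data is buffered, so this should not block
    }
}
```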
- stop() has been deprecated; don't use it.
- Interrupting threads in an inconsistent state.
- Recovery is difficult because context is unclear and fine-grained.
- Handling asynchronous termination is hard, almost impossible.
- ThreadDeath exceptions can be caught and ignored.
- A single cancellation technique may not work.
- Try a sequence of them.
- The Unix kill -hup ; kill -kill strategy.
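Trying a sequence of techniques can be sketched as below. Multiphase.cancel is a hypothetical helper that runs each cancellation action in turn and gives the target thread a grace period after each one, in the spirit of a polite signal followed by a harsher one. What the later phases actually do (setting a flag, closing a resource the thread is blocked on) is left to the caller.

```java
final class Multiphase {
    // Runs each phase in order, waiting up to graceMillis for target to exit
    // after each one. Returns the 1-based number of the phase that worked,
    // or 0 if the thread is still alive after the last phase.
    static int cancel(Thread target, long graceMillis, Runnable... phases)
            throws InterruptedException {
        if (graceMillis <= 0) {
            // join(0) would wait forever, defeating the escalation.
            throw new IllegalArgumentException("graceMillis must be positive");
        }
        for (int i = 0; i < phases.length; i++) {
            phases[i].run();            // apply this cancellation technique
            target.join(graceMillis);   // give the thread time to comply
            if (!target.isAlive()) {
                return i + 1;
            }
        }
        return 0;                       // every technique failed
    }
}
```

A call such as Multiphase.cancel(worker, 200, worker::interrupt, () -> quit.set(true)) first interrupts the thread and, if that is ignored, falls back to a shared flag named quit (also hypothetical).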
- Failures happen; your code should be prepared.
- Failure Handling
- Abrupt Termination
- Continuation
- Rollback
- Roll-Forward
- Retry
- Handlers
- Cancellation is a form of deliberate failure.
- Interrupts
- I-O
- Asynchronous Termination
- Multiphase Cancellation
This page last modified on 17 July 2003.