Lecture Notes for Simulation
4 April 2005 - Hypothesis Testing
Outline
Answering questions.
Statistical Hypotheses
Hypothesis Testing and Outliers
Accepting vs. Rejecting
Hypotheses
Error types and controlling errors.
One- and two-tailed tests.
Answering Questions
Simulations provide numeric answers.
A mean m within bounds b at confidence c%.
Simulations can also provide yes-no answers.
"Is it worth making this change to the system?"
"Yes (or no) with s% significance."
Yes-no answers tend to be of greater interest to decision makers.
Numeric and yes-no answers are closely related.
The data and underlying solution methods are identical.
The interpretations differ, as do the solution techniques.
Yes-no answers use a statistical-analysis technique known as hypothesis testing.
Statistical Hypotheses
A statistical hypothesis (or just hypothesis) is a predicate over one or more populations.
50% of Monmouth students are between 60 and 72 inches tall.
Server utilization is within 10% of 0.8.
As a predicate, a hypothesis can be considered true or false only when taken over the whole population.
As a practical matter, the truth or falsity of a hypothesis is determined by considering a sample from the related population.
The results from the sample may not be representative of the population.
A statistical hypothesis does not buy improved accuracy.
However, you may be able to exploit the analysis to shift priorities.
Hypothesis Testing and Outliers
Hypothesis testing recognizes and counts outliers in the sample.
Too many outliers is not a good sign.
Is a coin fair if it's flipped fifty times and
it comes up heads twelve times?
it comes up heads thirty times?
The probability of consistently drawing outliers from a tailed population is low.
If a sample contains many consistent outliers, the assumed population probably didn't produce it.
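The coin example above can be made concrete with exact binomial tail probabilities; this is a minimal sketch using only the standard library, and the helper name `binom_tail_prob` is illustrative, not from the notes.

```python
from math import comb

def binom_tail_prob(n, k, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Fifty flips of a fair coin:
p12 = binom_tail_prob(50, 12)       # P(12 or fewer heads): deep in the tail
p30 = 1 - binom_tail_prob(50, 29)   # P(30 or more heads): roughly 1 in 10
```

Twelve heads is so far into the tail (probability well under 0.001) that the fair-coin hypothesis looks untenable; thirty heads is unusual but happens about one time in ten, which is why it's the harder call.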
Relying on Outliers
Relying on outliers has two consequences:
You can't be sure of your outlier determination.
Fifty flips of a fair coin can produce only twelve heads.
But it's highly unlikely.
If they're not outliers, you can't say what they are.
Fifty flips of a non-fair coin can result in thirty heads.
And, because you don't know how unfair the coin is, you can't say how unlikely it is.
Accepting vs. Rejecting
A hypothesis may either be accepted or rejected.
However, because of sampling, accepting and rejecting are not complementary.
A rejected hypothesis can be considered false.
An accepted hypothesis may or may not be true.
There's not enough evidence to reject the hypothesis.
But the evidence to accept isn't conclusive either.
Test Set-Up
Draw n samples from a population with mean m and variance v^2.
Compute the sample mean x; by the Central Limit Theorem, x is from an approximately normal distribution with mean m and variance v^2/n.
The reject region is known as the critical area.
x_l and x_r are known as the critical values.
Accept the hypothesis if x_l <= x <= x_r; otherwise reject it.
The critical area and values are parameters to the test.
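The accept-reject rule above is mechanical; this sketch spells it out, with an illustrative function name not taken from the notes.

```python
def accept_null(x_bar, x_l, x_r):
    """Accept H0 when the sample mean lies between the critical values;
    otherwise the sample falls in the critical area and H0 is rejected."""
    return x_l <= x_bar <= x_r

# With critical values 0.72 and 0.88 (the server-utilization example below):
accept_null(0.85, 0.72, 0.88)   # sample mean inside the bounds: accept
accept_null(0.95, 0.72, 0.88)   # sample mean in the critical area: reject
```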
Hypotheses
Because rejecting is more definitive than accepting, the idea is to assume the hypothesis and try to reject it.
This is the null hypothesis, denoted H0.
Given the set-up, the null hypothesis is an equality.
If the null hypothesis isn't rejected, it's not necessarily accepted.
It's just not obviously wrong.
If the null hypothesis is rejected, we can accept the inverse hypothesis.
This is the alternative hypothesis, denoted Ha.
The alternative hypothesis is an inequality.
Defining the hypotheses can be tricky.
Example Hypotheses
Is the mean server utilization equal to 0.8?
H0: m = 0.8.
Ha: m != 0.8.
The interest is in not rejecting the null hypothesis.
Is the mean server utilization greater than 0.8?
H0: m = 0.8.
Ha: m > 0.8.
The interest is in rejecting the null hypothesis.
Error Types
A hypothesis test can go wrong by
rejecting a true null hypothesis or
accepting a false null hypothesis.
A type I error or false positive rejects a true null hypothesis.
A type II error or false negative accepts a false null hypothesis.
Controlling Type I Errors
A type I error (false positive) occurs when a sample rejects H0 even though it's true.
This occurs when the sample mean falls outside the critical values.
The critical area is the probability of this happening.
It's known as the significance level.
Type I errors can be controlled by adjusting the critical region or values.
Assuming the critical region or values aren't fixed by the problem.
Type I Error Examples
Is the mean server utilization within 10% of 0.8?
The critical values are x_l = 0.8 - 0.08 = 0.72 and x_r = 0.8 + 0.08 = 0.88.
Let v = 0.1; z = (x_r - m)/v = 0.08/0.1 = 0.8.
From the normal table, P(z < 0.8) = 0.79.
The critical region is (1 - 0.79)*2 = 0.42.
With a 1 in 10 chance of being wrong, is the mean server utilization 0.8?
From the normal table, z_0.95 = 1.65, where P(z < z_0.95) = 0.95.
1.65 = (x_r - 0.8)/0.1; x_r = 1.65(0.1) + 0.8 = 0.965.
x_l = 0.8 - 0.165 = 0.635.
Controlling Type II Errors
A type II error (false negative) occurs when a sample does not reject a false null hypothesis.
False negatives depend on the form of the alternative hypothesis.
The probability that x is a false negative is b = P_a(x < x_r), taken under the alternative distribution.
The probabilities of type I and II errors are inversely related.
And type I errors are directly controllable.
The false-negative probability decreases as the two distributions separate.
Type II Error Examples
Suppose the server utilization distribution actually has m = 0.9 and v = 0.01.
The null hypothesis is that m is within 10% of 0.8.
The critical value is x_r = 0.88.
The false-negative probability is b = P_a(x < 0.88).
z_b = (x_r - m)/v = (0.88 - 0.9)/0.01 = -2.
From the normal table, P_a(z < -2) = 0.02.
There's a 1 in 50 chance of a false negative on the high side.
By symmetry, the chance on both sides is 1 in 25.
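The type II computation above is one line with the standard library's normal distribution: under the true distribution (m = 0.9, v = 0.01), how often does the sample mean still land below the critical value and fail to reject the false H0?

```python
from statistics import NormalDist

# True (alternative) distribution of the sample mean, per the example above.
beta = NormalDist(mu=0.9, sigma=0.01).cdf(0.88)   # about 0.023, roughly 1 in 50
```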
One- and Two-Tailed Tests
The null hypothesis specifies distribution parameter equality.
Otherwise, false negatives are uncontrollable.
The alternative hypothesis negates the null hypothesis.
The negation of an equality is either an inequality or an ordering.
Given H0: p = p', Ha is either p != p', p < p', or p > p'.
An alternative hypothesis with inequality results in a two-tailed hypothesis test.
The critical region is split symmetrically about the parameter under test.
An alternative hypothesis with ordering results in a one-tailed hypothesis test.
The critical region is to the left (p < p') or right (p > p') of the parameter under test.
Test Tail Examples
The average wait time is 5 minutes.
The null hypothesis is H0: m = 5.
The alternative hypothesis is Ha: m != 5.
This is a two-tailed test: x can falsify H0 by being too large or too small.
The average wait time does not exceed 5 minutes.
The null hypothesis is H0: m = 5.
The alternative hypothesis is Ha: m > 5.
This is a one-tailed test: x can falsify H0 only by being too large.
Unknown Population Variance
The variance of the hypothesis population may not (probably won't) be known.
Do the usual trick: substitute the sample variance for the population variance.
For 30 or more samples, the standardized sample mean still follows an approximately standard normal distribution.
For n < 30 samples, the standardized sample mean follows a t-distribution with n - 1 degrees of freedom.
t-Distribution Example
A system runs with a 42 widget/min throughput.
A simulation of proposed system changes results in m = 50 w/m and v = 11.9 (n = 12). Should the changes be implemented?
H0: m = 42; Ha: m > 42.
The standardized sample mean is t-distributed with 11 degrees of freedom.
At the 5% significance level, t_r = 1.796.
At the 1% significance level, t_r = 2.718.
t = (50 - 42)/(11.9/12^0.5) = (50 - 42)/(11.9/3.5) = 2.35.
2.35 is in the 5% critical region; reject H0; m is greater than 42 (with a 1 in 20 chance of being wrong).
2.35 isn't in the 1% critical region; don't reject H0; at that significance level, the evidence doesn't show that m exceeds 42.
Points to Remember
Hypothesis testing answers yes-no questions.
A hypothesis is a predicate about distributions.
This is more flexible than it first appears.
Errors occur as false positives or negatives (type I or II).
Error control concentrates on type I errors.
This page last modified on 5 April 2005.