Lecture Notes for Simulation

Outline

There are four ways of characterizing the data required by a model:
1. Fit a distribution to system measurements.
2. Sample a histogram of system measurements.
3. Guess a distribution based on system characteristics.
4. Use the system measurements as trace data.

Given some sampled data, find a distribution from which it might have come.
- Then use the distribution to generate more data.
The procedure is:
1. Create a frequency histogram from the sample data.
2. Pick a distribution close to the frequency histogram.
3. Calculate the mean and variance of the sample data.
4. Calculate the mean and variance of the distribution.
5. Compare the data and distribution statistics, and test the goodness of fit.
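Here is a minimal Python sketch of the procedure. It assumes the histogram in step 2 suggests an exponential distribution. It estimates the exponential's rate by matching the sample mean (one simple way to carry out steps 3 and 4) and uses a chi-square statistic for step 5. The exponential choice, the bin count, the synthetic sample, and all function names are illustrative assumptions.

```python
import math
import random
import statistics


def histogram(data, bins):
    """Step 1: count observations in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value would land one past the end, so clamp it into the last bin.
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges


def fit_exponential(data):
    """Steps 3-4: choose the rate so the distribution mean equals the sample mean."""
    rate = 1.0 / statistics.mean(data)
    # An exponential with rate r has mean 1/r and variance 1/r**2.
    return rate, 1.0 / rate, 1.0 / rate ** 2


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def chi_square(counts, edges, rate):
    """Step 5: chi-square statistic for observed versus expected bin counts.

    The first bin is extended down to 0 and the last bin up to infinity,
    so the expected probabilities sum to 1.
    """
    n = sum(counts)
    stat = 0.0
    for k, observed in enumerate(counts):
        lower = -math.inf if k == 0 else edges[k]
        upper = math.inf if k == len(counts) - 1 else edges[k + 1]
        expected = n * (exponential_cdf(upper, rate) - exponential_cdf(lower, rate))
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


random.seed(1)
sample = [random.expovariate(2.0) for _ in range(1000)]  # stand-in for system measurements
counts, edges = histogram(sample, 10)
rate, dist_mean, dist_var = fit_exponential(sample)
print("sample mean/variance:", statistics.mean(sample), statistics.variance(sample))
print("fitted mean/variance:", dist_mean, dist_var)
print("chi-square statistic:", chi_square(counts, edges, rate))
```

The statistic would then be compared with a chi-square critical value with 10 - 1 - 1 = 8 degrees of freedom, since one degree is lost for the estimated rate. A common rule of thumb asks for an expected count of at least about 5 per bin, so sparse tail bins may need to be merged first.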

Sometimes only the histograms of system measurements are available.
There are two approaches in this case:
- Start distribution fitting at step 2, treating the histogram as the result of step 1.
- Use the histogram to derive an empirical distribution.
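One common way to turn a histogram into an empirical distribution has two stages. First, pick a bin with probability proportional to its count. Then draw a value uniformly within that bin, which amounts to linearly interpolating the cumulative distribution inside each bin. The sketch below assumes the histogram is given as bin edges plus counts. The function name `sample_from_histogram` and the example bins are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def sample_from_histogram(edges, counts, rng=random):
    """Draw one value from a histogram given by len(counts) + 1 bin edges."""
    total = sum(counts)
    # Cumulative relative frequency at the right edge of each bin; the last is exactly 1.0.
    cum = [c / total for c in accumulate(counts)]
    u = rng.random()  # uniform on [0, 1)
    # First bin whose cumulative frequency exceeds u; empty bins are never chosen.
    k = bisect.bisect_right(cum, u)
    # Assume values are spread uniformly within the chosen bin.
    return rng.uniform(edges[k], edges[k + 1])


random.seed(7)
edges = [0, 10, 20, 30]  # hypothetical bin edges, e.g. milliseconds
counts = [5, 12, 3]      # hypothetical bin counts
draws = [sample_from_histogram(edges, counts) for _ in range(5)]
print(draws)
```

The uniform-within-bin assumption lets the generator produce values that never appeared in the original measurements. Whether that is acceptable depends on the system being modeled.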

Sometimes no measurements are available, or none can be made.
Previous experience, analogous systems, or theoretical results can suggest which distribution to use.
In the worst case, use distributions whose known characteristics match those of the data of interest:
- Model component lifetimes with the Weibull distribution.
- Model retransmissions before success with the negative binomial distribution.
- Model service times with the Erlang distribution.
- Model random proportions with the beta distribution.
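These suggestions can be turned into generators with Python's standard `random` module. The parameter values below are invented placeholders, because the parameters still have to be chosen somehow. `random` has no negative binomial generator, so the sketch counts failed Bernoulli trials directly. The Erlang sample is built as a sum of exponential stages, and `random.gammavariate(k, 1 / rate)` would be an equivalent alternative.

```python
import random


def component_lifetime(scale, shape, rng=random):
    """Weibull lifetime; random.weibullvariate takes (scale alpha, shape beta)."""
    return rng.weibullvariate(scale, shape)


def retransmissions(r, p, rng=random):
    """Negative binomial: failed attempts before the r-th success (success prob p)."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def service_time(k, rate, rng=random):
    """Erlang: the sum of k independent exponential stages, each with the given rate."""
    return sum(rng.expovariate(rate) for _ in range(k))


def proportion(a, b, rng=random):
    """Beta-distributed proportion on [0, 1]."""
    return rng.betavariate(a, b)


random.seed(5)
# All parameter values below are hypothetical placeholders.
print(component_lifetime(1000.0, 1.5))  # e.g. hours until failure
print(retransmissions(1, 0.9))          # r = 1: retransmissions before the first success
print(service_time(3, 2.0))             # three stages, each with mean 0.5
print(proportion(2.0, 5.0))
```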
But you still need parameters for the distributions.

Use prior data sets as trace data:
- Directly as inputs.
- Indirectly via random sampling.
Problems:
- Simulation requirements may outstrip the available data.
- Biased, dirty, skewed, or otherwise compromised data.
- Limited ability to vary input data.
- Predictions are difficult to make without detailed knowledge of data characteristics.
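The difference between the direct and indirect uses of a trace can be sketched in a few lines of Python. Replaying the trace yields exactly the recorded values and then stops, which shows the first problem: a long simulation can run out of data. Resampling with replacement can continue indefinitely, but it still returns only recorded values, which limits how far the input can be varied. The function names and the trace values are hypothetical.

```python
import random


def replay(trace):
    """Direct use: yield the recorded values in order, then stop."""
    yield from trace


def resample(trace, rng=random):
    """Indirect use: draw recorded values at random, with replacement, forever."""
    while True:
        yield rng.choice(trace)


trace = [0.8, 1.3, 0.4, 2.1, 0.9]  # hypothetical recorded interarrival times
direct = list(replay(trace))        # only len(trace) values are available
random.seed(3)
gen = resample(trace)
indirect = [next(gen) for _ in range(20)]  # can exceed the trace length
print(direct)
print(indirect)
```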

An empirical distribution is a distribution created from a specific data set.
General approach:
1. Create frequency counts for the data.
2. Form the cumulative relative frequencies of the data.
3. Generate samples from the cumulative relative frequencies.
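The three steps map onto a short Python sketch for a discrete data set. `collections.Counter` gives the frequency counts and `itertools.accumulate` the cumulative relative frequencies. A binary search (`bisect`) on those frequencies then turns a uniform random number into a sample, which is inverse-transform sampling. The function name `empirical_sampler` and the example measurements are hypothetical.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate


def empirical_sampler(data, rng=random):
    """Return a function that draws values from the empirical distribution of `data`."""
    # Step 1: frequency counts, with the distinct values in sorted order.
    freq = Counter(data)
    values = sorted(freq)
    counts = [freq[v] for v in values]
    # Step 2: cumulative relative frequencies; the last entry is exactly 1.0.
    total = sum(counts)
    cum = [c / total for c in accumulate(counts)]

    # Step 3: pick the first value whose cumulative frequency exceeds u.
    def sample():
        u = rng.random()  # uniform on [0, 1)
        return values[bisect.bisect_right(cum, u)]

    return sample


observed = [3, 1, 2, 3, 3, 2, 1, 3]  # hypothetical measurements
draw = empirical_sampler(observed)
random.seed(42)
draws = [draw() for _ in range(10)]
print(draws)
```

Each distinct value is returned with probability equal to its relative frequency in the data. As a result, this generator can only reproduce values that occur in the data set.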

The cumulative relative frequencies give, for each value (or bin), the fraction of all observations at or below it; they never decrease.
Given a set of frequency counts { f₁, ..., f_n }, form the cumulative relative frequencies by
- Finding the normalization value S = sum(i = 1 to n, f_i).
- Forming each cumulative relative frequency c_i = sum(j = 1 to i, f_j)/S.
  - Note c_n = 1.
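As a worked instance of these formulas, take the hypothetical counts { 2, 3, 5 }. Then S = 10, so c_1 = 2/10, c_2 = 5/10, and c_3 = 10/10 = 1. The Python sketch below computes the same quantities with exact rational arithmetic (`fractions.Fraction`), so c_n = 1 holds exactly rather than only up to rounding. The function name is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import accumulate


def cumulative_relative_frequencies(freqs):
    """Return [c_1, ..., c_n] with c_i = (f_1 + ... + f_i) / S."""
    s = sum(freqs)  # normalization value S
    return [Fraction(partial, s) for partial in accumulate(freqs)]


c = cumulative_relative_frequencies([2, 3, 5])
print(c)  # [Fraction(1, 5), Fraction(1, 2), Fraction(1, 1)]
```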

This page last modified on 24 February 2005.