Computer Algorithms II Lecture Notes

3 April 2007 • Hashing Basics


Easy after a small amount of thought. Suppose p[] is an n-element identity permutation (that is, p[i] = i for 0 ≤ i < n). There are at least two ways to generate a random permutation in p:

for i = 0 to n - 1
  swap(p[i], p[random(n)])

and

for i = 0 to n - 1
  swap(p[i], p[i + random(n - i)])

where random(n) returns one of the n integers 0 to n - 1, each with equal probability. The first for loop, because each element in p can be swapped with any of the n positions, generates n^n equally likely sequences of swaps. The second for loop, because element i can be swapped with only the n - i positions from i to n - 1, generates n! equally likely sequences of swaps, one for each possible permutation.
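The following C sketch spells out the two loops; the second is the form commonly known as the Fisher-Yates (Knuth) shuffle. random_below() is a hypothetical stand-in for the pseudocode's random(n); rand() % n is used only to keep the sketch short and is itself slightly biased.

#include <stdlib.h>

/* Stand-in for random(n): a "random" integer in 0 .. n - 1. */
static int random_below(int n)
{
    return rand() % n;
}

static void swap(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

/* First loop: swap p[i] with any of the n positions. */
void shuffle_naive(int p[], int n)
{
    for (int i = 0; i < n; i++)
        swap(&p[i], &p[random_below(n)]);
}

/* Second loop (Fisher-Yates): swap p[i] with a position in i .. n - 1. */
void shuffle_fisher_yates(int p[], int n)
{
    for (int i = 0; i < n; i++)
        swap(&p[i], &p[i + random_below(n - i)]);
}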

There are n! possible permutations of n values. The second for loop can generate every one of them, each with equal probability (1/n!). The first for loop generates n^n swap sequences, far more than the n! permutations they map onto, and since n^n is not divisible by n! for n > 2, the permutations cannot all be produced with equal probability. In addition, statistical analysis of the permutations generated by the first for loop shows, perhaps surprisingly, that they tend to clump together over the range of possible permutations rather than being uniformly distributed (a small enumeration demonstrating this for n = 3 follows). Of the two loops, the second is the more effective random-permutation generator, both conceptually and empirically.
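A minimal check of the divisibility argument: the program below (an illustrative sketch, not part of the original notes) enumerates all 3^3 = 27 equally likely swap sequences the first loop can make for n = 3 and tallies the resulting permutations. Because 27 is not a multiple of 3! = 6, the tallies come out unequal.

#include <stdio.h>

#define N 3

static void swap(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

int main(void)
{
    int counts[N][N][N] = {{{0}}};

    /* Try every sequence of choices the first loop could make. */
    for (int r0 = 0; r0 < N; r0++)
        for (int r1 = 0; r1 < N; r1++)
            for (int r2 = 0; r2 < N; r2++) {
                int p[N] = {0, 1, 2};
                int r[N] = {r0, r1, r2};
                for (int i = 0; i < N; i++)
                    swap(&p[i], &p[r[i]]);
                counts[p[0]][p[1]][p[2]]++;
            }

    /* Uniformity would require each permutation to appear 27/6 times,
       which is impossible, so the printed counts are unequal. */
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
                if (counts[a][b][c] > 0)
                    printf("%d %d %d : %d of 27\n", a, b, c, counts[a][b][c]);
    return 0;
}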

The article Randomized Shuffling by Tim Rolfe (Dr. Dobb's Journal, January 2000) contains a detailed comparison of these two loops, including graphs showing statistical properties of the generated permutations.
