Lecture Notes for SE 598, Data Structures and Algorithms

9 August 2000, Map and Multimap Applications


  1. file-system directories

    1. a file system is the place where the operating system stores data

    2. a file-system directory is a file containing information about the files being stored - location, size, name, owner, and so on

    3. directory look-up involves mapping a file name to the associated file-system information

    4. maps can be used to implement directories

      1. advantages - fast

      2. disadvantages - expansion

  2. fingerprinting

    1. a hash function h() maps an arbitrary value v into an integer h(v)

    2. a set of hash functions map an arbitrary value into a set of integers

    3. the set of integers { h1(v), h2(v), ..., hn(v) }can be thought of as a signature or a fingerprint identifying the value v

    4. different values should have different fingerprints

    5. values can be large (millions of bytes) , but fingerprints are small (10s or 100s of bytes)

    6. given a large set of large values, how can they quickly be searched

      1. form a large bit vector (100s to 1000s of bits)

      2. find a small fingerprint (10s of values) for each document

      3. use the values in each fingerprint to set bits in the bit vector

      4. to search for a value

        1. find the fingerprint of the value

        2. if any of the bits associated with the value's fingerprint are false, the value can't be in the set

        3. if all the bits are true, the value may or may not be in the set

    7. fingerprints have other uses too

      1. a modified value has a different fingerprint than the original value - check for tampering or change


This page last modified on 9 August 2000.