Lecture Notes for SE 598, Data Structures and Algorithms
9 August 2000, Map and Multimap Applications
- file-system directories
- a file system is the place where the operating system stores data
- a file-system directory is a file containing information about the
files being stored - location, size, name, owner, and so on
- directory look-up involves mapping a file name to the associated
file-system information
- maps can be used to implement directories
- advantages - fast
- disadvantages - expansion
- fingerprinting
- a hash function h() maps an arbitrary value v into an integer
h(v)
- a set of hash functions map an arbitrary value into a set of integers
- the set of integers { h1(v), h2(v), ..., hn(v) }can be thought of
as a signature or a fingerprint identifying the value v
- different values should have different fingerprints
- values can be large (millions of bytes) , but fingerprints are small
(10s or 100s of bytes)
- given a large set of large values, how can they quickly be searched
- form a large bit vector (100s to 1000s of bits)
- find a small fingerprint (10s of values) for each document
- use the values in each fingerprint to set bits in the bit vector
- to search for a value
- find the fingerprint of the value
- if any of the bits associated with the value's fingerprint are
false, the value can't be in the set
- if all the bits are true, the value may or may not be in the set
- fingerprints have other uses too
- a modified value has a different fingerprint than the original value -
check for tampering or change
This page last modified on 9 August 2000.