bool table[N] = { false } void add(k) table[h(k)] = true bool find(k) return table[h(k)] unsigned hash(k) return ( /* whatever */ ) % N
/usr/share/dict/american-english
has
98,568 words in 931,467 bytes.
bzip2
compresses this down to
333,414
bytes or 2.6×106 bits.
american-english
as the naughty-word list
(≈98,500 words).
american-english
(are not naughty).
american-english
(are naughty).
american-english
in a Bloom filter with around 800,000
buckets.
load | false | ||||
---|---|---|---|---|---|
buckets | factor | positives | |||
1,300,000 | 0.07 | 317 | 8% | ||
1,200,000 | 0.07 | 295 | 7% | ||
1,100,000 | 0.09 | 337 | 8% | ||
1,000,000 | 0.09 | 368 | 9% | ||
900,000 | 0.10 | 393 | 10% | ||
800,000 | 0.12 | 439 | 11% | ||
700,000 | 0.13 | 538 | 14% | ||
600,000 | 0.15 | 571 | 15% | ||
500,000 | 0.17 | 669 | 17% |
|
|
|
|
void add(w) table[h1(w)] = ... = table[hk(w)] = true bool find(k) return table[h1(w)] ∧ ... ∧ table[hk(w)]
load | false | ||||
---|---|---|---|---|---|
order | factor | positives | |||
1 | 0.12 | 439 | 11% | ||
2 | 0.21 | 162 | 4% | ||
3 | 0.31 | 98 | 3% | ||
4 | 0.38 | 82 | 2% | ||
5 | 0.45 | 79 | 2% |
h()
:
h()
’s range (the key space) into pieces.
This page last modified on 2010 May 22. |