Hash Tables

What if |U| is large, but the set of keys actually stored, K, is small?

|U| ≫ |K|

We can build a hash table of size Θ(|K|) and reduce searching to O(1) time, on average. With a hash function h we store an element x with key k = key[x] in slot h(k). Here h: U → {0, 1, …, m − 1} maps keys to the m slots (e.g., the division method h(k) = k mod m); since |U| is much larger than the number of slots, we expect collisions.

We say that k hashes to slot h(k), and that h(k) is the hash value of key k.

Chained-Hash-Insert(T, x): insert x at the head of the list T[h(key[x])].  O(1), assuming key[x] is not already in the table (otherwise search first).

Chained-Hash-Search(T, k): search for an element with key k in the list T[h(k)].  O(r), where r = number of elements in T[h(k)].

Chained-Hash-Delete(T, x): delete x from the list T[h(key[x])].  O(1) if the lists are doubly linked.
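A minimal Python sketch of these three operations, assuming chaining with Python lists and a division-method hash; the class name, slot count, and use of Python's built-in hash are illustrative choices, not from the notes.

    class ChainedHashTable:
        """Hash table with collision resolution by chaining.

        Each of the m slots holds a list of (key, value) pairs; new pairs
        go at the head of the chain, as in Chained-Hash-Insert above.
        """

        def __init__(self, m=16):
            self.m = m                          # number of slots
            self.slots = [[] for _ in range(m)]

        def _h(self, k):
            return hash(k) % self.m             # division method (illustrative)

        def insert(self, k, v):
            # O(1): prepend, assuming k is not already in the table.
            self.slots[self._h(k)].insert(0, (k, v))

        def search(self, k):
            # O(r), r = length of the chain T[h(k)].
            for key, value in self.slots[self._h(k)]:
                if key == k:
                    return value
            return None                         # unsuccessful search

        def delete(self, k):
            # O(r) with Python lists; O(1) with a doubly linked list and a
            # pointer to the element x, as in Chained-Hash-Delete above.
            chain = self.slots[self._h(k)]
            for i, (key, _) in enumerate(chain):
                if key == k:
                    del chain[i]
                    return

    t = ChainedHashTable()
    t.insert(42, "answer")
    assert t.search(42) == "answer"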

Clearly, collisions are minimized if h is a "randomizing" function. Let T be a hash table with m slots storing n elements.

α = n/m is the load factor (we often consider α fixed as n, m → ∞).

The worst-case behavior of hashing with chaining is Θ(n) plus the time to compute h(k). This occurs when all n elements hash to a single slot.

Assume this doesn't happen. Simple uniform hashing: any given key k is equally likely to hash to any of the m slots (values of h(k)), independently of where other keys hash.

Assume h(k) can be computed in O(1) time. Then the time to search for an element with key k is proportional to the length of the list T[h(k)]. We analyze this length in two cases, unsuccessful search and successful search, assuming simple uniform hashing throughout.

Unsuccessful: In a table T with collision resolution by chaining, an unsuccessful search takes Θ(1 + α) time in the average case.

Proof: Any key k is equally likely to hash into any of the m slots, so an unsuccessful search scans one of the m lists to its end. The average list length is n/m = α. With O(1) to compute h(k) plus O(α) to scan the list, we get Θ(1 + α).

Successful: A successful search in a hash table with chaining also takes Θ(1 + α) time on average. For the analysis, assume Chained-Hash-Insert appends new elements to the end of the list; then the number of elements examined when searching for a key is one more (the element found) than the number examined when that element was inserted. The expected length of the list into which the i-th element is inserted is (i − 1)/m, so the expected number of elements examined in a successful search, averaged over the n elements, is:

\[
\frac{1}{n}\sum_{i=1}^{n}\left(1+\frac{i-1}{m}\right)
= 1+\frac{1}{nm}\sum_{i=1}^{n}(i-1)
= 1+\frac{1}{nm}\sum_{i=1}^{n-1} i
= 1+\frac{n(n-1)}{2nm}
= 1+\frac{n}{2m}-\frac{1}{2m}
= 1+\frac{\alpha}{2}-\frac{1}{2m}
\]

Adding the O(1) cost of computing h(k), the expected running time is 2 + α/2 − 1/(2m) = Θ(1 + α).
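A quick numerical sanity check of this formula is sketched below; random 64-bit keys and Python's built-in hash stand in for simple uniform hashing, an assumption not made in the notes themselves.

    import random

    def avg_successful_probes(n, m, trials=100):
        """Average elements examined over all n successful searches."""
        total = 0
        for _ in range(trials):
            slots = [[] for _ in range(m)]
            for _ in range(n):
                k = random.getrandbits(64)       # stand-in for a random key
                slots[hash(k) % m].append(k)     # append, as in the analysis
            # The element at position j of a chain takes j + 1 probes to find.
            total += sum(j + 1 for chain in slots for j in range(len(chain)))
        return total / (trials * n)

    n, m = 1000, 500                             # load factor α = 2
    print(avg_successful_probes(n, m))           # ≈ 1 + α/2 − 1/(2m) = 1.999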

Note: if α = n/m = O(1) (e.g., n ≈ m), then insert, search, and delete all run in O(1) expected time.

Why do you need to hash?

Discrete log problem (Diffie-Hellman): Let a generate a group G; thus for any element e ∈ G,

e = a^i for some i, 0 ≤ i < |G|.

(An example: G is the multiplicative group of integers mod p, and a is a primitive root mod p.)

Discrete log problem in G: given a and e, find i. A general solution to this problem is the baby-step / giant-step algorithm:


[Figure: building a kangaroo trap for solving the discrete log problem.]

On each giant step, check whether you have landed in the "baby-step" region. By the birthday-paradox argument, O(√|G|) steps suffice on average. For |G| = 2^64 that is √|G| = 2^32 stored baby steps, which must be kept in a hash table so that each membership test costs O(1).
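As a sketch, here is baby-step / giant-step for G = the multiplicative group mod a prime p, with a Python dict playing the role of the hash table; the prime 101 and generator 2 are illustrative choices, and the modular inverse pow(a, -m, p) needs Python 3.8+.

    def baby_step_giant_step(a, e, p):
        """Find i with a**i ≡ e (mod p), using O(√p) time and space."""
        m = int(p ** 0.5) + 1
        # Baby steps: store a^j -> j in a hash table for j = 0 .. m-1.
        baby = {}
        x = 1
        for j in range(m):
            baby.setdefault(x, j)
            x = (x * a) % p
        # Giant steps: walk e, e*a^-m, e*a^-2m, ... and check the "trap".
        step = pow(a, -m, p)                # a^(-m) mod p
        y = e
        for g in range(m):
            if y in baby:                   # O(1) expected hash lookup
                return g * m + baby[y]      # i = g*m + j
            y = (y * step) % p
        return None                         # e is not a power of a

    # Example: 2 is a primitive root mod 101.
    i = baby_step_giant_step(2, 37, 101)
    assert pow(2, i, 101) == 37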
