Open Addressing

Store all elements in T without chaining for collision resolution. Instead use empty spaces in T. Consequences:

  1. a ( load factor ) can never be bigger than one!

  2. Must deterministically search for new spaces when there is a collision

Collision resolution is handled by probing the hash table (with m slots). Augment h:

U x {0, ..., m-1} ® {0, ..., m-1}

We probe < h (k, 0), h ( k, 1), ..., h ( k, m-1)> until a slot/element is found. Note < h (k, 0), h ( k, 1), ..., h ( k, m-1)> must be a permutation of {0,...,m-1} for each k. (must be able to get everywhere from everywhere)

Hash-Insert (T, k)

  1. i ¬ 0

  2. repeat j ¬ h (k, i)

  3.    if T[j] = NIL

  4.       then T[j] ¬ k

  5.          return j

  6.     else i ¬ i + 1

  7. until i = m

  8. error "hash table overflow"

Hash-Search (T, k)

  1. i ¬ 0

  2. repeat j ¬ h (k, i)

  3.    if T[j] = k

  4.       then return j

  5.    i ¬ i + 1

  6. until T[j] = NIL or i = m

  7. return NIL

Use NIL / end of table as marker to stop search

 Problem: When deleting, cannot leave a NIL behind or the elements afterward, with key k would become inaccessible. Solution: don't delete with open addressing!

Methods of probing

Linear Probing:

h (k, i) = ( h' (k) + i ) (mod m)

k ® h' (k) ® | ¯ 0
   ® h' (k) ® | ¯ 1
   ® h' (k) ® | ¯ 2
   ® h' (k) ® | ¯ 3

this eventually wraps.

One problem: primary clustering. One develops long strings of occupied spaces in T. Note: h ( k, i) = ( h' (k) + c i ) (mod m) does not help primary clustering: Clusters develop from many different keys initially hashing close together. This still has constant jumps.

What about: h ( k, i ) = h' (k), i = 0

                                 = h ' (k) + ai ) ( mod m ) otherwise

jumps will no longer be the same size, thus defeating primary clustering. Question:  What are good choices of a? a's with the property that <a0 ,a1 ,a2 (mod m),a3 (mod m), ... am-1 (mod m)> is a permutation of <0,1,2, ..., m-1>. Such an a is called a primitive root modulo m.

Quadratic Probing:

h ( k, i ) = ( h' (k) + c1 i  + c2 i2)  ( mod m )

To get the "full period" c1, c2, m are constrained. This does not have the same jump each time, and so it does not have primary clustering.

If h (k1, 0) = h (k2, 0), then h (k1, i) = h (k2, i) and they mirror one another. This leads to secondary clustering.

Note: h ( k, i ) = ( h' (k) + ai ) (mod m ) would also suffer from secondary clustering.

Double Hashing:

h (k, i) = ( h1 (k) + i h2 (k) ) ( mod m )

The jump depends on k as well. To get the "full period" we must have gcd ( m, h2 (k) ) = 1.

If gcd ( m, h2 (k) ) = d then after m/d steps you would wrap!! This means you could only access 1/d of T.

Choices:

  1. h1 (k) arbitrary "hash"

  2. m = 2p h2 (k) always produces odd numbers ( gcd ( 2p, odd ) = 1 )

    1. h1 (k) arbitrary "hash"
    2. m =  prime h2 (k) return an integer modulo m

Example: h1 (k) = k ( mod m) , h2 (k) = 1 + ( k mod m' )

Note: There are m! different permutations you can get.

Linear and quadratic probing give you just one ( neglecting h' (k) ).

Double hashing gives you m more for total Q ( m ) possible permutations. This leads double hashing to giving close to SUH performance.

Analysis of Open-Address Hashing:

a = n/m < 1 is our figure of merit. Analyze performance as n, m ® ¥, a constant.

Assumption: the probe sequence  < h (k, 0), h ( k, 1), ..., h ( k, m-1)> is a permutation of <0,...,m-1> for each k. Assume it is equally likely to be any of the m! permutations.

Theorem: With open-address hashing with a = n/m < 1  the expected number of probes in an unsuccessful search is at most 1/ (1 - a) > 1 .

Proof: When unsuccessful. each probe accesses a full slot except the last.

Let pi = Prob[ exactly i probes access occupied slots.]

Expected number of probes:

1 +
¥
å
i = 0 
i pi
to find the empty slot.

Let qi = Prob[ at least i probes access occupied slots.]

¥
å
i = 0 
i pi      =
¥
å
i = 1 
qi

q1 = n/m = a

q2 = ( n/m) ( (n-1)/(m-1) )

qi = ( n/m) ( (n-1)/(m-1) ) ... ( (n - i + 1)/(m - i + 1) )

n/m > ? < (n-1) / (m-1)

n ( m-1 ) = n m - n

m ( n - 1 ) = n m - m,  n < m

n ( m - 1 ) > m ( n - 1 )

n/m > ( n - 1 ) / ( m - 1 ),  so

 qi = ( n/m) ( (n-1)/(m-1) ) ... ( (n - i + 1)/(m - i + 1) ) < ( n/m)i = ai

1 +
¥
å
i = 0 
i pi      =  1 +
¥
å
i = 1 
qi      = 1 +
¥
å
i = 1 
ai

=   1 + a + a2 ...  =   1 / ( 1 - a ) , a < 1.

Corollary: Inserting an element into an open-address table with a requires at most 1 / ( 1 - a ) probes.

Proof: Insertion requires, an unsuccessful search then an insertion.

Theorem: Given an open-address T with a < 1, the expected number of probes in a successful search is: 1/a ln 1 / ( 1 -  a ) + 1 / a .

Proof: Searching for k follows the same probe sequence as inserting it.

If k is the ( i + 1 )st key inserted into T then:

1 / ( 1 - i/m ) = m / ( m- i )

is the expected number of probes for the search. Averaging over all keys:

1/n
n-1
å
i = 0 
 m / ( m- i ) =  m/n
n-1
å
i = 0 
1 / ( m - i )

= 1/a ( 1/m + 1 / ( m-1 ) + ... + 1 / ( m - n + 1 ))

= 1/a ( Hm   - Hm-n  )

Hi =
i
å
j = 1 
 1/j, ith Harmonic number.

Recall ln i £ Hi £ ln i + 1

 1/a ( Hm   - Hm-n  ) £ 1/a ( ln (m) + 1   - ln ( m-n )  )

= 1/a ln m / ( m-n ) + 1/a

m / ( m-n ) = (m/n) / ( m/m - n/m ) = 1 / ( 1 - a )

= 1/a ln 1/ ( 1 - a ) + 1/a

Example:

a = 0.5       1/2 ln 2 + 1/2 » 3.387

a = 0.9       10/9 ln 10 + 10/9 » 3.670

a = 0.99     100/99 ln 100 + 100/99 » 5.662

a = 0.999    1000/999 ln 1000 + 1000/999 » 7.915

a » 1 ® ln 1 / ( 1-a )  + 1

Hash Tables - 5 of 5