Open Addressing

Store all elements in T itself, without chaining for collision resolution; a colliding key goes into another empty slot of T. Consequences:

α (the load factor) can never be bigger than one!
We must deterministically search for a new slot when there is a collision.

Collision resolution is handled by probing the hash table (with m slots). Augment h with the probe number as a second argument:

h : U × {0, ..., m-1} → {0, ..., m-1}

We probe < h(k, 0), h(k, 1), ..., h(k, m-1) > until an empty slot (when inserting) or the element (when searching) is found. Note that < h(k, 0), h(k, 1), ..., h(k, m-1) > must be a permutation of {0, ..., m-1} for each k, so that every slot of T can be reached from any starting slot.

Hash-Insert (T, k)

i ← 0
repeat j ← h(k, i)
   if T[j] = NIL
      then T[j] ← k
         return j
      else i ← i + 1
until i = m
error "hash table overflow"

Hash-Search (T, k)

i ← 0
repeat j ← h(k, i)
   if T[j] = k
      then return j
   i ← i + 1
until T[j] = NIL or i = m
return NIL
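The two procedures above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the probe function `linear_probe`, the use of `None` for NIL, and the restriction to non-negative integer keys are all assumptions made for the example.

```python
NIL = None  # assumed marker for an empty slot


def linear_probe(k, i, m):
    """Hypothetical probe function h(k, i); any full-period scheme would do."""
    return (k % m + i) % m


def hash_insert(T, k, h=linear_probe):
    """Insert k into T; return the slot used, or raise if T is full."""
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is NIL:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k, h=linear_probe):
    """Return the slot holding k, or NIL if k is not in T."""
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is NIL:      # an empty slot ends the probe sequence
            return NIL
        if T[j] == k:
            return j
    return NIL               # all m slots probed without finding k


T = [NIL] * 7
slots = [hash_insert(T, k) for k in (10, 17, 24)]  # all three hash to 3 when m = 7
print(slots, hash_search(T, 24), hash_search(T, 31))
```

The three colliding keys land in consecutive slots, and the search for the absent key 31 stops at the first empty slot it meets.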

A NIL slot or the end of the probe sequence (i = m) is the marker that stops the search.

Problem: when deleting, we cannot leave a NIL behind, or any key whose probe sequence passed through that slot would become inaccessible. Solution: don't delete with open addressing!
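To see the deletion problem concretely, here is a small sketch under assumed conditions: linear probing with h'(k) = k mod m, `None` standing in for NIL, and helper names (`probe`, `insert`, `search`) chosen only for this example.

```python
NIL = None


def probe(k, i, m):
    # Linear probing with h'(k) = k mod m (an assumed hash, for illustration).
    return (k % m + i) % m


def insert(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is NIL:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is NIL:
            return NIL
        if T[j] == k:
            return j
    return NIL


T = [NIL] * 7
for k in (3, 10, 17):          # all three keys hash to slot 3
    insert(T, k)

found_before = search(T, 17)   # reached via slots 3 -> 4 -> 5
T[search(T, 10)] = NIL         # naive "delete": overwrite slot 4 with NIL
found_after = search(T, 17)    # the search now stops at the NIL in slot 4

print(found_before, found_after, T[5])
```

Key 17 is still stored in slot 5 after the naive delete, but the search can no longer reach it.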

Methods of probing

Linear Probing:

h (k, i) = ( h' (k) + i ) (mod m)

k → h'(k): probe 0 goes to slot h'(k), probe 1 to slot h'(k) + 1, probe 2 to slot h'(k) + 2, probe 3 to slot h'(k) + 3, ...

The sequence eventually wraps around the end of the table.

One problem: primary clustering. Long runs of occupied slots develop in T. Note: h(k, i) = (h'(k) + c i) (mod m) does not help with primary clustering: clusters develop from many different keys initially hashing close together, and this scheme still has constant jumps.
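A small illustration of primary clustering, assuming linear probing with h'(k) = k mod m and a handful of keys picked so that they hash close together. The helper names `insert_linear` and `longest_run` are hypothetical.

```python
def insert_linear(occupied, k):
    """Claim a slot for k with h(k, i) = (k mod m + i) mod m; return probes used."""
    m = len(occupied)
    for i in range(m):
        j = (k % m + i) % m
        if not occupied[j]:
            occupied[j] = True
            return i + 1
    raise OverflowError("hash table overflow")


def longest_run(occupied):
    """Length of the longest cyclic run of occupied slots."""
    m = len(occupied)
    if all(occupied):
        return m
    best = run = 0
    # Walk the table twice so that runs wrapping past the end are counted.
    for j in range(2 * m):
        run = run + 1 if occupied[j % m] else 0
        best = max(best, run)
    return best


occupied = [False] * 11
# With m = 11 these keys hash to slots 9, 10, 0, 9, 10.
probes = [insert_linear(occupied, k) for k in (20, 21, 22, 31, 32)]
print(probes, longest_run(occupied))
```

The later keys need several probes each because they must walk to the end of the run that the earlier keys built, and in doing so they make that run longer.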

What about:

h(k, i) = h'(k)                  if i = 0
h(k, i) = (h'(k) + aⁱ) (mod m)   otherwise

The jumps will no longer be the same size, thus defeating primary clustering. Question: what are good choices of a? Those a with the property that < a¹, a², a³, ..., a^(m-1) > (mod m) is a permutation of < 1, 2, ..., m-1 >; together with the offset 0 used at i = 0, the probe sequence then reaches every slot. Such an a is called a primitive root modulo m (one always exists when m is prime).
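As a rough check of this idea, the sketch below tests candidate values of a for a small prime m and lists the resulting probe sequence. It assumes m is prime; the function names are made up for the example. Note that the powers of a can only cover the non-zero offsets 1..m-1, while offset 0 comes from the i = 0 probe.

```python
def is_primitive_root(a, m):
    """True if a^1, ..., a^(m-1) (mod m) hits every value in 1..m-1 (m prime assumed)."""
    powers = {pow(a, i, m) for i in range(1, m)}
    return powers == set(range(1, m))


def probe_sequence(hk, a, m):
    """Slots visited by h(k, 0) = h'(k) and h(k, i) = (h'(k) + a^i) mod m for i >= 1."""
    return [hk % m] + [(hk + pow(a, i, m)) % m for i in range(1, m)]


m = 7
good = [a for a in range(1, m) if is_primitive_root(a, m)]
seq = probe_sequence(2, 3, m)          # h'(k) = 2 and a = 3, chosen arbitrarily
print(good)
print(seq, sorted(seq) == list(range(m)))
```

For m = 7 only a = 3 and a = 5 qualify, and with a = 3 the probe sequence visits every slot exactly once.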

Quadratic Probing:

h ( k, i ) = ( h' (k) + c₁ i + c₂ i²) ( mod m )

To get the "full period", c₁, c₂, and m are constrained. The jump is not the same each time, so quadratic probing does not have primary clustering.
If h(k₁, 0) = h(k₂, 0), then h(k₁, i) = h(k₂, i) for every i, so the two probe sequences mirror one another. This leads to secondary clustering.
Note: h(k, i) = (h'(k) + aⁱ) (mod m) would also suffer from secondary clustering.
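One well-known choice that satisfies the full-period constraint is c₁ = c₂ = 1/2 with m a power of two, which makes the i-th offset the triangular number i(i+1)/2. The sketch below assumes that choice and h'(k) = k mod m; it also shows the same constants failing for m = 7, and two keys with equal h'(k) sharing one probe sequence.

```python
def quadratic_sequence(hk, m):
    """Probe sequence for h(k, i) = (h'(k) + i/2 + i^2/2) mod m, i.e. c1 = c2 = 1/2."""
    return [(hk + i * (i + 1) // 2) % m for i in range(m)]


m = 8                                          # a power of two, assumed here
seq = quadratic_sequence(3, m)
full_period = sorted(seq) == list(range(m))    # every slot visited once

# The same constants with m = 7 do not give a full period.
distinct_mod_7 = len(set(quadratic_sequence(3, 7)))

# Secondary clustering: keys 3 and 11 both have h'(k) = 3 when m = 8.
same = quadratic_sequence(3 % m, m) == quadratic_sequence(11 % m, m)

print(seq, full_period, distinct_mod_7, same)
```

The jumps between successive probes are 1, 2, 3, ..., so they differ at every step, yet the whole sequence is fixed once h'(k) is known.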

Double Hashing:

h (k, i) = ( h₁ (k) + i h₂ (k) ) ( mod m )

The jump depends on k as well. To get the "full period" we must have gcd(m, h₂(k)) = 1.
If gcd(m, h₂(k)) = d > 1, then the probe sequence wraps after m/d steps. This means you could only access 1/d of T.
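The effect of the gcd condition is easy to check numerically. In this sketch the values of h₁(k) and h₂(k) are passed in directly as plain integers (an assumption made to keep the example short), and `slots_reached` is a hypothetical helper.

```python
from math import gcd


def slots_reached(h1, h2, m):
    """Distinct slots visited by h(k, i) = (h1 + i*h2) mod m for i = 0..m-1."""
    return {(h1 + i * h2) % m for i in range(m)}


m = 12
report = []
for step in (5, 4, 6):
    d = gcd(m, step)
    reached = slots_reached(0, step, m)
    report.append((step, d, len(reached)))
    print(step, d, len(reached), len(reached) == m // d)
```

With step 5 (coprime to 12) all 12 slots are reached; with steps 4 and 6 only 12/4 = 3 and 12/6 = 2 slots are reachable.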

Choices:

h₁(k) an arbitrary "hash", m = 2^p, and h₂(k) always produces odd numbers (gcd(2^p, odd) = 1)

h₁(k) an arbitrary "hash", m prime, and h₂(k) returns a positive integer less than m

Example: h₁(k) = k (mod m), h₂(k) = 1 + (k mod m'), with m prime and m' slightly less than m (say m - 1).
Note: There are m! different permutations you could get.
Linear and quadratic probing give you just one for each value of h'(k), so only m in total.
Double hashing gives you a factor of m more, for a total of Θ(m²) possible permutations. This leads double hashing to give close to SUH performance.
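The counts can be verified by brute force for a small table. The sketch assumes m = 7 (prime), h'(k) = h₁(k) = k mod m, and h₂(k) = 1 + (k mod (m - 1)); the choice m' = m - 1 and all function names are assumptions for illustration.

```python
def linear(k, i, m):
    return (k % m + i) % m


def double(k, i, m):
    # h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1)); m' = m - 1 is an assumed choice.
    return (k % m + i * (1 + k % (m - 1))) % m


def distinct_sequences(h, m, keys):
    """Number of different probe sequences h produces over the given keys."""
    return len({tuple(h(k, i, m) for i in range(m)) for k in keys})


m = 7                                   # prime, so every h2 value is coprime to m
keys = range(m * (m - 1))               # enough keys to hit every (h1, h2) pair
n_linear = distinct_sequences(linear, m, keys)
n_double = distinct_sequences(double, m, keys)
print(n_linear, n_double)
```

Linear probing yields m = 7 sequences and double hashing yields m(m - 1) = 42, both far fewer than the 7! = 5040 permutations that exist.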

Analysis of Open-Address Hashing:

α = n/m < 1 is our figure of merit. Analyze performance as n, m → ∞ with α held constant.
Assumption: the probe sequence < h(k, 0), h(k, 1), ..., h(k, m-1) > is a permutation of < 0, ..., m-1 > for each k. Assume it is equally likely to be any of the m! permutations.
Theorem: With open-address hashing with α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 - α) (a quantity that is at least 1).
Proof: In an unsuccessful search, each probe accesses an occupied slot except the last, which finds an empty slot.
Let p_i = Prob[ exactly i probes access occupied slots ].
The expected number of probes to find the empty slot is:

$$1 + \sum_{i=0}^{\infty} i \, p_i$$

Let q_i = Prob[ at least i probes access occupied slots ]. Then:

$$\sum_{i=0}^{\infty} i \, p_i = \sum_{i=1}^{\infty} q_i$$

q₁ = n/m = α

q₂ = (n/m) ((n-1)/(m-1))

q_i = (n/m) ((n-1)/(m-1)) ... ((n-i+1)/(m-i+1))

Which is larger, n/m or (n-1)/(m-1)? Cross-multiply:

n (m-1) = nm - n

m (n-1) = nm - m

Since n < m, n (m-1) > m (n-1), and therefore

n/m > (n-1)/(m-1), so

q_i = (n/m) ((n-1)/(m-1)) ... ((n-i+1)/(m-i+1)) ≤ (n/m)ⁱ = αⁱ

$$1 + \sum_{i=0}^{\infty} i \, p_i = 1 + \sum_{i=1}^{\infty} q_i \le 1 + \sum_{i=1}^{\infty} \alpha^i = 1 + \alpha + \alpha^2 + \cdots = \frac{1}{1 - \alpha}, \quad \alpha < 1.$$
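The bound can be compared with the exact value of 1 + Σ q_i for a concrete table. This sketch uses exact rational arithmetic, takes n = 9 and m = 10 as arbitrary example values, and writes the load factor n/m as `alpha`; the function names are hypothetical.

```python
from fractions import Fraction


def q(i, n, m):
    """q_i = (n/m)((n-1)/(m-1))...((n-i+1)/(m-i+1)): the first i probes all hit occupied slots."""
    prob = Fraction(1)
    for t in range(i):
        prob *= Fraction(n - t, m - t)
    return prob


def expected_unsuccessful(n, m):
    """1 + sum of q_i for i >= 1; q_i is zero once i exceeds n, so the sum is finite."""
    return 1 + sum(q(i, n, m) for i in range(1, n + 1))


n, m = 9, 10
alpha = Fraction(n, m)
exact = expected_unsuccessful(n, m)
bound = 1 / (1 - alpha)
print(exact, bound)    # 11/2 10
```

At load factor 0.9 the exact expectation is 5.5 probes, comfortably below the bound of 10; the gap comes from replacing each q_i by the larger power of the load factor.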

Corollary: Inserting an element into an open-address table with load factor α requires at most 1/(1 - α) probes on average.
Proof: Insertion requires an unsuccessful search followed by placing the key in the empty slot found.
Theorem: Given an open-address table T with α < 1, the expected number of probes in a successful search is at most (1/α) ln (1/(1 - α)) + 1/α.
Proof: Searching for k follows the same probe sequence as inserting it.
If k is the (i+1)st key inserted into T, then by the corollary (the load factor was i/m when k was inserted)

1/(1 - i/m) = m/(m - i)

bounds the expected number of probes for the search. Averaging over all n keys:

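$$\frac{1}{n} \sum_{i=0}^{n-1} \frac{m}{m-i} = \frac{m}{n} \sum_{i=0}^{n-1} \frac{1}{m-i}$$

$$= \frac{1}{\alpha} \left( \frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1} \right)$$

$$= \frac{1}{\alpha} \left( H_m - H_{m-n} \right)$$

where

$$H_i = \sum_{j=1}^{i} \frac{1}{j}$$

is the i-th harmonic number.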

Recall ln i ≤ H_i ≤ ln i + 1, so
(1/α) (H_m - H_{m-n}) ≤ (1/α) (ln m + 1 - ln (m-n))

= (1/α) ln (m/(m-n)) + 1/α

Since m/(m-n) = (m/m) / (m/m - n/m) = 1/(1 - α), this is

= (1/α) ln (1/(1 - α)) + 1/α
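The chain of equalities and the final inequality can be checked numerically. The sketch below assumes n = 50 and m = 100 as example values, uses exact fractions for the sums, and writes the load factor n/m as `alpha`; the function names are hypothetical.

```python
from fractions import Fraction
from math import log


def harmonic(i):
    """H_i = 1 + 1/2 + ... + 1/i, with H_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, i + 1))


def average_bound(n, m):
    """(1/n) * sum over i = 0..n-1 of m/(m-i): the averaged per-key bound from the proof."""
    return sum(Fraction(m, m - i) for i in range(n)) / n


n, m = 50, 100
alpha = n / m
averaged = average_bound(n, m)
via_harmonic = Fraction(m, n) * (harmonic(m) - harmonic(m - n))
closed_form = (1 / alpha) * log(1 / (1 - alpha)) + 1 / alpha

print(averaged == via_harmonic)          # the two exact expressions agree
print(float(averaged) <= closed_form)    # both sit below the closed-form bound
```

The closed form is a fairly loose upper bound here, because the estimate ln i ≤ H_i ≤ ln i + 1 gives up as much as 1 in the difference of harmonic numbers.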

Example:

α = 0.5       2 ln 2 + 2 ≈ 3.386

α = 0.9       (10/9) ln 10 + 10/9 ≈ 3.670

α = 0.99     (100/99) ln 100 + 100/99 ≈ 5.662

α = 0.999    (1000/999) ln 1000 + 1000/999 ≈ 7.916

As α → 1, the bound approaches ln (1/(1 - α)) + 1.
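These values can be recomputed directly from the successful-search bound (1/α) ln (1/(1 - α)) + 1/α. A minimal sketch, with `successful_bound` as a hypothetical helper name and `alpha` standing for the load factor:

```python
from math import log


def successful_bound(alpha):
    """Upper bound on expected probes in a successful search at load factor alpha."""
    return (1 / alpha) * log(1 / (1 - alpha)) + 1 / alpha


table = {alpha: successful_bound(alpha) for alpha in (0.5, 0.9, 0.99, 0.999)}
for alpha, value in table.items():
    print(f"{alpha:<6} {value:.3f}")
```

Even at 99.9% occupancy the bound stays below 8 probes, because it grows only logarithmically in 1/(1 - α).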

Hash Tables - 5 of 5