Next: About this document ... Up: A Birthday Paradox Extravaganza Previous: Hash collisions

Hashes of good messages, and bad messages

Consider computing two tables of hashes, one of ``good messages'' and one of ``bad messages.'' What is the probability that the second table will have a value that also appears in the first?

This problem is actually easier than the original one. Intuitively, there will be a collision long before the two tables are full. So we can assume that we perform $k$ hash computations for the first table (where $k \ll t$). Some of these values will collide with each other (because of the birthday problem, above), but not too many (because $k \ll t$), so we can assume that the number of DIFFERENT hash values in the first table is also approximately $k$.
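The claim that the first table holds approximately $k$ different values can be checked numerically. The sketch below is only an illustration: it assumes hash outputs are uniform and independent over $t$ values, it uses the standard expression $t\,(1 - (1 - 1/t)^k)$ for the expected number of distinct values, and the function name `expected_distinct` and the sizes chosen are hypothetical, not taken from the text.

```python
import math

def expected_distinct(t, k):
    """Expected number of distinct values among k uniform draws from t possibilities.

    Uses the identity t * (1 - (1 - 1/t)**k), evaluated with log1p/expm1
    so that the result stays accurate when 1/t is tiny.
    """
    return -t * math.expm1(k * math.log1p(-1.0 / t))

# Illustrative sizes (assumed, not from the text): a 32-bit hash and k = 2**16.
t = 2 ** 32
k = 2 ** 16
distinct = expected_distinct(t, k)
print(f"k = {k}, expected distinct values = {distinct:.3f}")
print(f"expected values lost to internal collisions = {k - distinct:.3f}")
```

Under these assumptions the expected loss is roughly $k^2/(2t)$, which is negligible next to $k$ as long as $k \ll t$.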

For each value we compute in the second table, the probability that it does not coincide with any value in the first table is $(t-k)/t$. So the probability that NO value in the second table coincides with any value in the first table is $[(t-k)/t]^k$ (the second table is also constructed from $k$ hash computations). The probability of a collision between the tables is therefore:

\begin{displaymath}1 - ((t-k)/t)^k.\end{displaymath}
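The formula can be compared against a small simulation. This is a minimal sketch, assuming that a uniform random integer in $[0, t)$ is an adequate stand-in for a hash value; the function names, the seed, and the sizes $t = 10000$, $k = 100$ are illustrative choices and do not come from the text.

```python
import random

def cross_collision_probability(t, k):
    """Formula from the text: 1 - ((t - k) / t) ** k."""
    return 1.0 - ((t - k) / t) ** k

def simulate_cross_collision(t, k, trials, seed=0):
    """Estimate the chance that some value in a second table of k random
    'hashes' also appears in a first table of k random 'hashes'."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    hits = 0
    for _ in range(trials):
        # First table ("good messages"); a set keeps only the distinct values.
        good = {rng.randrange(t) for _ in range(k)}
        # Second table ("bad messages"); stop at the first cross-table match.
        if any(rng.randrange(t) in good for _ in range(k)):
            hits += 1
    return hits / trials

t, k = 10_000, 100
print("formula:   ", cross_collision_probability(t, k))
print("simulation:", simulate_cross_collision(t, k, trials=4000, seed=1))
```

The simulated value should come out slightly different from the formula, both because of sampling noise and because the first table can contain a few repeated values, which the formula ignores.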

If we want this probability to equal

\begin{displaymath}(e-1)/e \approx 63\%,\end{displaymath}

then we can set:
\begin{displaymath}1 - ((t-k)/t)^k = (e-1)/e,\end{displaymath} (7)
\begin{displaymath}e - e((t-k)/t)^k = e - 1,\end{displaymath} (8)
\begin{displaymath}((t-k)/t)^k = 1/e.\end{displaymath} (9)

Taking logarithms, we get:
\begin{displaymath}k \log((t-k)/t) = -1.\end{displaymath} (10)

Well, $(t-k)/t = 1 - k/t$, and $\log(1 - k/t) \simeq -k/t$ if $k \ll t$. Substituting this approximation into Equation 10, we get $k (-k/t) = -1$, or $-k^2/t = -1$, so again $k^2 = t$ is the answer. Since each table has size $k$, we compute $2\sqrt{t}$ hashes in total, twice as many as before.
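As a sanity check on the approximation, one can plug $k = \sqrt{t}$ back into the exact expression $1 - ((t-k)/t)^k$ and compare it with $(e-1)/e$. The sketch below does this for an assumed 32-bit hash ($t = 2^{32}$); the choice of $t$ and the function name are illustrative only.

```python
import math

def cross_collision_probability(t, k):
    """Formula from the text: 1 - ((t - k) / t) ** k."""
    return 1.0 - ((t - k) / t) ** k

# Assumed example size: a 32-bit hash, so t = 2**32 possible values.
t = 2 ** 32
k = math.isqrt(t)            # k = sqrt(t), the table size derived above
p = cross_collision_probability(t, k)
target = 1 - 1 / math.e      # (e - 1) / e

print(f"k = {k}, hashes computed in total = {2 * k}")
print(f"collision probability = {p:.6f}, (e-1)/e = {target:.6f}")
```

For this choice of $t$ the two values should agree to several decimal places, which suggests that replacing $\log(1 - k/t)$ by $-k/t$ costs very little when $k \ll t$.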






Take your math with coffee...


Breno deMedeiros 2005-09-29