Version 09/03/18

Binary Trees and Quicksort

0 Permutations and Combinations

We review briefly in order to establish notation and basic identities. A permutation on n symbols is any specific ordering of the symbols. When digging deeper into the theory of permutations, the symbols are usually taken to be the first n positive integers, but clearly exactly which symbols are used is not material. It is often convenient in illustrations, particularly when n ≤ 26, to use letters rather than numbers. Denote the number of permutations of n symbols by Perm(n).

The number of permutations on n symbols is

(1)	Perm(n) = n×(n-1)×(n-2)×...×2×1 = n!

where as usual ! denotes the factorial of the number.

Proof. Since there is only one way to order a single symbol, Perm(1) = 1 = 1!. If n > 1, an n-permutation can be constructed in two steps: first order n-1 of the symbols and then place the last symbol somewhere in the ordering. We can assume as an inductive hypothesis that there are Perm(n-1) = (n - 1)! ways to accomplish the first step. To accomplish the second step, one position among the n available positions must be used for the last symbol. There are n ways to make this choice. Therefore there are Perm(n-1) × n = n! ways to accomplish both steps, completing the proof by mathematical induction.
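The two-step construction in the proof can be turned directly into a small program. The following Python sketch is an illustration only (the function name permutations_by_insertion is ours, not part of the notes): it orders the first n-1 symbols recursively and then places the last symbol in each of the n available positions.

```python
from math import factorial

def permutations_by_insertion(symbols):
    """Build every ordering of symbols using the two steps of the proof."""
    if len(symbols) <= 1:
        return [list(symbols)]
    *rest, last = symbols
    result = []
    # Step 1: order the first n-1 symbols (inductive hypothesis).
    for smaller in permutations_by_insertion(rest):
        # Step 2: place the last symbol in one of the n positions.
        for pos in range(len(smaller) + 1):
            result.append(smaller[:pos] + [last] + smaller[pos:])
    return result

perms = permutations_by_insertion(list("ABCD"))
print(len(perms), factorial(4))   # 24 24
```

Each pass through the inner loop multiplies the count by the number of positions, which is exactly the factor n in Perm(n-1) × n.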

The notion of permutation can be generalized slightly: A k-permutation on n symbols is any specific ordering of k symbols from a set of n symbols. Denote the number of k-permutations of n symbols by Perm(n,k).

The number of k-permutations on n symbols is

(2)	Perm(n,k) = n×(n-1)×(n-2)×...×(n-k+1) = n!/(n-k)!

The proof is similar to that for (1).
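As a quick check of Eq (2), the sketch below computes Perm(n,k) with the product form and compares it against both the factorial quotient and a brute-force count. The function name perm and the sample values n = 6, k = 3 are chosen here purely for illustration.

```python
from itertools import permutations
from math import factorial

def perm(n, k):
    """Perm(n,k) = n×(n-1)×...×(n-k+1), the product form of Eq (2)."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result

n, k = 6, 3
brute = sum(1 for _ in permutations(range(n), k))   # enumerate all k-permutations
print(perm(n, k), factorial(n) // factorial(n - k), brute)   # 120 120 120
```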

A combination of k items from a set of n items is a subset, without regard to order. Denote the number of k-combinations of n symbols by Comb(n,k).

The number of k-combinations of n symbols is

(3)	Comb(n,k) = n!/(k!(n-k)!)

Proof. One of many ways to understand this result uses Eq (2) to count the k-permutations and then divides by k!, the number of ways the k choices can be re-arranged (permuted), all of which represent one combination.

The numbers Comb(n,k) are also called the binomial coefficients due to this fact:

(4)	(x + y)ⁿ = ∑_{k=0..n} Comb(n,k) x^(n-k) y^k = xⁿ + n x^(n-1) y + Comb(n,2) x^(n-2) y² + ... + Comb(n,n-2) x² y^(n-2) + n x y^(n-1) + yⁿ

Setting x = y = 1 in the above yields

(5)	∑_{k=0..n}Comb(n,k) = 2ⁿ

That is, the sum of the coefficients is 2ⁿ. Other useful identities are:

(6)	Comb(n,k) = Comb(n,n-k)

(7)	Comb(n,k) = Comb(n-1,k-1) + Comb(n-1,k)

(8)	Comb(n,k) = Comb(n-1,k-1) × n / k

Identities (6) and (8) combine to make a reasonably efficient practical way to calculate Comb(n,k) while keeping overflow down. (3) is useful for understanding and theory, but terrible as a formula for direct computation. (7) is the basis for "Pascal's Triangle" and useful whenever all values are needed. In identity (8), it is assumed that multiplication by n occurs before division by k so that the computation remains in "integer world".
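One possible way to put identities (6), (7) and (8) to work is sketched below in Python. The function names comb and pascal_row are illustrative choices, and the loop is only one of several ways to unroll identity (8); Python integers do not overflow, so the multiply-before-divide ordering matters here for exactness rather than for range.

```python
def comb(n, k):
    """Comb(n,k) via identities (6) and (8), entirely in integer arithmetic."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)              # identity (6): work with the smaller index
    result = 1                     # Comb(n-k, 0)
    for i in range(1, k + 1):
        # identity (8) with m = n-k+i: Comb(m,i) = Comb(m-1,i-1) × m / i.
        # Multiply first so that the division is exact.
        result = result * (n - k + i) // i
    return result

def pascal_row(n):
    """Row n of Pascal's Triangle, built row by row with identity (7)."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[j - 1] + row[j] for j in range(1, len(row))] + [1]
    return row

print(comb(16, 8), pascal_row(4))   # 12870 [1, 4, 6, 4, 1]
```

The first function is suited to a single value, the second to the case where a whole row of coefficients is needed.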

The binomial coefficients are fundamental to much of combinatorics, discrete math, and analysis of algorithms. The Wikipedia entry has much more on the topic.

1 Counting Binary Tree Shapes

For a given number of nodes, how many distinct binary tree shapes are there? The first interesting case is for trees with three nodes. There are five distinct shapes:


    *        *          *         *         *
   *        *          * *         *         *
  *          *                    *           *
  (1)       (2)        (3)       (4)        (5)

Filling in the smaller cases, we see there are two distinct shapes of binary trees with two nodes:


    *        *
   *          *
  (1)        (2)

And there is only one shape for a tree with one node:


   *
  (1)

We can also assert that there is exactly one shape representing the empty tree.

In general, if we let Cat(n) denote the number of distinct binary tree shapes with n nodes, from the investigations above we have:

Cat(0) = 1
Cat(1) = 1
Cat(2) = 2
Cat(3) = 5

For n > 0 a binary tree shape with n nodes consists of a root, a left subtree shape with some number j of nodes, and a right subtree shape with the remaining n-j-1 nodes. For a given j the two subtree shapes can be chosen independently, so there are Cat(j)×Cat(n-1-j) shapes whose left subtree has j nodes, and Cat(n) is the sum of these products over all j from 0 to n-1. This leads to a calculation of the number of binary tree shapes with n nodes in terms of the numbers of shapes for binary trees with fewer than n nodes:

Cat(n) = Cat(0)×Cat(n-1) + Cat(1)×Cat(n-2) + ... + Cat(n-1)×Cat(0)

which can be expressed as the recursion:

Cat(0) = 1, Cat(n) = ∑_{i=0..n-1} Cat(i)×Cat(n-1-i) for n > 0

The numbers Cat(n) are called the Catalan numbers, after the Belgian mathematician Eugène Charles Catalan (1814–1894). The first few Catalan numbers are:

Cat(0) = 1
Cat(1) = Cat(0)Cat(0) = 1
Cat(2) = Cat(0)Cat(1) + Cat(1)Cat(0) = 2
Cat(3) = Cat(0)Cat(2) + Cat(1)Cat(1) + Cat(2)Cat(0) = 5
Cat(4) = Cat(0)Cat(3) + Cat(1)Cat(2) + Cat(2)Cat(1) + Cat(3)Cat(0) = 14
Cat(5) = Cat(0)Cat(4) + Cat(1)Cat(3) + Cat(2)Cat(2) + Cat(3)Cat(1) + Cat(4)Cat(0) = 42
Cat(6) = Cat(0)Cat(5) + Cat(1)Cat(4) + Cat(2)Cat(3) + Cat(3)Cat(2) + Cat(4)Cat(1) + Cat(5)Cat(0) = 132
Cat(7) = Cat(0)Cat(6) + Cat(1)Cat(5) + Cat(2)Cat(4) + Cat(3)Cat(3) + Cat(4)Cat(2) + Cat(5)Cat(1) + Cat(6)Cat(0) = 429
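The recursion lends itself to a short table-filling computation. The following Python sketch (the function name catalan_numbers is an illustrative choice) evaluates the sum exactly as written above, reusing previously computed values.

```python
def catalan_numbers(count):
    """Return [Cat(0), ..., Cat(count-1)] from the recursion (count >= 1)."""
    cat = [1]                                   # Cat(0) = 1
    for n in range(1, count):
        # Cat(n) = sum over i of Cat(i) × Cat(n-1-i)
        cat.append(sum(cat[i] * cat[n - 1 - i] for i in range(n)))
    return cat

print(catalan_numbers(8))   # [1, 1, 2, 5, 14, 42, 132, 429]
```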

Theorem 1. The n-th Catalan number is Cat(n) = Comb(2n,n) / (n + 1), where Comb(2n,n) is the middle binomial coefficient:

           Comb(2n,n)                                n!        n(n-1)(n-2)...(n-k+1)
Cat(n) =   ----------                Comb(n,k) =  --------  =  ---------------------
             n + 1                                k!(n-k)!        k(k-1)(k-2)...1

(See the Wikipedia entry Catalan Numbers for several proofs as well as other interpretations of the Catalan numbers.)

Using the closed form given by Theorem 1, we calculate Cat(8) = the number of 8-node binary tree shapes:

         1   16!    16x15x14x13x12x11x10
Cat(8) = - × ---- = -------------------- = 1430
         9   8!8!      8x7x6x5x4x3x2x1
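The same arithmetic can be checked mechanically. This minimal Python sketch (catalan_closed is a name chosen here) evaluates the closed form of Theorem 1 using the standard library's math.comb.

```python
from math import comb

def catalan_closed(n):
    """Cat(n) = Comb(2n,n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

print(comb(16, 8), catalan_closed(8))           # 12870 1430
print([catalan_closed(n) for n in range(8)])    # [1, 1, 2, 5, 14, 42, 132, 429]
```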

The first 25 Catalan numbers are listed here.

2 Mapping Permutations to Binary Search Trees

Recall that one way to sort data is to insert the data into a binary search tree and follow with an inorder traversal of the tree. For example, to sort EGBDFHAIC:


input data:     EGBDFHAIC

bst.Insert(E):                 E

bst.Insert(G):                 E
                                   G

bst.Insert(B):                 E
                           B       G

bst.Insert(D):                 E
                           B       G
                             D        

bst.Insert(F):                 E
                           B       G
                             D   F    

bst.Insert(H):                 E
                           B       G
                             D   F   H

bst.Insert(A):                 E
                           B       G
                         A   D   F   H

bst.Insert(I):                 E
                           B       G
                         A   D   F   H
                                      I

bst.Insert(C):                 E
                           B       G
                         A   D   F   H
                            C         I

bst.Traverse(): ABCDEFGHI

The first item inserted becomes the root of the BST, and all subsequent items are inserted in either the left subtree or the right subtree, depending on whether they are smaller or larger than the root. (Note that this exactly mimics what Partition accomplishes in QuickSort.) The runtime cost of this version of InsertionSort is the sum of the various insertion costs plus the traversal cost. The insertion costs are all bounded above by the height H of the constructed tree, which yields:

Cost <= ∑_{j=1..n} H + n = nH + n = n(1+H).
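A minimal Python sketch of this sort follows. It is an illustration under simple assumptions (distinct keys, no rebalancing), and the names Node, insert, inorder, height and bst_sort are ours rather than those of any course library. It sorts the example input and reports the height H that appears in the bound.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key below root by walking left/right; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def inorder(node, out):
    """Append the keys of the subtree at node to out in sorted order."""
    if node is not None:
        inorder(node.left, out)
        out.append(node.key)
        inorder(node.right, out)
    return out

def height(node):
    """Height in edges: a single node has height 0, the empty tree -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def bst_sort(items):
    root = None
    for item in items:
        root = insert(root, item)
    return inorder(root, []), height(root)

result, h = bst_sort("EGBDFHAIC")
print("".join(result), h)   # ABCDEFGHI 3
```

For this input n = 9 and H = 3, so the bound n(1+H) evaluates to 36.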

Thus any time the original permutation maps to a "bushy" tree with logarithmic height, the sort (either BST-InsertionSort or QuickSort) will have runtime cost O(n log n). Note here that there may be many permutations that map to the same tree. For example, EBGDACHFI maps to the same tree as above. We will soon be able to calculate that there are 630 different permutations that map to this particular tree!
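To compare the trees produced by two permutations it is enough to compare their shapes. The sketch below is one possible encoding, not taken from the notes: a shape is a nested pair (left, right) with None for the empty tree, and the helper name bst_shape is hypothetical. It assumes distinct keys and splits the remaining items around the first one, much as Partition does.

```python
def bst_shape(perm):
    """Shape of the BST built by inserting perm in order.

    Returns nested (left, right) tuples, with None for the empty tree.
    """
    if not perm:
        return None
    root, rest = perm[0], perm[1:]
    smaller = [x for x in rest if x < root]   # these go to the left subtree
    larger = [x for x in rest if x > root]    # these go to the right subtree
    return (bst_shape(smaller), bst_shape(larger))

print(bst_shape("EGBDFHAIC") == bst_shape("EBGDACHFI"))   # True
print(bst_shape("ABC") == bst_shape("BAC"))               # False
```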

3 Counting Permutations

For a given binary tree shape S, let P(S) denote the number of permutations that map to S using the mapping procedure described in the previous section. For the only shape S1 with one node, there is only one permutation of length one and it maps to S1, so P(S1) = 1.

In general, if a binary tree shape has only one node at each height, such as


 *     *         *        *         *         *          *        *
*       *       *        *           *         *        *        *    ...
               *          *         *           *      *        *
                                                      *          *

then there is only one permutation that maps to that shape: the unique left-right path from root to leaf determines precisely the order of the permutation mapping to that shape. For convenience, call such trees "linear". For a linear tree with n nodes, the sequence of n-1 Left-Right directions along the descending path from the root uniquely determines both the tree shape and the unique permutation that maps to it. The number of linear n-node trees is the number of LR sequences of length n-1, which is 2^(n-1).
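Both claims about linear trees can be checked by brute force for small n. The Python sketch below is illustrative only: it encodes a shape as nested (left, right) tuples with None for the empty tree, and the helper names bst_shape, height and linear_shapes are assumptions of this example.

```python
from itertools import permutations

def bst_shape(perm):
    """Shape of the BST built from perm, as nested (left, right) tuples."""
    if not perm:
        return None
    root, rest = perm[0], perm[1:]
    return (bst_shape([x for x in rest if x < root]),
            bst_shape([x for x in rest if x > root]))

def height(shape):
    """Height in edges; the empty shape has height -1."""
    return -1 if shape is None else 1 + max(height(shape[0]), height(shape[1]))

def linear_shapes(n):
    """Map each linear n-node shape to the permutations that build it."""
    found = {}
    for perm in permutations(range(n)):
        shape = bst_shape(perm)
        if height(shape) == n - 1:            # one node per level
            found.setdefault(shape, []).append(perm)
    return found

for n in range(1, 7):
    shapes = linear_shapes(n)
    # number of linear shapes, and whether each has exactly one permutation
    print(n, len(shapes), all(len(p) == 1 for p in shapes.values()))
```

For n = 1 through 6 the second column should read 1, 2, 4, 8, 16, 32, matching 2^(n-1).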

The smallest shape with P > 1 is the balanced 3-node tree:


 *
* *

Two permutations map to this tree: BAC and BCA.

Now consider a general binary tree shape, depicted as:


 *
L R

where L and R denote the left and right subtrees. The first element in the permutation must map to the root. The remaining elements map either left or right, depending on how they compare with the root. As long as the relative order of the "right" elements and the relative order of the "left" elements are preserved, these elements can be intermingled without affecting the tree shape. There are Comb( |L| + |R|, |L|) ways to intermingle the left and right sequences. (Here, |T| denotes the size of T.)
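The intermingling argument can be illustrated with a small enumeration. In this Python sketch the root C, the left sequence "BA" and the right sequence "DE" are hypothetical sample data, and interleavings and bst_shape are helper names introduced only for this example. Every interleaving, prefixed by the root, should build the same tree.

```python
from math import comb

def interleavings(left, right):
    """All ways to merge two strings while keeping each one's own order."""
    if not left:
        return [right]
    if not right:
        return [left]
    take_left = [left[0] + rest for rest in interleavings(left[1:], right)]
    take_right = [right[0] + rest for rest in interleavings(left, right[1:])]
    return take_left + take_right

def bst_shape(perm):
    """Shape of the BST built from perm, as nested (left, right) tuples."""
    if not perm:
        return None
    root, rest = perm[0], perm[1:]
    return (bst_shape([x for x in rest if x < root]),
            bst_shape([x for x in rest if x > root]))

orders = ["C" + tail for tail in interleavings("BA", "DE")]
shapes = {bst_shape(p) for p in orders}
print(len(orders), comb(4, 2), len(shapes))   # 6 6 1
```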

Thinking recursively, there are P(L) permutations mapping to L and P(R) mapping to R. Therefore a recursion calculating P is given by:

Theorem 2. The number of permutations on n symbols that map to an n-node binary tree shape S is given by

P(S) = Comb(n-1,k) × P(S_L) × P(S_R)

where k is the number of nodes in the left subtree S_L, n-1-k is the number of nodes in the right subtree S_R, and P of the empty shape is taken to be 1.
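Theorem 2 translates into a short recursive function. The sketch below is an illustration under stated assumptions: shapes are nested (left, right) tuples with None for the empty tree, P of the empty shape is taken to be 1 so the recursion has a base case, and the names size, count_perms and bst_shape are ours.

```python
from math import comb

def size(shape):
    """Number of nodes in a shape."""
    return 0 if shape is None else 1 + size(shape[0]) + size(shape[1])

def count_perms(shape):
    """P(S) by the recursion of Theorem 2, with P(empty) = 1 assumed."""
    if shape is None:
        return 1
    left, right = shape
    n, k = size(shape), size(left)
    return comb(n - 1, k) * count_perms(left) * count_perms(right)

def bst_shape(perm):
    """Shape of the BST built from perm, as nested (left, right) tuples."""
    if not perm:
        return None
    root, rest = perm[0], perm[1:]
    return (bst_shape([x for x in rest if x < root]),
            bst_shape([x for x in rest if x > root]))

print(count_perms(bst_shape("BAC")))         # 2
print(count_perms(bst_shape("EGBDFHAIC")))   # 630
```

The second value is Comb(8,4) × 3 × 3 = 70 × 9 for the nine-node tree built from EGBDFHAIC.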

Note in passing that Comb(n-1,k) is much, much greater when k is near n/2 than when k is near 0 or n-1. That is, P(S) grows much faster when S is balanced. Here are some sample calculations:

 n    Cat(n)     P(linear)     P(balanced)               n!
--    ------     ---------     -----------               --
 1         1             1               1                1
 3         5             1               2                6
 4        14             1               3               24
 5        42             1               8              120
 6       132             1              20              720
 7       429             1              80             5040
 8      1430             1             210            40320
 9      4862             1             630           362880
...
15   9694845             1        21964800    1307674368000

Now let's look at the question of how many permutations map to trees of a given height. Here is an example worked out in detail for the case n = 5.

size 5 height 2:
   *         *         *        *         *         *    
 *   *     *   *     *   *    *   *     *   *     *   *  
* *       *   *     *     *    * *       *   *       * * 
P = 8     P = 6     P = 6     P = 6     P = 6     P = 8

Above we have all 5-node binary trees with height 2. Below is an abbreviated illustration showing two archetypes with the number of similar trees. The archetypes differ in whether the two bottom nodes are siblings or not. C is the number of all examples with that archetype and P is the number of permutations mapping to each instance of that archetype:

size 5 height 2 archetypes:
   *           *    
 *   *       *   *  
* *        *       *
C=2,P=8    C=4,P=6

Below we show only the three archetypes of 5-node binary trees with height 3. These archetypes vary by the depth where the "twig" is branched off. Again, C is the number of all examples with the twig at that height, P is the number of permutations mapping to each instance.

size 5 height 3 archetypes:
   *            *            *
  *            *            * *
 *            * *          *
* *          *            *
C=4,P=2      C=8,P=3      C=8,P=4

And finally we note that there are 2⁴ = 16 5-node binary trees with height 4:

size 5 height 4 archetypes:
    * 
   *  
  *   
 *
*
C=16,P=1

Note that we have accounted for 2×8 + 4×6 + 4×2 + 8×3 + 8×4 + 16×1 = 120 = 5! permutations: all permutations on 5 symbols. Another way to look at the data is: how many permutations map to a tree of a given height?

perms -->  height
-----      ------
40         2
64         3
16         4

These data can be interpreted as showing that QuickSort is modelled by as short a tree as possible for 1/3 of possible 5-permutations and by one that is within 1 of optimal height for another 53% of cases. The worst-case runtime will still be realized in the remaining 16 permutations, about 13% of cases. The news gets increasingly better as n grows, but there will always remain those 2^(n-1) permutations that hit a worst-case linear BST. Fortunately, as large as 2^(n-1) may seem, it is vanishingly small when compared to n!: the ratio 2^(n-1)/n! ≅ 1/((n/2)!((n-1)/2)!) trends rapidly to zero. For n = 10 fewer than one percent of permutations are linear (worst case). For n = 100 the worst-case tree shapes are less than 0.000000000000000000001 percent of all tree shapes.
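The ratio of worst-case permutations to all permutations is easy to tabulate exactly. The following Python sketch (linear_fraction is a name chosen for this example, and the sample values of n are arbitrary) uses exact rational arithmetic and converts to a percentage only for display.

```python
from fractions import Fraction
from math import factorial

def linear_fraction(n):
    """Fraction of the n! permutations that build a linear (worst-case) BST."""
    return Fraction(2 ** (n - 1), factorial(n))

for n in (5, 10, 20, 100):
    percent = float(linear_fraction(n) * 100)
    print(n, f"{percent:.3e} percent")
```

For n = 5 the fraction is 16/120 = 2/15, the 13% figure quoted above.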

But focusing on the absolute worst case does not really tell the story, because the next-to-worst case has tree-height n-2, asymptotically as bad as n-1. The question is, what is the expected tree height for a random permutation? In the case n = 5 above, it is

E[h] = (40×2 + 64×3 + 16×4)/120 = 2.8
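The height counts and the expected height for n = 5 can be reproduced by enumerating all 120 permutations. This is a brute-force sketch for small n only; the names bst_height and height_histogram are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def bst_height(perm):
    """Height of the BST built from perm (single node = 0, empty = -1)."""
    if not perm:
        return -1
    root, rest = perm[0], perm[1:]
    return 1 + max(bst_height([x for x in rest if x < root]),
                   bst_height([x for x in rest if x > root]))

def height_histogram(n):
    """Count how many of the n! permutations build a tree of each height."""
    return Counter(bst_height(p) for p in permutations(range(n)))

hist = height_histogram(5)
expected = Fraction(sum(h * c for h, c in hist.items()), sum(hist.values()))
print(sorted(hist.items()), float(expected))   # [(2, 40), (3, 64), (4, 16)] 2.8
```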

A general result is:

Theorem 3. The expected height of a random binary search tree with n nodes is O(log n).

Theorem 4. A random permutation on n symbols constructs a binary search tree in time O(n log n).

Theorem 4 is proved in Open Data Structures 7. Theorem 3 is proved in [Cormen]. To see an active demonstration of Theorem 3, locate the file LIB/notes_support/mbst_i.x, copy and execute this program on linprog. What you see is a non-stop generation of BSTs from random permutation input. You can check visually that the height of the BSTs is O(log size).
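For readers without access to that program, a rough stand-in experiment is sketched below in Python. It is not the course demonstration program, only a hypothetical substitute: it builds BSTs from random permutations and prints the average height next to log2(n) for comparison. The seed, the trial count and the sizes are arbitrary choices.

```python
import math
import random

def bst_height(perm):
    """Height of the BST built from perm (single node = 0, empty = -1)."""
    if not perm:
        return -1
    root, rest = perm[0], perm[1:]
    return 1 + max(bst_height([x for x in rest if x < root]),
                   bst_height([x for x in rest if x > root]))

def average_height(n, trials, rng):
    """Average BST height over a number of random permutations of size n."""
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += bst_height(perm)
    return total / trials

rng = random.Random(2018)   # fixed, arbitrarily chosen seed so runs repeat
for n in (10, 100, 1000):
    print(n, average_height(n, 50, rng), round(math.log2(n), 2))
```

If Theorem 3 holds, the ratio of the two printed columns should stay bounded as n grows, though a finite experiment of this kind is evidence rather than proof.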

Corollary. A random permutation on n symbols is sorted by QuickSort in time O(n log n).

3.1 Defining Archetype

We used the term "archetype" above without any precise definition, relying on the context of the examples. The notion can be formalized by defining two binary tree shapes S₁ and S₂ to be P-equivalent iff a sequence of exchanges of right and left subtrees at any node transforms S₁ to S₂. For example:


   *               *    
 *   *   -->     *   *  
* *                 * *

by exchanging the left/right subtrees of the root, and


   *               *               *               *   
 *   *    -->    *   *    -->    *   *    -->    *   * 
*   *          *       *          *    *          * *

by exchanging the left/right subtrees at one child of the root in each step.

Theorem 5. P-equivalence is an equivalence relation. Furthermore, any two P-equivalent tree shapes have the same number of permutations mapping to them: if S₁ and S₂ are P-equivalent then P(S₁) = P(S₂).

Proof. That P-equivalence is an equivalence relation is apparent on writing down the definitions of the three required properties: reflexive, symmetric, and transitive. For the second part, assume that shape S₂ is obtained from S₁ by an exchange of left and right subtrees of the root. Note in the formula of Theorem 2,

P(S) = Comb(n-1,k) × P(S_L) × P(S_R)

if there are k nodes in the left subtree then there are n-k-1 nodes in the right subtree, and Comb(n-1,k) = Comb(n-1,n-k-1) by Eq (6) above, so that

P(S₁) = Comb(n-1,k) × P(S_L) × P(S_R) = Comb(n-1,n-k-1) × P(S_R) × P(S_L) = P(S₂).

Two mathematical induction arguments complete the proof: first prove the case of an exchange at any level in the trees, and then extend the result to any number of exchanges.

Define a binary tree shape archetype to be any representative of a P-equivalence class.
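One way to pick a representative of each P-equivalence class is to put every shape into a canonical form by ordering the two subtrees at each node. The Python sketch below does this for all 5-node shapes and reports the class size C and the common value P for each archetype. It is an illustration under our own assumptions: shapes are nested (left, right) tuples with None for the empty tree, P(empty) = 1, and the ordering by repr is an arbitrary but fixed tie-breaking rule.

```python
from collections import defaultdict
from math import comb

def all_shapes(n):
    """All binary tree shapes with n nodes."""
    if n == 0:
        return [None]
    return [(left, right)
            for j in range(n)
            for left in all_shapes(j)
            for right in all_shapes(n - 1 - j)]

def size(shape):
    return 0 if shape is None else 1 + size(shape[0]) + size(shape[1])

def count_perms(shape):
    """P(S) by the recursion of Theorem 2, with P(empty) = 1 assumed."""
    if shape is None:
        return 1
    left, right = shape
    return comb(size(shape) - 1, size(left)) * count_perms(left) * count_perms(right)

def canonical(shape):
    """Archetype of shape: exchange subtrees so each node's pair is ordered."""
    if shape is None:
        return None
    a, b = canonical(shape[0]), canonical(shape[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)

classes = defaultdict(list)
for shape in all_shapes(5):
    classes[canonical(shape)].append(shape)

# (C, P) for each archetype; P is read off the first member of the class
print(sorted((len(m), count_perms(m[0])) for m in classes.values()))
# [(2, 8), (4, 2), (4, 6), (8, 3), (8, 4), (16, 1)]
```

The six (C, P) pairs agree with the archetype data listed for n = 5 in the previous section.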