CS 240: Lecture 6 - Merge Sort

Dear students:

Last week we examined two sort algorithms. They contained little magic, but they were also slow. This week we look at two faster sorts: merge sort and quick sort. Both use recursion.

Concrete

Insertion sort and selection sort can be written in just a few lines of code, but the cost of each of those lines is quite high. Both algorithms are $\Theta(n^2)$. Suppose you have a processor that runs at 3.5 gigahertz. That means its clock ticks many times per second. Let's generously assume that one operation of the sorting algorithm can execute on each of those clock ticks. Then we have this sorting speed:

$$\frac{3.5 \times 2^{30} \text{ operations}}{1 \text{ second}}$$

Since there are $n^2$ operations for these sorts, we have this total running time:

$$\begin{align} \text{time} &= \frac{n^2 \text{ operations}}{1 \text{ sort}} \times \frac{1 \text{ second}}{3.5 \times 2^{30} \text{ operations}} \\ &= \frac{n^2}{3.5 \times 2^{30}} \text{ seconds per sort} \end{align}$$

When $n = 1000000$, we can expect a selection or insertion sort to take ~266 seconds. That's not very fast. We need something better. The better algorithm we look at today is merge sort.

Merge Sort

Merge sort sorts an array by splitting it in two and sorting each half independently. The recursive calls keep splitting the array. Eventually we reach a subarray that contains one element or none. The elements in that very small subarray are sorted. Once the left and right subarrays are both sorted, they are merged together into one big sorted array.

The splitting operation might look something like this:

private static void mergeSort(int[] items,
                              int[] tmp,
                              int left,
                              int right) {
  if (left < right) {
    int mid = (left + right) / 2;
    mergeSort(items, tmp, left, mid);
    mergeSort(items, tmp, mid + 1, right);

    merge(items, tmp, left, mid, right);
  }
}

private static void mergeSort(int[] items,
                              int[] tmp,
                              int left,
                              int right) {
  if (left < right) {
    int mid = (left + right) / 2;
    mergeSort(items, tmp, left, mid);
    mergeSort(items, tmp, mid + 1, right);

    merge(items, tmp, left, mid, right);
  }
}

The tmp variable provides scratch space that eases the merging process. The int parameters delimit the window of the array that is being sorted. We shouldn't expect the caller to provide anything but the array to be sorted, so we make this helper method private. The public method primes the parameters to their initial values:

public static void mergeSort(int[] items) {
  mergeSort(items, new int[items.length], 0, items.length - 1);
}

public static void mergeSort(int[] items) {
  mergeSort(items, new int[items.length], 0, items.length - 1);
}

When I look at this algorithm, I find it hard to believe that it actually does anything. Most of its lines are devoted to the splitting. The real sorting magic is done by the merge method.

Tentative Analysis

Finding the cost function for merge sort requires a new strategy. For the selection and insertion sorts, the costs of the inner loop increased linearly over $n$ iterations. By drawing a picture of the cells we touched, we made the argument that the cost functions were $T(n) = \frac{n(n + 1)}{2} \in \Theta(n^2)$.

Drawing a picture for merge sort is possible. But first we use a recurrence. A cost function expressed as a recurrence is recursive. There's a base case for some small value of $n$ and a general case for some general value of $n$. Suppose our cost functions counts the number of items compared because this is the most expensive operation. When $n = 1$, no cells are compared because the array is automatically sorted, and the cost is effectively 0. Thus our base case is:

$$\begin{align} T(1) &= 0 \\ \end{align}$$

In a general case, $n$ is going to be bigger than one. We must count how many operations occur the current level of the recursion and then defer to the cost function to tell us the cost of the recursive calls. That gives us this general structure for the general cost:

$$\begin{align} T(n) &= \text{level cost} + \text{call count} \times T(\text{size of split}) \end{align}$$

For merge sort, the cost of the current level is the cost of the merge operation. We don't know what that is yet. Let's call it $f(n)$. There are two recursive calls, each on half the array. That gives us is this general cost:

$$\begin{align} T(n) &= f(n) + 2T(\frac{n}{2}) \\ \end{align}$$

But what is $f(n)$?

Merge

The merge method receives an array that has two sorted subarrays in it. Its job is to reorganize those two subarrays into one sorted subarray. But it has to do this in an efficient way. It does this by considering only the head elements of the two subarrays at a time. It picks off the smaller of the two as the next element in the merged array.

When a head element is pulled from its subarray, the head index is nudged rightward. Eventually one of the subarrays will be consumed. At that point, we stop comparing the head elements and just draw from the rest of the unconsumed list.

We might implement merge in this way:

private static void merge(int[] items, int[] tmp, int left, int mid, int right) {

  // Copy items to temporary array since we're about to
  // chaotically reorganize the original.
  for (int i = left; i <= right; ++i) {
    tmp[i] = items[i];
  }

  int i = left;
  int leftIndex = left;
  int rightIndex = mid + 1;

  // Compare heads until one subarray is empty.
  for (; leftIndex <= mid && rightIndex <= right; ++i) {
    if (tmp[leftIndex] <= tmp[rightIndex]) {
      items[i] = tmp[leftIndex];
      ++leftIndex;
    } else {
      items[i] = tmp[rightIndex];
      ++rightIndex;
    }
  }

  // Copy over remainder of left subarray.
  for (; leftIndex <= mid; ++leftIndex, ++i) {
    items[i] = tmp[leftIndex];
  }

  // Copy over remainder of right subarray.
  for (; rightIndex <= right; ++rightIndex, ++i) {
    items[i] = tmp[rightIndex];
  }
}

private static void merge(int[] items, int[] tmp, int left, int mid, int right) {

  // Copy items to temporary array since we're about to
  // chaotically reorganize the original.
  for (int i = left; i <= right; ++i) {
    tmp[i] = items[i];
  }

  int i = left;
  int leftIndex = left;
  int rightIndex = mid + 1;

  // Compare heads until one subarray is empty.
  for (; leftIndex <= mid && rightIndex <= right; ++i) {
    if (tmp[leftIndex] <= tmp[rightIndex]) {
      items[i] = tmp[leftIndex];
      ++leftIndex;
    } else {
      items[i] = tmp[rightIndex];
      ++rightIndex;
    }
  }

  // Copy over remainder of left subarray.
  for (; leftIndex <= mid; ++leftIndex, ++i) {
    items[i] = tmp[leftIndex];
  }

  // Copy over remainder of right subarray.
  for (; rightIndex <= right; ++rightIndex, ++i) {
    items[i] = tmp[rightIndex];
  }
}

Only one of the bottom two for loops will have any iterations. Note that there are other ways to write this method.

In the worst case, how many item comparisons will we need to perform to merge our two subarrays? The best case happens when one array is entirely smaller than the other. It will get copied over by the first for loop, and then the other array will be drained. In the worst case, the two halves will be evenly matched. Neither array will get drained before the other until the very last element. That means the first for loop will execute $n - 1$ times.

We now completely describe the cost function of merge sort as a recurrence:

$$\begin{align} T(1) &= 0 \\ T(n) &= n - 1 + 2T(\frac{n}{2}) \\ \end{align}$$

How does this compare to $T(n) = \frac{n(n + 1)}{2}$. Well, we don't know. That recursive term is problematic. We will get rid of it and arrive at what a closed-form solution that can be compared. But not this week.

However, I don't want to leave you in suspense. We can reason about the approximate cost using a picture. At the first level of the recursion, we merge together all of the $n$ elements. At the second level, each half gets merged together, which again is all the $n$ elements. At the third level, each quarter gets merged together, which again is all the $n$ elements. This splitting happens until we reach the base case. All together there are $\log_2 n$ levels, and each level costs $n$. That gives us $T(n) \in \Theta(n\log_2 n)$. We've thrown out some low-order terms in this informal proof.

This complexity class is the product of a linear function and a logarithmic function. We say it's linearithmic. We'll plot a linearithmic function against a quadratic to see how merge sort compares to the selection and insertion sorts.

TODO

You've got a few things to do before we meet again:

Turn in Friday's lab on Gradescope before 5 PM.

Keep working on the first programming assignment. Work steadily or regret it. Commit and push after every work session.

See you next time!

Sincerely,

P.S. It's time for a haiku!

It once took m steps But then they halved the problem Now it takes just n