
東京大学 新領域創成科学研究科 メディカル情報生命専攻, August 2016, Problem 9

Author

zephyr

Description

Suppose that \(\mathbf{M}\) is an array with \(n (\geq 1)\) distinct integers. The quicksort algorithm for sorting \(\mathbf{M}\) in the ascending order has the following three steps.

A) Select and remove an element \(x\) from \(\mathbf{M}\). Call \(x\) a pivot.

B) Divide \(\mathbf{M}\) into arrays \(\mathbf{M_1}\) and \(\mathbf{M_2}\) such that \(y \leq x\) for \(y \in \mathbf{M_1}\) and \(x \leq z\) for \(z \in \mathbf{M_2}\).

C) Sort \(\mathbf{M_1}\) and \(\mathbf{M_2}\) in the ascending order using quicksort, and return the concatenation of \(\mathbf{M_1}\), \(x\), and \(\mathbf{M_2}\).

Answer the following questions.

(1) How would you implement Step B in quicksort?

(2) In Step A, if we always set the first element in \(\mathbf{M}\) to pivot \(x\), show an input array that the algorithm sorts in \(O(n^2)\) worst-case time, and prove this property.

(3) In Step A, if we select a position in \(\mathbf{M}\) at random and set the element at the position to pivot \(x\), prove that the expected time complexity is \(O(n \log n)\) for an arbitrary input array.

(4) Design an algorithm that calculates the \(k\)-th smallest element in \(\mathbf{M}\) in \(O(n)\) expected time, and prove this property.



Kai

(1)

To implement Step B, we partition the array \(\mathbf{M}\) around the pivot element \(x\). A common way to do this is the Lomuto partition scheme, which assumes the chosen pivot has been swapped into the last position of the (sub)array:

  1. Initialize an index \(i\) to track the boundary of elements less than or equal to the pivot.
  2. Traverse the array from left to right, comparing each element with the pivot.
  3. Swap elements to ensure all elements less than or equal to the pivot are on its left, and all elements greater than the pivot are on its right.

Here is a Python function to achieve this:

def partition(arr, low, high):
    pivot = arr[high]  # The pivot is assumed to sit at the last position
    i = low - 1        # i marks the end of the region of elements <= pivot

    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
            # Move the current element into the <= pivot region

    arr[i+1], arr[high] = arr[high], arr[i+1]  # Place pivot in its final position
    return i+1  # Return the pivot's index
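
For completeness, here is a minimal quicksort sketch built on this partition function (illustrative only; it sorts arr[low..high] in place):

def quicksort(arr, low, high):
    if low < high:
        p = partition(arr, low, high)  # pivot ends up at index p
        quicksort(arr, low, p - 1)     # recursively sort the part <= pivot
        quicksort(arr, p + 1, high)    # recursively sort the part > pivot

# Example usage: quicksort(data, 0, len(data) - 1) sorts data in place.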

(2)

If we always select the first element as the pivot, the worst-case scenario occurs when the array is already sorted (either in ascending or descending order).

Example of Worst-case Input:

\[ \mathbf{M} = [1, 2, 3, \ldots, n] \]

Proof of \(O(n^2)\) Time Complexity:

  1. In the first call, the pivot is the smallest element (1), and the array is divided into \(\mathbf{M_1} = []\) and \(\mathbf{M_2} = [2, 3, \ldots, n]\).
  2. In the second call, the pivot is the smallest element in \(\mathbf{M_2}\) (2), and the array is divided into \(\mathbf{M_1} = []\) and \(\mathbf{M_2} = [3, 4, \ldots, n]\).
  3. This process repeats: at every level \(\mathbf{M_1}\) is empty and \(\mathbf{M_2}\) shrinks by exactly one element, so the recursion has depth \(n\) and the partition step makes \(n-1\) comparisons at the first level, \(n-2\) at the second, and so on.

The total number of comparisons is therefore

\[ (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = O(n^2) \]
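
Equivalently, letting \(W(n)\) denote the number of comparisons on this input, the recursion gives

\[ W(n) = W(n-1) + (n-1), \qquad W(1) = 0 \]

and unrolling it yields the same bound:

\[ W(n) = \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2} = \Theta(n^2) \]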

(3)

To rigorously prove that the expected time complexity of Quicksort with a random pivot selection is \(O(n \log n)\), we will employ probabilistic analysis.

Definitions and Assumptions

  • Let \(T(n)\) be the expected time complexity of Quicksort on an array of size \(n\).
  • Each pivot is chosen uniformly at random from the array.
  • \(C(n)\) is the number of comparisons made by Quicksort on an array of size \(n\).

Key Observations

  1. When a pivot \(x\) is chosen, it partitions the array into two subarrays \(\mathbf{M_1}\) and \(\mathbf{M_2}\). The sizes of \(\mathbf{M_1}\) and \(\mathbf{M_2}\) depend on the position of \(x\) in the sorted array.

  2. If the pivot is the \(i\)-th smallest element, \(\mathbf{M_1}\) will have \(i-1\) elements and \(\mathbf{M_2}\) will have \(n-i\) elements.

  3. The number of comparisons required to partition the array around the pivot is \(n-1\).

Expected Comparisons

We aim to find \(E[C(n)]\), the expected number of comparisons for an array of size \(n\).

Conditioned on the pivot being the \(i\)-th smallest element, the comparison count satisfies the recurrence

\[ C(n) = C(i-1) + C(n-i) + (n-1) \]

where the pivot rank \(i\) is uniformly distributed over \(\{1, \ldots, n\}\), and \(C(i-1)\), \(C(n-i)\) are the comparison counts of the two recursive calls.

Taking Expectation

Since each of the \(n\) possible pivot ranks is equally likely, taking the expectation over \(i\) gives:

\[ E[C(n)] = \frac{1}{n} \sum_{i=1}^{n} \left( E[C(i-1)] + E[C(n-i)] \right) + (n-1) \]

Because \(\sum_{i=1}^{n} E[C(i-1)]\) and \(\sum_{i=1}^{n} E[C(n-i)]\) are the same sum after reindexing, this simplifies to:

\[ E[C(n)] = \frac{2}{n} \sum_{i=0}^{n-1} E[C(i)] + (n-1) \]

Solving the Recurrence Relation

To solve the recurrence, we use the substitution (induction) method. Hypothesize that \(E[C(i)] \leq c \cdot i \log i\) for every \(i < n\) and some constant \(c > 0\); we then verify that the same bound holds for \(n\). Substituting the hypothesis into the recurrence (the \(i = 0\) term vanishes because \(C(0) = 0\)):

\[ E[C(n)] \leq \frac{2}{n} \sum_{i=1}^{n-1} c \cdot i \log i + (n-1) \]

Since \(x \log x\) is increasing on \([1, \infty)\), the sum can be bounded from above by an integral:

\[ \sum_{i=1}^{n-1} i \log i \leq \int_1^n x \log x \, \mathrm{d}x \]

Compute the integral:

\[ \int x \log x \, \mathrm{d}x = \frac{x^2}{2} \log x - \frac{x^2}{4} + C \]
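
As a quick check, differentiating the right-hand side recovers the integrand:

\[ \frac{\mathrm{d}}{\mathrm{d}x}\left( \frac{x^2}{2} \log x - \frac{x^2}{4} \right) = x \log x + \frac{x}{2} - \frac{x}{2} = x \log x \]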

Evaluate the integral from \(1\) to \(n\):

\[ \int_1^n x \log x \, \mathrm{d}x = \left[ \frac{x^2}{2} \log x - \frac{x^2}{4} \right]_1^n \]

At \(x=n\):

\[ \frac{n^2}{2} \log n - \frac{n^2}{4} \]

At \(x=1\):

\[ \frac{1}{2} \log 1 - \frac{1}{4} = -\frac{1}{4} \]

So the value of the integral is

\[ \int_1^n x \log x \, \mathrm{d}x = \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \]

and therefore

\[ \sum_{i=1}^{n-1} i \log i \leq \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \]

Substituting this bound back into the recurrence (and keeping the constant \(c\)):

\[ E[C(n)] \leq \frac{2c}{n} \left( \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \right) + (n-1) = c\,n \log n - \frac{c\,n}{2} + \frac{c}{2n} + (n-1) \]

For \(c \geq 2\) and \(n \geq 2\) we have \(\frac{c}{2n} + (n-1) \leq \frac{c}{2} + (n-1) \leq \frac{c\,n}{2}\), so

\[ E[C(n)] \leq c\,n \log n \]

which completes the induction (the base case \(n = 1\) holds because \(C(1) = 0\)). Hence \(E[C(n)] = O(n \log n)\), and since the running time of Quicksort is proportional to the number of comparisons performed, the expected time complexity with random pivot selection is \(T(n) = O(n \log n)\).
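
As an informal sanity check (separate from the proof), one can count the comparisons made by a direct implementation of Steps A-C with a random pivot; a minimal sketch, whose output should be on the order of \(2 n \ln n\):

import math
import random

def count_comparisons(values):
    # Count comparisons made by quicksort (Steps A-C) with a random pivot
    count = 0

    def sort(m):
        nonlocal count
        if len(m) <= 1:
            return m
        x = m.pop(random.randrange(len(m)))  # Step A: random pivot
        count += len(m)                      # Step B: |m| comparisons against x
        m1 = [y for y in m if y <= x]
        m2 = [z for z in m if z > x]
        return sort(m1) + [x] + sort(m2)     # Step C

    sort(list(values))
    return count

n = 10000
print(count_comparisons(random.sample(range(10 * n), n)))  # typically a bit below 2 n ln n
print(round(2 * n * math.log(n)))                          # the 2 n ln n benchmark, ~184000 here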

(4)

This can be achieved using the Quick-select algorithm, which is similar to Quicksort but only recurses into one partition.

Algorithm:

  1. Choose a pivot element \(x\) randomly.
  2. Partition the array into \(\mathbf{M_1}\) and \(\mathbf{M_2}\) as described in Step B.
  3. If the size of \(\mathbf{M_1}\) is \(k-1\), then \(x\) is the \(k\)-th smallest element.
  4. If the size of \(\mathbf{M_1}\) is greater than \(k-1\), recurse into \(\mathbf{M_1}\).
  5. Otherwise, recurse into \(\mathbf{M_2}\) with the adjusted rank \(k - |\mathbf{M_1}| - 1\).

Python Code:

import random

def quickselect(arr, low, high, k):
    # Return the element whose final sorted position (0-based) is k, with low <= k <= high
    if low == high:
        return arr[low]

    # Step A: choose a random pivot and move it to the end so partition() can use it
    pivot_index = random.randint(low, high)
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]
    pivot_index = partition(arr, low, high)  # Step B: partition around the pivot

    if k == pivot_index:
        return arr[k]
    elif k < pivot_index:
        return quickselect(arr, low, pivot_index - 1, k)
    else:
        return quickselect(arr, pivot_index + 1, high, k)

def find_kth_smallest(arr, k):
    # k is 1-based, so convert it to a 0-based rank
    return quickselect(arr, 0, len(arr) - 1, k - 1)
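
For example (a quick check; this assumes the partition function from (1) is defined in the same module, and note that quickselect rearranges arr in place):

data = [7, 2, 9, 4, 1, 5, 8]
print(find_kth_smallest(data, 3))  # prints 4, the 3rd smallest element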

Proof of \(O(n)\) Expected Time Complexity:

Quickselect performs the same \(O(n)\) partitioning work per call as Quicksort, but recurses into only one of the two subarrays. With probability at least \(1/2\), the random pivot's rank lies in the middle half of the current array, in which case the surviving subarray has size at most \(3n/4\). Hence, after an expected constant number of partitioning rounds, each costing \(O(n)\), the problem size shrinks by a factor of at least \(3/4\), and the expected running time \(T(n)\) satisfies

\[ T(n) \leq T\!\left(\tfrac{3n}{4}\right) + O(n) \]

Unrolling this recurrence gives a geometric series:

\[ T(n) = O\!\left( n \sum_{i=0}^{\infty} \left(\tfrac{3}{4}\right)^i \right) = O(n) \]

Knowledge

Quicksort, time complexity, divide-and-conquer algorithms

Key Vocabulary

  • Quicksort 快速排序
  • Pivot 枢轴
  • Partition 分区
  • Expected time complexity 期望时间复杂度
  • Recurrence relation 递推关系

References

  1. Introduction to Algorithms, Cormen, Leiserson, Rivest, and Stein, 3rd Edition, Chap. 7 (Quicksort), Chap. 9 (Medians and Order Statistics)