
東京大学 新領域創成科学研究科 メディカル情報生命専攻, August 2016, Problem 9

Author

zephyr

Description

Suppose that \(\mathbf{M}\) is an array with \(n (\geq 1)\) distinct integers. The quicksort algorithm for sorting \(\mathbf{M}\) in the ascending order has the following three steps.

A) Select and remove an element \(x\) from \(\mathbf{M}\). Call \(x\) a pivot.

B) Divide \(\mathbf{M}\) into arrays \(\mathbf{M_1}\) and \(\mathbf{M_2}\) such that \(y \leq x\) for \(y \in \mathbf{M_1}\) and \(x \leq z\) for \(z \in \mathbf{M_2}\).

C) Sort \(\mathbf{M_1}\) and \(\mathbf{M_2}\) in the ascending order using quicksort, and return the concatenation of \(\mathbf{M_1}\), \(x\), and \(\mathbf{M_2}\).

Answer the following questions.

(1) How would you implement Step B in quicksort?

(2) In Step A, if we always set the first element in \(\mathbf{M}\) to pivot \(x\), show an input array that the algorithm sorts in \(O(n^2)\) worst-case time, and prove this property.

(3) In Step A, if we select a position in \(\mathbf{M}\) at random and set the element at the position to pivot \(x\), prove that the expected time complexity is \(O(n \log n)\) for an arbitrary input array.

(4) Design an algorithm that calculates the \(k\)-th smallest element in \(\mathbf{M}\) in \(O(n)\) expected time, and prove this property.



Kai

(1)

To implement Step B, we partition the array \(\mathbf{M}\) around the pivot element \(x\). A common way to do this is the Lomuto partition scheme, which assumes the chosen pivot has been swapped into the last position of the (sub)array:

  1. Initialize an index \(i\) to track the boundary of elements less than or equal to the pivot.
  2. Traverse the array from left to right, comparing each element with the pivot.
  3. Swap elements to ensure all elements less than or equal to the pivot are on its left, and all elements greater than the pivot are on its right.

Here is a Python function to achieve this:

def partition(arr, low, high):
    pivot = arr[high]  # The pivot is assumed to sit at the last position
    i = low - 1        # i marks the end of the region of elements <= pivot

    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
            # Move the current element into the <= pivot region

    arr[i+1], arr[high] = arr[high], arr[i+1]  # Place pivot in its final position
    return i+1  # Return the pivot's index
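
For completeness, here is a minimal quicksort sketch built on this partition function (illustrative only; it sorts arr[low..high] in place):

def quicksort(arr, low, high):
    if low < high:
        p = partition(arr, low, high)  # pivot ends up at index p
        quicksort(arr, low, p - 1)     # recursively sort the part <= pivot
        quicksort(arr, p + 1, high)    # recursively sort the part > pivot

# Example usage: quicksort(data, 0, len(data) - 1) sorts data in place.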

(2)

If we always select the first element as the pivot, the worst-case scenario occurs when the array is already sorted (either in ascending or descending order).

Example of Worst-case Input:

\[ \mathbf{M} = [1, 2, 3, \ldots, n] \]

Proof of \(O(n^2)\) Time Complexity:

  1. In the first call, the pivot is the smallest element (1), and the array is divided into \(\mathbf{M_1} = []\) and \(\mathbf{M_2} = [2, 3, \ldots, n]\).
  2. In the second call, the pivot is the smallest element in \(\mathbf{M_2}\) (2), and the array is divided into \(\mathbf{M_1} = []\) and \(\mathbf{M_2} = [3, 4, \ldots, n]\).
  3. This process repeats: at every level \(\mathbf{M_1}\) is empty and \(\mathbf{M_2}\) shrinks by exactly one element, so the recursion has depth \(n\) and the partition step makes \(n-1\) comparisons at the first level, \(n-2\) at the second, and so on.

The total number of comparisons is therefore

\[ (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = O(n^2) \]
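
Equivalently, letting \(W(n)\) denote the number of comparisons on this input, the recursion gives

\[ W(n) = W(n-1) + (n-1), \qquad W(1) = 0 \]

and unrolling it yields the same bound:

\[ W(n) = \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2} = \Theta(n^2) \]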

(3)

To rigorously prove that the expected time complexity of Quicksort with a random pivot selection is \(O(n \log n)\), we will employ probabilistic analysis.

Definitions and Assumptions

  • Let \(T(n)\) be the expected time complexity of Quicksort on an array of size \(n\).
  • Each pivot is chosen uniformly at random from the array.
  • \(C(n)\) is the number of comparisons made by Quicksort on an array of size \(n\).

Key Observations

  1. When a pivot \(x\) is chosen, it partitions the array into two subarrays \(\mathbf{M_1}\) and \(\mathbf{M_2}\). The sizes of \(\mathbf{M_1}\) and \(\mathbf{M_2}\) depend on the position of \(x\) in the sorted array.

  2. If the pivot is the \(i\)-th smallest element, \(\mathbf{M_1}\) will have \(i-1\) elements and \(\mathbf{M_2}\) will have \(n-i\) elements.

  3. The number of comparisons required to partition the array around the pivot is \(n-1\).

Expected Comparisons

We aim to find \(E[C(n)]\), the expected number of comparisons for an array of size \(n\).

Conditioned on the pivot being the \(i\)-th smallest element, the comparison count satisfies the recurrence

\[ C(n) = C(i-1) + C(n-i) + (n-1) \]

where the pivot rank \(i\) is uniformly distributed over \(\{1, \ldots, n\}\), and \(C(i-1)\), \(C(n-i)\) are the comparison counts of the two recursive calls.

Taking Expectation

Since each of the \(n\) possible pivot ranks is equally likely, taking the expectation over \(i\) gives:

\[ E[C(n)] = \frac{1}{n} \sum_{i=1}^{n} \left( E[C(i-1)] + E[C(n-i)] \right) + (n-1) \]

Because \(\sum_{i=1}^{n} E[C(i-1)]\) and \(\sum_{i=1}^{n} E[C(n-i)]\) are the same sum after reindexing, this simplifies to:

\[ E[C(n)] = \frac{2}{n} \sum_{i=0}^{n-1} E[C(i)] + (n-1) \]

Solving the Recurrence Relation

To solve the recurrence, we use the substitution (induction) method. Hypothesize that \(E[C(i)] \leq c \cdot i \log i\) for every \(i < n\) and some constant \(c > 0\); we then verify that the same bound holds for \(n\). Substituting the hypothesis into the recurrence (the \(i = 0\) term vanishes because \(C(0) = 0\)):

\[ E[C(n)] \leq \frac{2}{n} \sum_{i=1}^{n-1} c \cdot i \log i + (n-1) \]

Since \(x \log x\) is increasing on \([1, \infty)\), the sum can be bounded from above by an integral:

\[ \sum_{i=1}^{n-1} i \log i \leq \int_1^n x \log x \, \mathrm{d}x \]

Compute the integral:

\[ \int x \log x \, \mathrm{d}x = \frac{x^2}{2} \log x - \frac{x^2}{4} + C \]
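
As a quick check, differentiating the right-hand side recovers the integrand:

\[ \frac{\mathrm{d}}{\mathrm{d}x}\left( \frac{x^2}{2} \log x - \frac{x^2}{4} \right) = x \log x + \frac{x}{2} - \frac{x}{2} = x \log x \]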

Evaluate the integral from \(1\) to \(n\):

\[ \int_1^n x \log x \, \mathrm{d}x = \left[ \frac{x^2}{2} \log x - \frac{x^2}{4} \right]_1^n \]

At \(x=n\):

\[ \frac{n^2}{2} \log n - \frac{n^2}{4} \]

At \(x=1\):

\[ \frac{1}{2} \log 1 - \frac{1}{4} = -\frac{1}{4} \]

So the value of the integral is

\[ \int_1^n x \log x \, \mathrm{d}x = \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \]

and therefore

\[ \sum_{i=1}^{n-1} i \log i \leq \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \]

Substituting this bound back into the recurrence (and keeping the constant \(c\)):

\[ E[C(n)] \leq \frac{2c}{n} \left( \frac{n^2}{2} \log n - \frac{n^2}{4} + \frac{1}{4} \right) + (n-1) = c\,n \log n - \frac{c\,n}{2} + \frac{c}{2n} + (n-1) \]

For \(c \geq 2\) and \(n \geq 2\) we have \(\frac{c}{2n} + (n-1) \leq \frac{c}{2} + (n-1) \leq \frac{c\,n}{2}\), so

\[ E[C(n)] \leq c\,n \log n \]

which completes the induction (the base case \(n = 1\) holds because \(C(1) = 0\)). Hence \(E[C(n)] = O(n \log n)\), and since the running time of Quicksort is proportional to the number of comparisons performed, the expected time complexity with random pivot selection is \(T(n) = O(n \log n)\).
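
As an informal sanity check (separate from the proof), one can count the comparisons made by a direct implementation of Steps A-C with a random pivot; a minimal sketch, whose output should be on the order of \(2 n \ln n\):

import math
import random

def count_comparisons(values):
    # Count comparisons made by quicksort (Steps A-C) with a random pivot
    count = 0

    def sort(m):
        nonlocal count
        if len(m) <= 1:
            return m
        x = m.pop(random.randrange(len(m)))  # Step A: random pivot
        count += len(m)                      # Step B: |m| comparisons against x
        m1 = [y for y in m if y <= x]
        m2 = [z for z in m if z > x]
        return sort(m1) + [x] + sort(m2)     # Step C

    sort(list(values))
    return count

n = 10000
print(count_comparisons(random.sample(range(10 * n), n)))  # typically a bit below 2 n ln n
print(round(2 * n * math.log(n)))                          # the 2 n ln n benchmark, ~184000 here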

(4)

This can be achieved using the Quick-select algorithm, which is similar to Quicksort but only recurses into one partition.

Algorithm:

  1. Choose a pivot element \(x\) randomly.
  2. Partition the array into \(\mathbf{M_1}\) and \(\mathbf{M_2}\) as described in Step B.
  3. If the size of \(\mathbf{M_1}\) is \(k-1\), then \(x\) is the \(k\)-th smallest element.
  4. If the size of \(\mathbf{M_1}\) is greater than \(k-1\), recurse into \(\mathbf{M_1}\).
  5. Otherwise, recurse into \(\mathbf{M_2}\) with the adjusted rank \(k - |\mathbf{M_1}| - 1\).

Python Code:

import random

def quickselect(arr, low, high, k):
    # Return the element whose final sorted position (0-based) is k, with low <= k <= high
    if low == high:
        return arr[low]

    # Step A: choose a random pivot and move it to the end so partition() can use it
    pivot_index = random.randint(low, high)
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]
    pivot_index = partition(arr, low, high)  # Step B: partition around the pivot

    if k == pivot_index:
        return arr[k]
    elif k < pivot_index:
        return quickselect(arr, low, pivot_index - 1, k)
    else:
        return quickselect(arr, pivot_index + 1, high, k)

def find_kth_smallest(arr, k):
    # k is 1-based, so convert it to a 0-based rank
    return quickselect(arr, 0, len(arr) - 1, k - 1)
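
For example (a quick check; this assumes the partition function from (1) is defined in the same module, and note that quickselect rearranges arr in place):

data = [7, 2, 9, 4, 1, 5, 8]
print(find_kth_smallest(data, 3))  # prints 4, the 3rd smallest element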

Proof of \(O(n)\) Expected Time Complexity:

Quickselect performs the same \(O(n)\) partitioning work per call as Quicksort, but recurses into only one of the two subarrays. With probability at least \(1/2\), the random pivot's rank lies in the middle half of the current array, in which case the surviving subarray has size at most \(3n/4\). Hence, after an expected constant number of partitioning rounds, each costing \(O(n)\), the problem size shrinks by a factor of at least \(3/4\), and the expected running time \(T(n)\) satisfies

\[ T(n) \leq T\!\left(\tfrac{3n}{4}\right) + O(n) \]

Unrolling this recurrence gives a geometric series:

\[ T(n) = O\!\left( n \sum_{i=0}^{\infty} \left(\tfrac{3}{4}\right)^i \right) = O(n) \]

Knowledge

Quicksort, time complexity, divide-and-conquer algorithms

Key Vocabulary

  • Quicksort 快速排序
  • Pivot 枢轴
  • Partition 分区
  • Expected time complexity 期望时间复杂度
  • Recurrence relation 递推关系

References

  1. Introduction to Algorithms, Cormen, Leiserson, Rivest, and Stein, 3rd Edition, Chap. 7 (Quicksort), Chap. 9 (Medians and Order Statistics)