東京大学新領域創成科学研究科メディカル情報生命専攻 2018年8月実施問題9

Author

Description

We wish to sort an array of integers, ( is a natural number) in ascending order. Assume that loading/storing an integer and comparing two integers take unit time.

(1) Let . Show an algorithm that sorts the array, .

(2) Given two sorted arrays, and , show an algorithm that merges the two arrays and calculates the sorted array in time.

(3) Let be the running time for sorting an array, . We sort the first half of the array in time, and sort the second half similarly. Then we obtain the full sorted array by merging the first and second half of the arrays using the algorithm we used in (2). Derive the recurrence for in terms of and , and then derive an explicit formula for .

(4) Notice that in (2), the first half of the sorted array, , contains the first elements of and the first elements of . Given two sorted arrays, and , show an algorithm that finds in time. For simplicity, you may assume that are distinct numbers.

(5) Assume that we have CPU cores, and assume that we can ignore the synchronization cost between CPU cores. Show the running time complexity of a parallel merge sort algorithm that uses the technique in (4).

我们希望按升序排序一个整数数组 ( 是一个自然数)。假设加载/存储一个整数和比较两个整数所需的时间是单位时间。

(1) 令。展示一个排序数组的算法。

(2) 给定两个已排序的数组和，展示一个算法，将这两个数组合并并计算排序后的数组，时间复杂度为。

(3) 设为排序数组的运行时间。我们在时间内对数组的前半部分进行排序，并以类似方式对后半部分进行排序。然后我们通过使用 (2) 中的算法合并数组的前半部分和后半部分来获得完全排序的数组。推导的递推关系，并得出的显式公式。

(4) 注意在 (2) 中，排序后的数组的前半部分包含了的前个元素和的前个元素。给定两个已排序的数组和，展示一个在时间内找到的算法。为了简单起见，你可以假设是不同的数字。

(5) 假设我们有个 CPU 核，并假设可以忽略 CPU 核之间的同步成本。展示使用 (4) 中技术的并行归并排序算法的运行时间复杂度。

Kai

(1)

For , the array has only two elements, and . We can sort this array with a simple comparison and swap if needed.

Algorithm:

Compare and .
If , swap them.

Pseudocode:

if a_1 > a_2 then
    swap(a_1, a_2)

(2)

Given two sorted arrays and , we merge them into a single sorted array .

Algorithm:

Initialize pointers , , and to .
While both arrays have elements to be compared:
- Compare and .
- Append the smaller element to and increment the corresponding pointer.
- Increment .
If one array is exhausted, append the remaining elements of the other array to .

Pseudocode:

i, j, k = 1, 1, 1
while i <= p and j <= q do
    if x_i < y_j then
        z_k = x_i
        i = i + 1
    else
        z_k = y_j
        j = j + 1
    k = k + 1

while i <= p do
    z_k = x_i
    i = i + 1
    k = k + 1

while j <= q do
    z_k = y_j
    j = j + 1
    k = k + 1

The time complexity of this algorithm is .

(3)

We sort the first and second halves of the array separately and then merge them. The recurrence relation is:

The term comes from the merging step.

To solve this recurrence, we can use the Master Theorem for divide-and-conquer recurrences of the form . Here, , , and .

According to the Master Theorem:

If , then .
.

Thus, matches . Therefore,

(4)

We need to find the position such that the first elements of the merged array come from the first elements of and the first elements of .

Algorithm:

Perform a binary search on to find .
Initialize low and high pointers: and .
While :
- Set .
- Set .
- Check the conditions to adjust the pointers:
  - If and , then is found.
  - If , adjust .
  - Otherwise, adjust .

Pseudocode:

low, high = 1, p+1
while low <= high do
    t = (low + high) / 2
    s = ceil((p + q) / 2) - t
    if x_t <= y_s+1 and y_s <= x_t+1 then
        return t
    else if x_t > y_s+1 then
        high = t - 1
    else
        low = t + 1

The time complexity of this algorithm is due to the binary search.

(5)

Assuming CPU cores and ignoring synchronization costs, we can sort each half of the array in parallel.

For each level of recursion, the work is divided equally among available cores. Therefore, the time complexity at each level is determined by the depth of the recursion tree.

The depth of the recursion tree is , and each merge operation at each level takes divided by the number of available cores.

Overall Time Complexity:

Thus, the parallel merge sort algorithm with CPU cores has a time complexity of .

Knowledge

归并排序二分查找并行计算排序算法

难点思路

在第四部分，找到合适的使得合并的前半部分数组满足特定条件是一个难点。利用二分查找可以有效减少时间复杂度。

解题技巧和信息

归并排序是一种分治算法，其时间复杂度为。
合并两个已排序数组的时间复杂度为。
利用二分查找可以在时间内找到特定位置。
并行计算可以显著加速大规模数据的排序。

重点词汇

merge 合并
binary search 二分查找
parallel computation 并行计算

参考资料

Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Chapter 2: Getting Started.
The Art of Computer Programming, Donald E. Knuth, Volume 3: Sorting and Searching, Section 5.2.4: Merge Sort.

Author​

Description​

Kai​

(1)​

(2)​

(3)​

(4)​

(5)​

Knowledge​

难点思路​

解题技巧和信息​

重点词汇​

参考资料​

Author

Description

Kai

(1)

(2)

(3)

(4)

(5)

Knowledge

难点思路

解题技巧和信息

重点词汇

参考资料