跳到主要内容

東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題9

Author

zephyr

Description

We wish to sort an array of integers, ( is a natural number) in ascending order. Assume that loading/storing an integer and comparing two integers take unit time.

(1) Let . Show an algorithm that sorts the array, .

(2) Given two sorted arrays, and , show an algorithm that merges the two arrays and calculates the sorted array in time.

(3) Let be the running time for sorting an array, . We sort the first half of the array in time, and sort the second half similarly. Then we obtain the full sorted array by merging the first and second half of the arrays using the algorithm we used in (2). Derive the recurrence for in terms of and , and then derive an explicit formula for .

(4) Notice that in (2), the first half of the sorted array, , contains the first elements of and the first elements of . Given two sorted arrays, and , show an algorithm that finds in time. For simplicity, you may assume that are distinct numbers.

(5) Assume that we have CPU cores, and assume that we can ignore the synchronization cost between CPU cores. Show the running time complexity of a parallel merge sort algorithm that uses the technique in (4).


我们希望按升序排序一个整数数组 ( 是一个自然数)。假设加载/存储一个整数和比较两个整数所需的时间是单位时间。

(1) 令 。展示一个排序数组 的算法。

(2) 给定两个已排序的数组 ,展示一个算法,将这两个数组合并并计算排序后的数组 ,时间复杂度为

(3) 设 为排序数组 的运行时间。我们在 时间内对数组的前半部分进行排序,并以类似方式对后半部分进行排序。然后我们通过使用 (2) 中的算法合并数组的前半部分和后半部分来获得完全排序的数组。推导 的递推关系,并得出 的显式公式。

(4) 注意在 (2) 中,排序后的数组的前半部分 包含了 的前 个元素和 的前 个元素。给定两个已排序的数组 ,展示一个在 时间内找到 的算法。为了简单起见,你可以假设 是不同的数字。

(5) 假设我们有 个 CPU 核,并假设可以忽略 CPU 核之间的同步成本。展示使用 (4) 中技术的并行归并排序算法的运行时间复杂度。

Kai

(1)

For , the array has only two elements, and . We can sort this array with a simple comparison and swap if needed.

Algorithm:

  1. Compare and .
  2. If , swap them.

Pseudocode:

if a_1 > a_2 then
swap(a_1, a_2)

(2)

Given two sorted arrays and , we merge them into a single sorted array .

Algorithm:

  1. Initialize pointers , , and to .
  2. While both arrays have elements to be compared:
    • Compare and .
    • Append the smaller element to and increment the corresponding pointer.
    • Increment .
  3. If one array is exhausted, append the remaining elements of the other array to .

Pseudocode:

i, j, k = 1, 1, 1
while i <= p and j <= q do
if x_i < y_j then
z_k = x_i
i = i + 1
else
z_k = y_j
j = j + 1
k = k + 1

while i <= p do
z_k = x_i
i = i + 1
k = k + 1

while j <= q do
z_k = y_j
j = j + 1
k = k + 1

The time complexity of this algorithm is .

(3)

We sort the first and second halves of the array separately and then merge them. The recurrence relation is:

The term comes from the merging step.

To solve this recurrence, we can use the Master Theorem for divide-and-conquer recurrences of the form . Here, , , and .

According to the Master Theorem:

  • If , then .
  • .

Thus, matches . Therefore,

(4)

We need to find the position such that the first elements of the merged array come from the first elements of and the first elements of .

Algorithm:

  1. Perform a binary search on to find .
  2. Initialize low and high pointers: and .
  3. While :
    • Set .
    • Set .
    • Check the conditions to adjust the pointers:
      • If and , then is found.
      • If , adjust .
      • Otherwise, adjust .

Pseudocode:

low, high = 1, p+1
while low <= high do
t = (low + high) / 2
s = ceil((p + q) / 2) - t
if x_t <= y_s+1 and y_s <= x_t+1 then
return t
else if x_t > y_s+1 then
high = t - 1
else
low = t + 1

The time complexity of this algorithm is due to the binary search.

(5)

Assuming CPU cores and ignoring synchronization costs, we can sort each half of the array in parallel.

For each level of recursion, the work is divided equally among available cores. Therefore, the time complexity at each level is determined by the depth of the recursion tree.

The depth of the recursion tree is , and each merge operation at each level takes divided by the number of available cores.

Overall Time Complexity:

Thus, the parallel merge sort algorithm with CPU cores has a time complexity of .

Knowledge

归并排序 二分查找 并行计算 排序算法

难点思路

在第四部分,找到合适的 使得合并的前半部分数组满足特定条件是一个难点。利用二分查找可以有效减少时间复杂度。

解题技巧和信息

  • 归并排序是一种分治算法,其时间复杂度为
  • 合并两个已排序数组的时间复杂度为
  • 利用二分查找可以在 时间内找到特定位置。
  • 并行计算可以显著加速大规模数据的排序。

重点词汇

  • merge 合并
  • binary search 二分查找
  • parallel computation 并行计算

参考资料

  1. Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Chapter 2: Getting Started.
  2. The Art of Computer Programming, Donald E. Knuth, Volume 3: Sorting and Searching, Section 5.2.4: Merge Sort.