The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, August 2018, Problem 7
Description
To determine the order of \(n \ (\geq 3)\) elements (different natural numbers) \(a_1, \ldots, a_n\), we generate a binary tree starting from the root node by repeating the following step:
- At each node, if the order of all elements has been determined, let the node be a leaf and label it with the order of all elements. Otherwise, select a pair of elements \(a_i, a_j \ (i < j)\) whose order has not yet been determined, and compare them. If \(a_i < a_j\), go to the left child node; otherwise, go to the right child node.
We call such a binary tree a decision tree. A decision tree shows how a sorting algorithm compares elements.
Answer the following questions:
(1) When \(n\) is fixed, among all paths from the root to leaves in all decision trees, describe a shortest path and a longest path.
(2) Prove that in any decision tree, any order of all elements appears in one leaf but does not appear in different leaves.
(3) Let \(h\) denote the height of a decision tree. Prove \(c n \log n \leq h\) for a constant \(c\).
Kai
(1)
To determine the order of \(n\) elements we must compare pairs of elements, and each comparison fixes the relative order of exactly one pair (plus whatever follows by transitivity). The length of a root-to-leaf path equals the number of comparisons made along it, so a shortest path over all decision trees uses the smallest number of comparisons that can ever determine the full order, and a longest path uses the largest number of comparisons a single run can require.
Shortest Path
A shortest path is obtained when neighbouring elements are compared in order and every answer is "yes", i.e., when the elements are already in increasing order.
- Initial comparison: Compare \(a_1\) with \(a_2\). The answer \(a_1 < a_2\) sends us to the left child.
- Subsequent comparisons: Compare \(a_2\) with \(a_3\), then \(a_3\) with \(a_4\), and so on, up to \(a_{n-1}\) and \(a_n\); each answer \(a_i < a_{i+1}\) again sends us to the left child.
After these \(n-1\) comparisons the chain \(a_1 < a_2 < \cdots < a_n\) determines the relative order of every pair by transitivity, so the current node becomes a leaf.
For \(n\) elements \(a_1, a_2, \ldots, a_n\), the comparisons would look like:
- \(a_1 < a_2\)
- \(a_2 < a_3\)
- \(a_3 < a_4\)
- \(\ldots\)
- \(a_{n-1} < a_n\)
This path has length \(n-1\), and no root-to-leaf path can be shorter: each comparison relates only two elements, so after fewer than \(n-1\) comparisons the graph whose vertices are the elements and whose edges are the compared pairs is disconnected; some element is then unrelated to the rest, and at least two different orders remain consistent with the answers. Hence a shortest path has length exactly \(n-1\).
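The following Python sketch is an added illustration (not part of the exam answer); the function name `adjacent_comparison_path` is ours. It traces exactly this shortest path and confirms that already-ordered input is fully determined after \(n-1\) comparisons.

```python
def adjacent_comparison_path(a):
    """Follow the decision-tree path that compares adjacent pairs.

    If every comparison answers a[i] < a[i+1], the order of all elements
    is determined after exactly len(a) - 1 comparisons (a shortest path).
    """
    comparisons = 0
    for i in range(len(a) - 1):
        comparisons += 1
        if a[i] > a[i + 1]:
            # A "no" answer sends us into a deeper subtree; only the
            # shortest path (already-sorted input) is traced here.
            raise ValueError("input not sorted; the path continues further")
    return comparisons

print(adjacent_comparison_path([1, 2, 3, 4, 5]))  # 4 comparisons for n = 5
```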
Longest Path
The longest path corresponds to the worst case, in which no comparison result is ever implied by the earlier answers, so every one of the \(\binom{n}{2} = n(n-1)/2\) pairs must be compared directly. Note first that no path can be longer than this: a comparison is only allowed between a pair whose order is not yet determined, so each pair is compared at most once along any path.
A path of exactly this length exists. Take an input in reverse order, \(a_1 > a_2 > \cdots > a_n\), and follow the comparisons made by insertion sort:
- Initial comparison: Compare \(a_1\) with \(a_2\); since \(a_1 > a_2\), go to the right child.
- Subsequent comparisons: When \(a_k\) is inserted into the already ordered elements \(a_1, \ldots, a_{k-1}\), it is compared with each of them in turn (\(k-1\) comparisons). None of these outcomes is implied by transitivity, because knowing that \(a_k\) is smaller than some of the earlier elements says nothing about its relation to the ones it has not yet been compared with.
Summing over \(k = 2, \ldots, n\) gives \(1 + 2 + \cdots + (n-1) = n(n-1)/2\) comparisons along this path.
For \(n\) elements with \(a_1 > a_2 > \cdots > a_n\) (so the sorted order is \(a_n, a_{n-1}, \ldots, a_1\)), the comparisons along this path cover every pair:
- Compare \(a_1\) with \(a_2\)
- Compare \(a_1\) with \(a_3\)
- Compare \(a_1\) with \(a_4\)
- \(\ldots\)
- Compare \(a_1\) with \(a_n\)
- Compare \(a_2\) with \(a_3\)
- Compare \(a_2\) with \(a_4\)
- \(\ldots\)
- Compare \(a_2\) with \(a_n\)
- \(\ldots\)
- Compare \(a_{n-1}\) with \(a_n\)
This path contains exactly \(n(n-1)/2 = \Theta(n^2)\) comparisons, and by the argument above no path can be longer, so it is a longest path in a decision tree.
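As a concrete check (an added illustration, not part of the original answer), the sketch below counts the comparisons insertion sort makes; on reverse-ordered input it performs exactly \(n(n-1)/2\) comparisons, matching the longest path described above.

```python
def insertion_sort_comparisons(a):
    """Insertion sort, counting element comparisons.

    On reverse-ordered input every pair is compared exactly once, so the
    count reaches n(n-1)/2 -- the length of the longest root-to-leaf path.
    """
    a = list(a)
    comparisons = 0
    for k in range(1, len(a)):
        j = k
        # Compare the new element with its sorted predecessors, one by one.
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

n = 6
print(insertion_sort_comparisons(range(n, 0, -1)))  # 15 == n*(n-1)//2
```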
(2)
Existence
We must show that every possible order of the \(n\) elements (every one of the \(n!\) permutations) appears as the label of at least one leaf.
Fix an order, i.e., an assignment of distinct natural numbers to \(a_1, \ldots, a_n\) consistent with it. Starting from the root, answer every comparison of \(a_i\) and \(a_j\) according to this order; each answer determines a unique move to the left or right child, so the order traces out a single path. The path cannot continue forever: every comparison fixes the relative order of a pair that was previously undetermined, and there are only \(n(n-1)/2\) pairs, so after at most that many steps all relative orders are determined and a leaf is reached.
At that leaf the determined order is precisely the order we fixed, so that order is the leaf's label. Hence every order of all elements appears in at least one leaf, which establishes existence.
Uniqueness
Now, we need to prove that any given order of all \(n\) elements appears in exactly one leaf node and does not appear in multiple leaf nodes.
- Leaf labels are consistent with their paths: The label of a leaf is the order determined by the comparison outcomes along the path from the root to that leaf, so it agrees with every outcome on that path.
- Two distinct leaves disagree on some pair: Suppose, for contradiction, that the same order appeared in two different leaves \(u\) and \(v\). Let \(w\) be their lowest common ancestor; \(w\) is an internal node at which some pair \(a_i, a_j\) is compared, and (say) \(u\) lies in the subtree entered when \(a_i < a_j\) while \(v\) lies in the subtree entered when \(a_i > a_j\). By the previous point, the label of \(u\) places \(a_i\) before \(a_j\), whereas the label of \(v\) places \(a_j\) before \(a_i\), so the two labels are different orders, a contradiction.
- Conclusion: Together with the existence argument, any given order of all elements appears in exactly one leaf and never in two different leaves.
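The following small experiment is an added illustration; the helper `trace_path` and the choice of insertion sort's comparison strategy as one concrete decision tree are our own assumptions. For \(n = 3\) it checks both claims: every one of the \(3! = 6\) orders reaches a leaf, and no two orders reach the same leaf.

```python
from itertools import permutations

def trace_path(values):
    """Trace the decision-tree path taken for one input ordering.

    The tree is the one generated by insertion sort's comparison strategy;
    the returned tuple of left/right moves identifies the leaf reached.
    """
    a = list(values)
    path = []
    for k in range(1, len(a)):
        j = k
        while j > 0:
            if a[j - 1] < a[j]:       # a_i < a_j: go to the left child
                path.append("L")
                break
            else:                     # otherwise: go to the right child
                path.append("R")
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
    return tuple(path)

# Every ordering of 3 distinct values reaches a leaf, and no two orderings
# share a leaf (the map ordering -> leaf is injective).
leaves = {perm: trace_path(perm) for perm in permutations((1, 2, 3))}
assert len(set(leaves.values())) == len(leaves) == 6
print(leaves)
```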
(3)
To prove that \(c n \log n \leq h\) for a constant \(c\), we consider the following:
- Counting leaves: The height \(h\) of a decision tree is the length of its longest root-to-leaf path, i.e., the number of comparisons made in the worst case. There are \(n!\) possible orders of the \(n\) elements, and by (2) every one of them appears in exactly one leaf, so the tree has at least \(n!\) leaves. A binary tree of height \(h\) has at most \(2^h\) leaves, hence
\[
2^h \geq n! \quad\Longrightarrow\quad h \geq \log_2(n!).
\]
- Lower bound on \(\log_2(n!)\): Using Stirling's approximation \(n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\), we get
\[
\log_2(n!) = n \log_2 n - n \log_2 e + \tfrac{1}{2}\log_2(2\pi n) + o(1).
\]
Ignoring lower-order terms, \(\log_2(n!) = \Theta(n \log n)\). More elementarily, since \(n! \geq (n/2)^{n/2}\),
\[
\log_2(n!) \;\geq\; \frac{n}{2}\log_2\frac{n}{2} \;\geq\; \frac{n}{4}\log_2 n \qquad (n \geq 4).
\]
- Height relation: Combining the two bounds,
\[
h \;\geq\; \log_2(n!) \;\geq\; c\, n \log n,
\]
where \(c\) is a positive constant: with base-2 logarithms \(c = 1/4\) works for all \(n \geq 3\), and a different base only changes \(c\) by a constant factor.
Hence, we have shown that the height of the decision tree \(h\) satisfies the inequality \(c n \log n \leq h\).
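As a numerical sanity check (added here, not required by the proof), the snippet below compares the exact bound \(\log_2(n!)\) with \(n \log_2 n\); the ratio stays bounded away from zero and approaches 1, consistent with \(h \geq \log_2(n!) \geq c\, n \log n\).

```python
import math

# Compare the exact lower bound log2(n!) with n*log2(n). The ratio in the
# last column is bounded below by a positive constant and tends to 1,
# illustrating log2(n!) = Theta(n log n).
for n in (4, 16, 64, 256, 1024):
    log2_factorial = math.lgamma(n + 1) / math.log(2)   # log2(n!)
    print(n, round(log2_factorial, 1), round(n * math.log2(n), 1),
          round(log2_factorial / (n * math.log2(n)), 3))
```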
Knowledge
Decision tree, Sorting algorithm, Complexity analysis, Information theory
Key Vocabulary
- Decision Tree 决策树
- Comparison 比较
- Permutation 排列
- Information-Theoretic 信息论的
- Stirling's Approximation 斯特林近似