東京大学 情報理工学系研究科 コンピュータ科学専攻 2018年2月実施 問題3
Author
Description
Suppose that we have a set of
- FIND(x) identifies the subset that element
belongs to. - MERGE(A, B) merges two subsets,
and .
We use a forest-of-trees structure, where each subset forms a tree. Each tree node corresponds to an element and has a pointer to its parent. The pointer of a root node points to the identity of the subset it belongs to. FIND(x) operation traces pointers from node
We initially have
Answer the following questions:
(1) How many merge operations are required to merge all the subsets?
(2) What is the minimum (best case) tree height after the completion of all the merge operations among all the possible merge sequences? Also explain why.
(3) What is the maximum (worst case) tree height after the completion of all the merge operations among all the possible merge sequences? Also explain why.
(4) One can reduce the maximum (worst case) tree height by slightly modifying the MERGE(A, B) operation. Explain how to modify the operation. Also, give the maximum tree height when using the modified operation, with a brief explanation.
(5) One can reduce the height of a tree without increasing computational complexity by performing an additional procedure when applying the FIND(x) operation to an element
Kai
emph{See chapter
We have set of
(1)
(2)
Minimum tree height:
(3)
Maximum tree height:
for i in range(1, 2**N):
merge(v[i-1], v[i])
(4)
New MERGE(A, B): change pointer of the root node of subset with smaller height so that it points to root node of the other subset.
This technique is called merge (union) by rank (rank - upper bound on the height of the node).
How to get a tree with maximum height?
The observation is, merging trees having the same height will result in a taller tree.
If
Suppose we had a maximum-height tree with
(5)
During execution of FIND, we make every node on the find-path point directly to the root. This technique is called path compression*.
In details, first we find a path from a node its root. Then, we go through the path again and change pointers of nodes on the path. Original FIND was linear in length of the path. Now we scan the path twice, which still yields linear time.