東京大学情報理工学系研究科コンピュータ科学専攻 2021年2月実施問題1

Author

Description

Ordered binary trees are trees in which each node has at most two ordered children. Below, \(\mathbf{T_n}\) denotes the set of all the ordered binary trees with \(n\) leaves. Let \(\mathrm{d_T}(v)\) denote the depth of node \(v\) in an ordered binary tree \(T\), i.e., the number of edges on the path from the root to \(v\).

For a sequence \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\) of \(n\) positive real numbers, we define \(\mathbf{S_P}\) and \(\mathbf{H_P}\) by:

\[ \mathbf{S_P} = \sum_{i=1}^{n} c_i, \quad \mathbf{H_P} = - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i/\mathbf{S_P})). \]

For an ordered binary tree \(T \in \mathbf{T_n}\), we define \(\mathbf{W_P}(T)\) by:

\[ \mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)), \]

where \(v_i\) is the \(i\)-th leaf from left in \(T\).

Answer the following questions.

(1) Give the tree \(T \in \mathbf{T_4}\) that has the smallest value of \(\mathbf{W_P}(T)\) in case \(\mathbf{P} = (4, 2, 1, 1)\).

(2) Show that \(\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1\) holds for any ordered binary tree \(T \in \mathbf{T_n}\) with leaves \(v_1, v_2, \ldots, v_n\).

(3) Assume that \(x_1, x_2, \ldots, x_n\) range over the set of positive real numbers so that \(\sum_{i=1}^{n} x_i = 1\). Show that \(\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)\) is maximized when \(x_i = c_i/\mathbf{S_P}\) for any sequence \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\) of \(n\) positive real numbers.

(4) Show that any ordered binary tree \(T \in \mathbf{T_n}\) satisfies \(\mathbf{W_P}(T) \geq \mathbf{H_P}\) for any sequence \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\) of \(n\) positive real numbers.

有序二叉树是指每个节点最多有两个有序子节点的树。下面，\(\mathbf{T_n}\) 表示具有 \(n\) 叶节点的所有有序二叉树的集合。设 \(\mathrm{d_T}(v)\) 表示有序二叉树 \(T\) 中节点 \(v\) 的深度，即从根到 \(v\) 的路径上的边数。

对于一个由 \(n\) 个正实数组成的序列 \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\)，我们定义 \(\mathbf{S_P}\) 和 \(\mathbf{H_P}\) 如下：

\[ \mathbf{S_P} = \sum_{i=1}^{n} c_i, \quad \mathbf{H_P} = - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i/\mathbf{S_P})). \]

对于一个有序二叉树 \(T \in \mathbf{T_n}\)，我们定义 \(\mathbf{W_P}(T)\) 如下：

\[ \mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)), \]

其中 \(v_i\) 是 \(T\) 中从左数的第 \(i\) 个叶节点。

回答以下问题。

(1) 给出在 \(\mathbf{P} = (4, 2, 1, 1)\) 情况下 \(\mathbf{T_4}\) 中 \(\mathbf{W_P}(T)\) 最小的树 \(T\)。

(2) 证明对于任何有叶节点 \(v_1, v_2, \ldots, v_n\) 的有序二叉树 \(T \in \mathbf{T_n}\)，\(\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1\) 成立。

(3) 假设 \(x_1, x_2, \ldots, x_n\) 在正实数范围内变化，并且满足 \(\sum_{i=1}^{n} x_i = 1\)。证明对于任何由 \(n\) 个正实数组成的序列 \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\)，当 \(x_i = c_i/\mathbf{S_P}\) 时，\(\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)\) 取得最大值。

(4) 证明对于任何有序二叉树 \(T \in \mathbf{T_n}\)，对于任何由 \(n\) 个正实数组成的序列 \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\)，\(\mathbf{W_P}(T) \geq \mathbf{H_P}\)。

Kai

(1)

To minimize \(\mathbf{W_P}(T)\), we should construct a tree that resembles a Huffman tree, where the most frequent items (with the highest weights) are located at shallower depths. Here, the weights are \((4, 2, 1, 1)\).

Steps to construct the tree: 1. Start by pairing the two smallest weights, which are both \(1\). 2. Combine these to form a subtree with a combined weight of \(2\). 3. Now, we have weights \((4, 2, 2)\). 4. Next, combine the two smallest remaining weights, which are both \(2\). 5. Combine these to form a subtree with a combined weight of \(4\). 6. Finally, combine the two subtrees \((4, 4)\) to form the complete tree.

The resulting tree structure is:

Thus, the depth of each leaf in the tree is:

\(c_1 = 4\), depth \(\mathrm{d_T}(v_1) = 1\)
\(c_2 = 2\), depth \(\mathrm{d_T}(v_2) = 2\)
\(c_3 = 1\), depth \(\mathrm{d_T}(v_3) = 3\)
\(c_4 = 1\), depth \(\mathrm{d_T}(v_4) = 3\)

Now, we calculate \(\mathbf{W_P}(T)\):

\[ \mathbf{W_P}(T) = 4 \cdot 1 + 2 \cdot 2 + 1 \cdot 3 + 1 \cdot 3 = 4 + 4 + 3 + 3 = 14 \]

Thus, the tree \(T\) that minimizes \(\mathbf{W_P}(T)\) has the above structure.

(2)

To prove the inequality \(\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1\) using mathematical induction, we need to follow these steps:

Base Case
Inductive Step

Base Case: For \(n = 1\) (a tree with only one leaf), the depth of the only leaf \(v_1\) is 0.

\[ \sum_{i=1}^{1} 2^{-\mathrm{d_T}(v_i)} = 2^{-\mathrm{d_T}(v_1)} = 2^0 = 1 \]

Thus, the base case holds.

Inductive Step: Assume that for any ordered binary tree with \(k\) leaves, the inequality holds:

\[ \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 \]

Now, we need to prove that the inequality holds for an ordered binary tree with \(k+1\) leaves.

Consider an ordered binary tree with \(k+1\) leaves.
Let's denote the depth of the leaves in this tree by \(\mathrm{d_T}(v_1), \mathrm{d_T}(v_2), \ldots, \mathrm{d_T}(v_{k+1})\).

When we add an additional leaf to a tree with \(k\) leaves to form a tree with \(k+1\) leaves, we must split one of the existing leaves into two children. This operation increases the depth of the affected leaf by 1 and adds a new leaf with the same depth.

Suppose we split the leaf \(v_j\) (where \(\mathrm{d_T}(v_j) = d\)) into two new leaves \(v_j'\) and \(v_{k+1}\), both at depth \(d+1\).

Thus, we need to show:

\[ \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} + 2^{-(d+1)} + 2^{-(d+1)} \leq 1 \]

By the inductive hypothesis, for the original tree with \(k\) leaves:

\[ \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 \]

In the new tree:

\[ \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} - 2^{-d} + 2^{-(d+1)} + 2^{-(d+1)} \]

Since \(2^{-(d+1)} + 2^{-(d+1)} = 2^{-d}\):

\[ \sum_{i=1}^{k+1} 2^{-\mathrm{d_T}(v_i)} = \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 \]

Thus, the inequality holds after adding a new leaf and increasing the depth of the original leaf.

By induction, the inequality \(\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1\) holds for all \(n\).

(3)

To maximize \(\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)\) under the constraint \(\sum_{i=1}^{n} x_i = 1\), we use the method of Lagrange multipliers.

Define the Lagrangian:

\[ \mathcal{L}(x_1, \ldots, x_n, \lambda) = \sum_{i=1}^{n} (c_i \cdot \log_2 x_i) + \lambda \left( 1 - \sum_{i=1}^{n} x_i \right) \]

Take the partial derivatives and set them to zero:

\[ \frac{\partial \mathcal{L}}{\partial x_i} = \frac{c_i}{x_i \ln 2} - \lambda = 0 \quad \Rightarrow \quad x_i = \frac{c_i}{\lambda \ln 2} \]

Using the constraint \(\sum_{i=1}^{n} x_i = 1\):

\[ \sum_{i=1}^{n} \frac{c_i}{\lambda \ln 2} = 1 \quad \Rightarrow \quad \lambda \ln 2 = \mathbf{S_P} \quad \Rightarrow \quad \lambda = \frac{\mathbf{S_P}}{\ln 2} \]

Thus, the maximizing \(x_i\) is:

\[ x_i = \frac{c_i}{\mathbf{S_P}} \]

(4)

To show that \(\mathbf{W_P}(T) \geq \mathbf{H_P}\), we need to use the definitions of \(\mathbf{W_P}(T)\) and \(\mathbf{H_P}\) and employ some fundamental principles of information theory and entropy.

Recall:

\[ \mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)) \]

\[ \mathbf{H_P} = - \sum_{i=1}^{n} \left( c_i \cdot \log_2 \left(\frac{c_i}{\mathbf{S_P}}\right) \right) \]

We start by rewriting \(\mathbf{H_P}\) in a more convenient form:

\[ \mathbf{H_P} = \sum_{i=1}^{n} c_i \cdot \left( \log_2(\mathbf{S_P}) - \log_2(c_i) \right) = \log_2(\mathbf{S_P}) \cdot \sum_{i=1}^{n} c_i - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i)) \]

Since \(\sum_{i=1}^{n} c_i = \mathbf{S_P}\), we get:

\[ \mathbf{H_P} = \mathbf{S_P} \cdot \log_2(\mathbf{S_P}) - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i)) \]

Next, consider the following inequality derived from Jensen's inequality for the concave function \(f(x) = -x \log_2(x)\):

\[ -\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \frac{c_i}{\mathbf{S_P}} \right) \]

This simplifies to:

\[ -\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \sum_{i=1}^{n} \frac{c_i^2}{\mathbf{S_P}^2} \right) \]

Since \(\sum_{i=1}^{n} c_i = \mathbf{S_P}\), we get:

\[ -\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \frac{1}{\mathbf{S_P}^2} \sum_{i=1}^{n} c_i^2 \right) \]

So:

\[ -\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \frac{1}{\mathbf{S_P}^2} \cdot \mathbf{S_P} \cdot \frac{1}{n} \sum_{i=1}^{n} c_i \right) = \log_2(\mathbf{S_P}) \]

The weighted path length \(\mathbf{W_P}(T)\) can be understood using Kraft's inequality, which relates the depths of leaves in a binary tree to probabilities that sum up to 1. Let \(p_i = \frac{c_i}{\mathbf{S_P}}\) be the probability associated with the \(i\)-th leaf. According to Kraft's inequality:

\[ \sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1 \]

We multiply both sides by \(\mathbf{S_P}\):

\[ \mathbf{S_P} \sum_{i=1}^{n} p_i \cdot 2^{-\mathrm{d_T}(v_i)} \leq \mathbf{S_P} \]

Using the fact that \(p_i = \frac{c_i}{\mathbf{S_P}}\):

\[ \sum_{i=1}^{n} c_i \cdot 2^{-\mathrm{d_T}(v_i)} \leq \mathbf{S_P} \]

Now, we apply the definition of entropy:

\[ \mathbf{H_P} = -\mathbf{S_P} \sum_{i=1}^{n} p_i \log_2(p_i) \]

Using Gibbs' inequality, we know that:

\[ -\sum_{i=1}^{n} p_i \log_2(p_i) \leq \sum_{i=1}^{n} p_i \mathrm{d_T}(v_i) \]

Multiplying both sides by \(\mathbf{S_P}\):

\[ \mathbf{S_P} \cdot \mathbf{H_P} \leq \mathbf{S_P} \sum_{i=1}^{n} p_i \cdot \mathrm{d_T}(v_i) \]

Substituting \(p_i = \frac{c_i}{\mathbf{S_P}}\) into the inequality:

\[ \mathbf{H_P} \leq \sum_{i=1}^{n} c_i \cdot \mathrm{d_T}(v_i) \]

Thus, we have shown that \(\mathbf{W_P}(T) \geq \mathbf{H_P}\) for any ordered binary tree \(T \in \mathbf{T_n}\) and any sequence \(\mathbf{P} = (c_1, c_2, \ldots, c_n)\) of \(n\) positive real numbers.

Knowledge

有序二叉树哈夫曼树信息论拉格朗日乘数法数学归纳法

重点词汇

Ordered binary tree 有序二叉树
Huffman tree 哈夫曼树
Depth 深度
Lagrange multipliers 拉格朗日乘子
Entropy 熵
Lagrange multiplier 拉格朗日乘数
Jensen's inequality 詹森不等式
Gibbs' inequality 吉布斯不等式
Kraft's inequality 克拉夫特不等式

参考资料

Introduction to Algorithms, Chapter 16: Greedy Algorithms
Information Theory, Inference, and Learning Algorithms, David J.C. MacKay

東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題1