The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, August 2017 Exam, Problem 8

Author

zephyr

Description

Let \(\mathbf{A}\) be an \(n \times m\) real matrix with positive rank \(r\). Such a matrix has a singular value decomposition \(\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T\), where \(\mathbf{U}\) and \(\mathbf{V}\) are \(n \times r\), \(m \times r\) real matrices, respectively, and satisfy \(\mathbf{U}^T \mathbf{U} = \mathbf{I}_r\), \(\mathbf{V}^T \mathbf{V} = \mathbf{I}_r\) (\(\mathbf{I}_d\): \(d \times d\) unit matrix, \(\mathbf{M}^T\): transpose of matrix \(\mathbf{M}\)). \(\mathbf{\Sigma}\) is an \(r \times r\) real diagonal matrix whose diagonal elements \(\Sigma_{kk} = \sigma_k\) (\(k = 1, \ldots, r\)) satisfy \(\sigma_1 \geq \cdots \geq \sigma_r > 0\).

(1) Describe all the positive eigenvalues and associated normalized eigenvectors of matrix \(\mathbf{A}^T \mathbf{A}\).

(2) Let \(T_{\mathbf{A}}: \mathbb{R}^m \to \mathbb{R}^n\) be a linear mapping defined by \(T_{\mathbf{A}} (\mathbf{x}) = \mathbf{A} \mathbf{x}\). Describe the conditions on \(n, m, r\) such that \(T_{\mathbf{A}}\) is surjective. Also, describe the conditions on \(n, m, r\) such that \(T_{\mathbf{A}}\) is injective.

(3) The pseudoinverse of \(\mathbf{A}\) is defined by \(\mathbf{A}^+ = \mathbf{V} \mathbf{\Sigma}^{-1} \mathbf{U}^T\). Let \(\mathbf{B} = (\mathbf{I}_m - \mathbf{A}^+ \mathbf{A})\) and define linear mapping \(T_{\mathbf{B}}: \mathbb{R}^m \to \mathbb{R}^m\) by \(T_{\mathbf{B}} (\mathbf{x}) = \mathbf{B} \mathbf{x}\). Show that image \(\mathrm{Im}(T_{\mathbf{B}}) = \{\mathbf{B} \mathbf{x} \mid \mathbf{x} \in \mathbb{R}^m\}\) is linearly isomorphic to kernel \(\mathrm{Ker}(T_{\mathbf{A}}) = \{\mathbf{x} \in \mathbb{R}^m \mid \mathbf{A} \mathbf{x} = \mathbf{0}_d\}\) (\(\mathbf{0}_d\): \(d\) dimensional zero vector).

(4) Show that \(\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2\) (\(\mathbf{x}_1 = \mathbf{B} \mathbf{x}\), \(\mathbf{x}_2 = (\mathbf{x} - \mathbf{x}_1)\)) is an orthogonal decomposition.

(5) For a given \(\mathbf{b} \in \mathbb{R}^n\), let \(\mathbf{x}_0 = \mathbf{A}^+ \mathbf{b} \in \mathbb{R}^m\). Show that \(\mathbf{x} = \mathbf{x}_0\) minimizes \((\mathbf{A} \mathbf{x} - \mathbf{b})^T (\mathbf{A} \mathbf{x} - \mathbf{b})\). (Hint: \(\mathbf{A} \mathbf{x} - \mathbf{b} = \mathbf{A} (\mathbf{x} - \mathbf{x}_0) + (\mathbf{A} \mathbf{x}_0 - \mathbf{b})\))



Kai

(1)

Given the singular value decomposition (SVD) of \(\mathbf{A}\) as \(\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T\), we can express \(\mathbf{A}^T \mathbf{A}\) as follows:

\[ \mathbf{A}^T \mathbf{A} = (\mathbf{U} \mathbf{\Sigma} \mathbf{V}^T)^T (\mathbf{U} \mathbf{\Sigma} \mathbf{V}^T) = \mathbf{V} \mathbf{\Sigma}^T \mathbf{U}^T \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T = \mathbf{V} \mathbf{\Sigma}^2 \mathbf{V}^T \]

The matrix \(\mathbf{\Sigma}^2\) is diagonal with diagonal elements \(\sigma_k^2\) (\(k = 1, \ldots, r\)). Writing \(\mathbf{v}_k\) for the \(k\)-th column of \(\mathbf{V}\), we have \(\mathbf{V}^T \mathbf{v}_k = \mathbf{e}_k\) (the \(k\)-th standard basis vector of \(\mathbb{R}^r\)), and hence \(\mathbf{A}^T \mathbf{A} \mathbf{v}_k = \mathbf{V} \mathbf{\Sigma}^2 \mathbf{V}^T \mathbf{v}_k = \sigma_k^2 \mathbf{v}_k\). Thus the positive eigenvalues of \(\mathbf{A}^T \mathbf{A}\) are exactly the \(\sigma_k^2\) (\(k = 1, \ldots, r\)), and the associated normalized eigenvectors are the columns of \(\mathbf{V}\) (normalized because \(\mathbf{V}^T \mathbf{V} = \mathbf{I}_r\)).
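
As a numerical sanity check (not part of the required argument), the following NumPy sketch builds a random rank-\(r\) matrix and confirms that the positive eigenvalues of \(\mathbf{A}^T \mathbf{A}\) match the squared singular values; the sizes, random seed, and the 1e-10 threshold are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 4, 3                                                # arbitrary sizes with rank r
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))    # random rank-r matrix

# Squared singular values of A (returned in descending order)
sigma = np.linalg.svd(A, compute_uv=False)
sigma_sq = sigma[:r] ** 2

# Positive eigenvalues of A^T A (eigvalsh returns ascending order)
eigvals = np.linalg.eigvalsh(A.T @ A)
positive = np.sort(eigvals[eigvals > 1e-10])[::-1]

print(np.allclose(positive, sigma_sq))   # True: positive eigenvalues are sigma_k^2
```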

(2)

Surjective (onto): The mapping \(T_{\mathbf{A}}: \mathbb{R}^m \to \mathbb{R}^n\) is surjective if and only if the range of \(\mathbf{A}\) is all of \(\mathbb{R}^n\), i.e., \(\mathbf{A}\) has full row rank. Since \(\mathrm{rank}(\mathbf{A}) = r \leq \min(n, m)\), this occurs exactly when \(r = n\) (which forces \(n \leq m\)).

Injective (one-to-one): The mapping \(T_{\mathbf{A}}\) is injective if and only if the kernel of \(\mathbf{A}\) contains only the zero vector, i.e., \(\mathbf{A}\) has full column rank. This occurs exactly when \(r = m\) (which forces \(m \leq n\)).
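
These rank conditions are easy to check numerically with `np.linalg.matrix_rank`; the sketch below uses two small hand-picked matrices (one with full row rank, one with full column rank) purely as an illustration.

```python
import numpy as np

A_wide = np.array([[1., 0., 2.],
                   [0., 1., 3.]])   # n=2, m=3, rank 2 = n: surjective, not injective
A_tall = A_wide.T                   # n=3, m=2, rank 2 = m: injective, not surjective

for A in (A_wide, A_tall):
    n, m = A.shape
    r = np.linalg.matrix_rank(A)
    print(f"shape {A.shape}: surjective={r == n}, injective={r == m}")
```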

(3)

The pseudoinverse \(\mathbf{A}^+\) is defined as \(\mathbf{A}^+ = \mathbf{V} \mathbf{\Sigma}^{-1} \mathbf{U}^T\). Consider \(\mathbf{B} = \mathbf{I}_m - \mathbf{A}^+ \mathbf{A}\).

We need to show that \(\mathrm{Im}(T_{\mathbf{B}})\) is isomorphic to \(\mathrm{Ker}(T_{\mathbf{A}})\). Observe the following:

\[ \mathbf{A} \mathbf{B} = \mathbf{A} (\mathbf{I}_m - \mathbf{A}^+ \mathbf{A}) = \mathbf{A} - \mathbf{A} \mathbf{A}^+ \mathbf{A} = \mathbf{A} - \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T \mathbf{V} \mathbf{\Sigma}^{-1} \mathbf{U}^T \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T = \mathbf{A} - \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T = \mathbf{0} \]

Hence \(\mathbf{A} (\mathbf{B} \mathbf{x}) = \mathbf{0}\) for every \(\mathbf{x} \in \mathbb{R}^m\), so \(\mathrm{Im}(\mathbf{B}) \subseteq \mathrm{Ker}(\mathbf{A})\).

Now, consider \(\mathbf{x} \in \mathrm{Ker}(\mathbf{A})\). Then \(\mathbf{A} \mathbf{x} = \mathbf{0}\), and

\[ \mathbf{B} \mathbf{x} = (\mathbf{I}_m - \mathbf{A}^+ \mathbf{A}) \mathbf{x} = \mathbf{x} - \mathbf{A}^+ (\mathbf{A} \mathbf{x}) = \mathbf{x} - \mathbf{A}^+ \mathbf{0}_n = \mathbf{x} \]

Thus, \(\mathbf{x} = \mathbf{B} \mathbf{x} \in \mathrm{Im}(\mathbf{B})\), so \(\mathrm{Ker}(\mathbf{A}) \subseteq \mathrm{Im}(\mathbf{B})\). Combining both inclusions, \(\mathrm{Im}(T_{\mathbf{B}}) = \mathrm{Ker}(T_{\mathbf{A}})\), and since the two subspaces coincide, the identity map between them is the required linear isomorphism.
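
Both inclusions can also be verified numerically: \(\mathbf{A} \mathbf{B} \approx \mathbf{0}\) shows \(\mathrm{Im}(\mathbf{B}) \subseteq \mathrm{Ker}(\mathbf{A})\), and \(\mathbf{B}\) fixing a kernel basis shows the reverse inclusion. The sketch below uses `np.linalg.pinv` for \(\mathbf{A}^+\) and takes a kernel basis from the full SVD; sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 5, 7, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))    # random rank-r matrix

B = np.eye(m) - np.linalg.pinv(A) @ A

# A B = 0, so every column of B lies in Ker(A): Im(B) ⊆ Ker(A)
print(np.allclose(A @ B, 0))

# Basis of Ker(A) from the full SVD: rows r..m-1 of Vt span the kernel
_, _, Vt = np.linalg.svd(A)
kernel_basis = Vt[r:].T                  # shape (m, m - r)

# B fixes kernel vectors: Ker(A) ⊆ Im(B)
print(np.allclose(B @ kernel_basis, kernel_basis))
```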

(4)

Given \(\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2\) where \(\mathbf{x}_1 = \mathbf{B} \mathbf{x}\) and \(\mathbf{x}_2 = \mathbf{x} - \mathbf{x}_1\):

\[ \mathbf{x}_2 = \mathbf{x} - \mathbf{B} \mathbf{x} = \mathbf{x} - (\mathbf{I}_m - \mathbf{A}^+ \mathbf{A}) \mathbf{x} = \mathbf{A}^+ \mathbf{A} \mathbf{x} \]

To show orthogonality:

\[ \mathbf{x}_1^T \mathbf{x}_2 = (\mathbf{B} \mathbf{x})^T (\mathbf{A}^+ \mathbf{A} \mathbf{x}) = \mathbf{x}^T \mathbf{B}^T \mathbf{A}^+ \mathbf{A} \mathbf{x} \]

Since \(\mathbf{A}^+ \mathbf{A} = \mathbf{V} \mathbf{\Sigma}^{-1} \mathbf{U}^T \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T = \mathbf{V} \mathbf{V}^T\), the matrix \(\mathbf{A}^+ \mathbf{A}\) is symmetric and idempotent (\((\mathbf{A}^+ \mathbf{A})^2 = \mathbf{V} \mathbf{V}^T \mathbf{V} \mathbf{V}^T = \mathbf{V} \mathbf{V}^T = \mathbf{A}^+ \mathbf{A}\)), and so is \(\mathbf{B} = \mathbf{I}_m - \mathbf{A}^+ \mathbf{A}\). Therefore:

\[ \mathbf{x}^T (\mathbf{I}_m - \mathbf{A}^+ \mathbf{A}) \mathbf{A}^+ \mathbf{A} \mathbf{x} = \mathbf{x}^T \left( \mathbf{A}^+ \mathbf{A} - (\mathbf{A}^+ \mathbf{A})^2 \right) \mathbf{x} = \mathbf{x}^T (\mathbf{A}^+ \mathbf{A} - \mathbf{A}^+ \mathbf{A}) \mathbf{x} = 0 \]

Thus, \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are orthogonal.
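
A quick numerical illustration of the decomposition (again with arbitrary sizes and seed): \(\mathbf{x}_1 = \mathbf{B} \mathbf{x}\) and \(\mathbf{x}_2 = \mathbf{A}^+ \mathbf{A} \mathbf{x}\) should sum to \(\mathbf{x}\) and have zero inner product.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 5, 7, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

B = np.eye(m) - np.linalg.pinv(A) @ A
x = rng.standard_normal(m)

x1 = B @ x                 # component in Ker(A)
x2 = x - x1                # equals A^+ A x, component in the row space of A

print(np.allclose(x, x1 + x2))      # decomposition holds
print(np.isclose(x1 @ x2, 0.0))     # orthogonality: x1^T x2 = 0
```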

(5)

Let \(\mathbf{x}_0 = \mathbf{A}^+ \mathbf{b}\). We need to show that \(\mathbf{x} = \mathbf{x}_0\) minimizes the expression.

Consider the error:

\[ \mathbf{A} \mathbf{x} - \mathbf{b} = \mathbf{A} (\mathbf{x} - \mathbf{x}_0) + (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) \]

The two terms on the right-hand side are orthogonal. Indeed, using the SVD,

\[ \mathbf{A}^T (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) = (\mathbf{A}^T \mathbf{A} \mathbf{A}^+ - \mathbf{A}^T) \mathbf{b} = (\mathbf{V} \mathbf{\Sigma} \mathbf{U}^T \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T \mathbf{V} \mathbf{\Sigma}^{-1} \mathbf{U}^T - \mathbf{V} \mathbf{\Sigma} \mathbf{U}^T) \mathbf{b} = (\mathbf{V} \mathbf{\Sigma} \mathbf{U}^T - \mathbf{V} \mathbf{\Sigma} \mathbf{U}^T) \mathbf{b} = \mathbf{0}_m \]

so the cross term vanishes:

\[ (\mathbf{A} (\mathbf{x} - \mathbf{x}_0))^T (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) = (\mathbf{x} - \mathbf{x}_0)^T \mathbf{A}^T (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) = 0 \]

Expanding the objective with the hint therefore gives

\[ (\mathbf{A} \mathbf{x} - \mathbf{b})^T (\mathbf{A} \mathbf{x} - \mathbf{b}) = (\mathbf{A} (\mathbf{x} - \mathbf{x}_0))^T (\mathbf{A} (\mathbf{x} - \mathbf{x}_0)) + (\mathbf{A} \mathbf{x}_0 - \mathbf{b})^T (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) \geq (\mathbf{A} \mathbf{x}_0 - \mathbf{b})^T (\mathbf{A} \mathbf{x}_0 - \mathbf{b}) \]

The first term is nonnegative and equals zero at \(\mathbf{x} = \mathbf{x}_0\), so \(\mathbf{x} = \mathbf{x}_0\) minimizes \((\mathbf{A} \mathbf{x} - \mathbf{b})^T (\mathbf{A} \mathbf{x} - \mathbf{b})\). (Note that \(\mathbf{A} \mathbf{x}_0 = \mathbf{A} \mathbf{A}^+ \mathbf{b} = \mathbf{U} \mathbf{U}^T \mathbf{b}\) equals \(\mathbf{b}\) only when \(\mathbf{b}\) lies in the column space of \(\mathbf{A}\), so the minimum residual is not necessarily zero.)
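
As a final sanity check, the sketch below compares the residual of \(\mathbf{x}_0 = \mathbf{A}^+ \mathbf{b}\) (via `np.linalg.pinv`) with the residuals of perturbed points and with NumPy's own least-squares solver; the matrix sizes, random seed, and number of perturbations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 8, 5, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-deficient A
b = rng.standard_normal(n)

x0 = np.linalg.pinv(A) @ b          # x0 = A^+ b

# Residual at x0 vs residuals at random perturbations: x0 attains the minimum
res0 = np.linalg.norm(A @ x0 - b)
others = [np.linalg.norm(A @ (x0 + rng.standard_normal(m)) - b) for _ in range(5)]
print(all(res0 <= res + 1e-12 for res in others))   # True

# np.linalg.lstsq returns a minimizer with the same (minimal) residual
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.isclose(np.linalg.norm(A @ x_ls - b), res0))
```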

Knowledge

Singular value decomposition, linear mapping, pseudoinverse (generalized inverse), orthogonal decomposition, linear algebra

Key Vocabulary

  • singular value decomposition (SVD) 奇异值分解
  • pseudoinverse 广义逆
  • surjective 满射
  • injective 单射
  • orthogonal decomposition 正交分解
