Skip to content

東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題4

Author

zephyr

Description

Let \(n\) and \(d\) (\(n < d\)) be natural numbers and \(\mathbb{R}\) be the set of real numbers. Denote by \(\top\) the transposition operator of a vector and a matrix. Define the inner product of two column vectors \(\mathbf{x_1}, \mathbf{x_2} \in \mathbb{R}^d\) as \(\mathbf{x_1}^\top \mathbf{x_2} \in \mathbb{R}\). Let \(\mathbf{w} = (w_1, w_2, \ldots, w_d)^\top \in \mathbb{R}^d\) be a \(d\)-dimensional column vector, \(\mathbf{X} \in \mathbb{R}^{n \times d}\) an \(n \times d\) matrix where \(\mathbf{X} \mathbf{X}^\top\) is invertible, and \(\mathbf{y} \in \mathbb{R}^n\) an \(n\)-dimensional column vector. Consider solving the following optimization problem by using the Lagrange multipliers method:

\[ \min_{\mathbf{w}} \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{subject to} \quad \mathbf{y} = \mathbf{Xw}, \]

where \(\|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2 + \ldots + w_d^2}\). The Lagrange function is given by

\[ L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}), \]

where \(\boldsymbol{\mu} \in \mathbb{R}^n\) is the Lagrange multipliers.

Let \(\lambda_1, \lambda_2, \ldots, \lambda_n\) be positive real values. The sets of column vectors \(\{\mathbf{u_i} \in \mathbb{R}^n\}_{i=1}^{n}\) and \(\{\mathbf{v_j} \in \mathbb{R}^d\}_{j=1}^{d}\) form an orthonormal basis of \(\mathbb{R}^n\) and \(\mathbb{R}^d\), respectively; that is, they are all unit vectors and orthogonal to each other. Suppose that the singular value decomposition of \(\mathbf{X}\) is

\[ \mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top, \]

where \(\mathbf{U}\) is an \(n \times n\) matrix, \(\mathbf{\Lambda}\) is an \(n \times d\) matrix, and \(\mathbf{V}\) is a \(d \times d\) matrix given by

\[ \mathbf{U} = (\mathbf{u_1}, \mathbf{u_2}, \ldots, \mathbf{u_n}), \quad \mathbf{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 & 0 & & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n & 0 & \cdots & 0\\ \end{pmatrix}, \quad \mathbf{V}^\top = \begin{pmatrix} \mathbf{v_1}^\top \\ \mathbf{v_2}^\top \\ \vdots \\ \mathbf{v_d}^\top \\ \end{pmatrix}. \]

Moreover, define

\[ \mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-)^{\top} \mathbf{U}^\top, \quad \text{where} \quad \mathbf{\Lambda}^- = \begin{pmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\ \end{pmatrix}. \]

Answer the following questions. Describe not only an answer but also the derivation process.

(1) Express \(\mathbf{XX^-X}\) using only \(\mathbf{X}\).

(2) Express \(\mathbf{XX^\top}\) using only \(\mathbf{U}\) and \(\lambda_i\) (\(i = 1, \ldots, n\)).

(3) Suppose we wish to express the stationary points of \(L(\mathbf{w}, \boldsymbol{\mu})\) in the form of \(\mathbf{w} = \mathbf{A} \mathbf{y}\) and \(\boldsymbol{\mu} = \mathbf{B} \mathbf{y}\). Express the matrices \(\mathbf{A} \in \mathbb{R}^{d \times n}\) and \(\mathbf{B} \in \mathbb{R}^{n \times n}\) using only \(\mathbf{X}\).

(4) Express \(\mathbf{A}\) in question (3) using only \(\mathbf{X^-}\).


\(n\)\(d\) (\(n < d\)) 为自然数,\(\mathbb{R}\) 为实数集。记 \(\top\) 为向量和矩阵的转置算子。定义两个列向量 \(\mathbf{x_1}, \mathbf{x_2} \in \mathbb{R}^d\) 的内积为 \(\mathbf{x_1}^\top \mathbf{x_2} \in \mathbb{R}\)。设 \(\mathbf{w} = (w_1, w_2, \ldots, w_d)^\top \in \mathbb{R}^d\) 为一个 \(d\) 维列向量,\(\mathbf{X} \in \mathbb{R}^{n \times d}\) 是一个 \(n \times d\) 矩阵,其中 \(\mathbf{X} \mathbf{X}^\top\) 可逆,\(\mathbf{y} \in \mathbb{R}^n\) 是一个 \(n\) 维列向量。考虑使用拉格朗日乘子法求解以下优化问题:

\[ \min_{\mathbf{w}} \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{subject to} \quad \mathbf{y} = \mathbf{Xw}, \]

其中 \(\|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2 + \ldots + w_d^2}\)。拉格朗日函数为

\[ L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}), \]

其中 \(\boldsymbol{\mu} \in \mathbb{R}^n\) 是拉格朗日乘子。

\(\lambda_1, \lambda_2, \ldots, \lambda_n\) 为正实数。列向量集合 \(\{\mathbf{u_i} \in \mathbb{R}^n\}_{i=1}^{n}\)\(\{\mathbf{v_j} \in \mathbb{R}^d\}_{j=1}^{d}\) 分别构成 \(\mathbb{R}^n\)\(\mathbb{R}^d\) 的正交基;即它们都是单位向量且彼此正交。假设 \(\mathbf{X}\) 的奇异值分解为

\[ \mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top, \]

其中 \(\mathbf{U}\) 是一个 \(n \times n\) 矩阵,\(\mathbf{\Lambda}\) 是一个 \(n \times d\) 矩阵,\(\mathbf{V}\) 是一个 \(d \times d\) 矩阵,给出如下

\[ \mathbf{U} = (\mathbf{u_1}, \mathbf{u_2}, \ldots, \mathbf{u_n}), \quad \mathbf{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 & 0 & & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n & 0 & \cdots & 0\\ \end{pmatrix}, \quad \mathbf{V}^\top = \begin{pmatrix} \mathbf{v_1}^\top \\ \mathbf{v_2}^\top \\ \vdots \\ \mathbf{v_d}^\top \\ \end{pmatrix}. \]

此外,定义

\[ \mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-)^{\top} \mathbf{U}^\top, \quad \text{where} \quad \mathbf{\Lambda}^- = \begin{pmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\ \end{pmatrix}. \]

回答以下问题。描述答案的同时也要给出推导过程。

(1) 使用仅 \(\mathbf{X}\) 表示 \(\mathbf{XX^-X}\)

(2) 使用仅 \(\mathbf{U}\)\(\lambda_i\) (\(i = 1, \ldots, n\)) 表示 \(\mathbf{XX^\top}\)

(3) 假设我们希望用 \(\mathbf{w} = \mathbf{A} \mathbf{y}\)\(\boldsymbol{\mu} = \mathbf{B} \mathbf{y}\) 表示 \(L(\mathbf{w}, \boldsymbol{\mu})\) 的驻点。使用仅 \(\mathbf{X}\) 表示矩阵 \(\mathbf{A} \in \mathbb{R}^{d \times n}\)\(\mathbf{B} \in \mathbb{R}^{n \times n}\)

(4) 使用仅 \(\mathbf{X^-}\) 表示问题 (3) 中的 \(\mathbf{A}\)

Kai

(1)

Given:

\[ \mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top, \]
\[ \mathbf{X^-} = \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top, \]

we can write:

\[ \mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-) \mathbf{U}^\top, \]

where

\[ \mathbf{\Lambda}^- = \begin{pmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\ \end{pmatrix}. \]

To find \(\mathbf{XX^-X}\):

\[ \mathbf{XX^-X} = (\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top)(\mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top)(\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top). \]

Notice:

\[ \mathbf{V}^\top \mathbf{V} = \mathbf{I}_d, \quad \mathbf{U}^\top \mathbf{U} = \mathbf{I}_n, \]

therefore:

\[ \mathbf{XX^-X} = \mathbf{U} \mathbf{\Lambda} (\mathbf{V}^\top \mathbf{V}) (\mathbf{\Lambda}^-) \mathbf{U}^\top \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} (\mathbf{\Lambda}^-) \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{I}_n \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{X}. \]

Thus,

\[ \boxed{\mathbf{XX^-X} = \mathbf{X}}. \]

(2)

Given:

\[ \mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top, \]

we have:

\[ \mathbf{XX^\top} = (\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top)(\mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top) = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda}^\top \mathbf{U}^\top. \]

Since \(\mathbf{\Lambda}\) is an \(n \times d\) matrix with singular values \(\lambda_1, \lambda_2, \ldots, \lambda_n\) on the diagonal, \(\mathbf{\Lambda} \mathbf{\Lambda}^\top\) is an \(n \times n\) diagonal matrix:

\[ \mathbf{\Lambda} \mathbf{\Lambda}^\top = \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2). \]

Therefore,

\[ \mathbf{XX^\top} = \mathbf{U} \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & \lambda_n^2 \\ \end{pmatrix} \mathbf{U}^\top. \]

Thus,

\[ \boxed{\mathbf{XX^\top} = \mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top}. \]

(3)

To solve the optimization problem using Lagrange multipliers:

\[ L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}), \]

we need to find \(\mathbf{w}\) and \(\boldsymbol{\mu}\) such that:

\[ \frac{\partial L}{\partial \mathbf{w}} = 0 \quad \text{and} \quad \frac{\partial L}{\partial \boldsymbol{\mu}} = 0. \]

First, compute \(\frac{\partial L}{\partial \mathbf{w}}\):

\[ \frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \mathbf{X}^\top \boldsymbol{\mu} = 0 \implies \mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu}. \]

Next, compute \(\frac{\partial L}{\partial \boldsymbol{\mu}}\):

\[ \frac{\partial L}{\partial \boldsymbol{\mu}} = \mathbf{y} - \mathbf{Xw} = 0 \implies \mathbf{y} = \mathbf{Xw}. \]

Substituting \(\mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu}\) into \(\mathbf{y} = \mathbf{Xw}\):

\[ \mathbf{y} = \mathbf{X} (\mathbf{X}^\top \boldsymbol{\mu}) \implies \mathbf{y} = (\mathbf{X} \mathbf{X}^\top) \boldsymbol{\mu}. \]

Since \(\mathbf{X} \mathbf{X}^\top\) is invertible,

\[ \boldsymbol{\mu} = (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}. \]

Then,

\[ \mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}. \]

Therefore,

\[ \mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1}, \quad \mathbf{B} = (\mathbf{X} \mathbf{X}^\top)^{-1}. \]

Thus,

\[ \boxed{\mathbf{w} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}, \quad \boldsymbol{\mu} = (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}}. \]

(4)

From question 3, we have:

\[ \mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1}. \]

Using \(\mathbf{X^-}\):

\[ \mathbf{X^-} = \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top, \]

we know:

\[ \mathbf{X} \mathbf{X}^- = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda}^-\mathbf{U}^\top = \mathbf{U} \mathbf{I}_n \mathbf{U}^\top = \mathbf{I}_n, \]

Therefore:

\[ \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} = \mathbf{X^-}. \]

Another way to see this is to use the given SVD of \(\mathbf{X}\) and \(\mathbf{X}\mathbf{X}^\top\) from Question 2:

\[ \mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top, \]
\[ \mathbf{X}\mathbf{X}^\top = \mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top \]

Then,

\[ \mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} \]
\[ = \mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top (\mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top)^{-1} \]
\[ = \mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top \mathbf{U} \mathrm{diag}(\lambda_1^{-2}, \lambda_2^{-2}, \ldots, \lambda_n^{-2}) \mathbf{U}^\top \]
\[ = \mathbf{V} \mathbf{\Lambda}^\top \mathrm{diag}(\lambda_1^{-2}, \lambda_2^{-2}, \ldots, \lambda_n^{-2}) \mathbf{U}^\top \]
\[ = \mathbf{V} \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & \lambda_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \lambda_1^{-2} & 0 & \cdots & 0 \\ 0 & \lambda_2^{-2} & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & \lambda_n^{-2} \\ \end{pmatrix} \mathbf{U}^\top \]
\[ = \mathbf{V} \begin{pmatrix} \lambda_1^{-1} & 0 & \cdots & 0 \\ 0 & \lambda_2^{-1} & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & \lambda_n^{-1} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \mathbf{U}^\top = \mathbf{X^-}. \]

Thus,

\[ \boxed{\mathbf{A} = \mathbf{X^-}}. \]

Knowledge

最优化 奇异值分解 线性代数 拉格朗日乘数法

难点思路

这道题目涉及多个知识点的综合运用,特别是拉格朗日乘数法和奇异值分解的结合使用。重点在于理解矩阵运算和变换的基本性质。

解题技巧和信息

  • 拉格朗日乘数法:在求解带有约束的最优化问题时非常有用。
  • 奇异值分解:帮助简化矩阵运算,特别是对于逆矩阵和伪逆矩阵的计算。

重点词汇

  • inner product 内积
  • transpose 转置
  • Lagrange multipliers 拉格朗日乘数
  • singular value decomposition 奇异值分解
  • orthonormal basis 正交归一基

参考资料

  1. 《线性代数及其应用》David C. Lay,第四章:奇异值分解
  2. 《最优化理论》Edwin K. P. Chong and Stanislaw H. Zak,第三章:拉格朗日乘数法