Linear-Algebra
Tokyo-University
東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題4
Author
zephyr
Description
Let \(n\) and \(d\) (\(n < d\) ) be natural numbers and \(\mathbb{R}\) be the set of real numbers. Denote by \(\top\) the transposition operator of a vector and a matrix. Define the inner product of two column vectors \(\mathbf{x_1}, \mathbf{x_2} \in \mathbb{R}^d\) as \(\mathbf{x_1}^\top \mathbf{x_2} \in \mathbb{R}\) . Let \(\mathbf{w} = (w_1, w_2, \ldots, w_d)^\top \in \mathbb{R}^d\) be a \(d\) -dimensional column vector, \(\mathbf{X} \in \mathbb{R}^{n \times d}\) an \(n \times d\) matrix where \(\mathbf{X} \mathbf{X}^\top\) is invertible, and \(\mathbf{y} \in \mathbb{R}^n\) an \(n\) -dimensional column vector. Consider solving the following optimization problem by using the Lagrange multipliers method:
\[
\min_{\mathbf{w}} \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{subject to} \quad \mathbf{y} = \mathbf{Xw},
\]
where \(\|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2 + \ldots + w_d^2}\) . The Lagrange function is given by
\[
L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}),
\]
where \(\boldsymbol{\mu} \in \mathbb{R}^n\) is the Lagrange multipliers.
Let \(\lambda_1, \lambda_2, \ldots, \lambda_n\) be positive real values. The sets of column vectors \(\{\mathbf{u_i} \in \mathbb{R}^n\}_{i=1}^{n}\) and \(\{\mathbf{v_j} \in \mathbb{R}^d\}_{j=1}^{d}\) form an orthonormal basis of \(\mathbb{R}^n\) and \(\mathbb{R}^d\) , respectively; that is, they are all unit vectors and orthogonal to each other. Suppose that the singular value decomposition of \(\mathbf{X}\) is
\[
\mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top,
\]
where \(\mathbf{U}\) is an \(n \times n\) matrix, \(\mathbf{\Lambda}\) is an \(n \times d\) matrix, and \(\mathbf{V}\) is a \(d \times d\) matrix given by
\[
\mathbf{U} = (\mathbf{u_1}, \mathbf{u_2}, \ldots, \mathbf{u_n}), \quad \mathbf{\Lambda} = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 & 0 & & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n & 0 & \cdots & 0\\
\end{pmatrix}, \quad \mathbf{V}^\top = \begin{pmatrix}
\mathbf{v_1}^\top \\
\mathbf{v_2}^\top \\
\vdots \\
\mathbf{v_d}^\top \\
\end{pmatrix}.
\]
Moreover, define
\[
\mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-)^{\top} \mathbf{U}^\top, \quad \text{where} \quad \mathbf{\Lambda}^- = \begin{pmatrix}
\frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\
\end{pmatrix}.
\]
Answer the following questions. Describe not only an answer but also the derivation process.
(1) Express \(\mathbf{XX^-X}\) using only \(\mathbf{X}\) .
(2) Express \(\mathbf{XX^\top}\) using only \(\mathbf{U}\) and \(\lambda_i\) (\(i = 1, \ldots, n\) ).
(3) Suppose we wish to express the stationary points of \(L(\mathbf{w}, \boldsymbol{\mu})\) in the form of \(\mathbf{w} = \mathbf{A} \mathbf{y}\) and \(\boldsymbol{\mu} = \mathbf{B} \mathbf{y}\) . Express the matrices \(\mathbf{A} \in \mathbb{R}^{d \times n}\) and \(\mathbf{B} \in \mathbb{R}^{n \times n}\) using only \(\mathbf{X}\) .
(4) Express \(\mathbf{A}\) in question (3) using only \(\mathbf{X^-}\) .
设 \(n\) 和 \(d\) (\(n < d\) ) 为自然数,\(\mathbb{R}\) 为实数集。记 \(\top\) 为向量和矩阵的转置算子。定义两个列向量 \(\mathbf{x_1}, \mathbf{x_2} \in \mathbb{R}^d\) 的内积为 \(\mathbf{x_1}^\top \mathbf{x_2} \in \mathbb{R}\) 。设 \(\mathbf{w} = (w_1, w_2, \ldots, w_d)^\top \in \mathbb{R}^d\) 为一个 \(d\) 维列向量,\(\mathbf{X} \in \mathbb{R}^{n \times d}\) 是一个 \(n \times d\) 矩阵,其中 \(\mathbf{X} \mathbf{X}^\top\) 可逆,\(\mathbf{y} \in \mathbb{R}^n\) 是一个 \(n\) 维列向量。考虑使用拉格朗日乘子法求解以下优化问题:
\[
\min_{\mathbf{w}} \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{subject to} \quad \mathbf{y} = \mathbf{Xw},
\]
其中 \(\|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2 + \ldots + w_d^2}\) 。拉格朗日函数为
\[
L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}),
\]
其中 \(\boldsymbol{\mu} \in \mathbb{R}^n\) 是拉格朗日乘子。
设 \(\lambda_1, \lambda_2, \ldots, \lambda_n\) 为正实数。列向量集合 \(\{\mathbf{u_i} \in \mathbb{R}^n\}_{i=1}^{n}\) 和 \(\{\mathbf{v_j} \in \mathbb{R}^d\}_{j=1}^{d}\) 分别构成 \(\mathbb{R}^n\) 和 \(\mathbb{R}^d\) 的正交基;即它们都是单位向量且彼此正交。假设 \(\mathbf{X}\) 的奇异值分解为
\[
\mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top,
\]
其中 \(\mathbf{U}\) 是一个 \(n \times n\) 矩阵,\(\mathbf{\Lambda}\) 是一个 \(n \times d\) 矩阵,\(\mathbf{V}\) 是一个 \(d \times d\) 矩阵,给出如下
\[
\mathbf{U} = (\mathbf{u_1}, \mathbf{u_2}, \ldots, \mathbf{u_n}), \quad \mathbf{\Lambda} = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 & 0 & & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n & 0 & \cdots & 0\\
\end{pmatrix}, \quad \mathbf{V}^\top = \begin{pmatrix}
\mathbf{v_1}^\top \\
\mathbf{v_2}^\top \\
\vdots \\
\mathbf{v_d}^\top \\
\end{pmatrix}.
\]
此外,定义
\[
\mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-)^{\top} \mathbf{U}^\top, \quad \text{where} \quad \mathbf{\Lambda}^- = \begin{pmatrix}
\frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\
\end{pmatrix}.
\]
回答以下问题。描述答案的同时也要给出推导过程。
(1) 使用仅 \(\mathbf{X}\) 表示 \(\mathbf{XX^-X}\) 。
(2) 使用仅 \(\mathbf{U}\) 和 \(\lambda_i\) (\(i = 1, \ldots, n\) ) 表示 \(\mathbf{XX^\top}\) 。
(3) 假设我们希望用 \(\mathbf{w} = \mathbf{A} \mathbf{y}\) 和 \(\boldsymbol{\mu} = \mathbf{B} \mathbf{y}\) 表示 \(L(\mathbf{w}, \boldsymbol{\mu})\) 的驻点。使用仅 \(\mathbf{X}\) 表示矩阵 \(\mathbf{A} \in \mathbb{R}^{d \times n}\) 和 \(\mathbf{B} \in \mathbb{R}^{n \times n}\) 。
(4) 使用仅 \(\mathbf{X^-}\) 表示问题 (3) 中的 \(\mathbf{A}\) 。
Kai
(1)
Given:
\[
\mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top,
\]
\[
\mathbf{X^-} = \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top,
\]
we can write:
\[
\mathbf{X^-} = \mathbf{V} (\mathbf{\Lambda}^-) \mathbf{U}^\top,
\]
where
\[
\mathbf{\Lambda}^- = \begin{pmatrix}
\frac{1}{\lambda_1} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \frac{1}{\lambda_2} & \cdots & 0 & 0 & & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \frac{1}{\lambda_n} & 0 & \cdots & 0\\
\end{pmatrix}.
\]
To find \(\mathbf{XX^-X}\) :
\[
\mathbf{XX^-X} = (\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top)(\mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top)(\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top).
\]
Notice:
\[
\mathbf{V}^\top \mathbf{V} = \mathbf{I}_d, \quad \mathbf{U}^\top \mathbf{U} = \mathbf{I}_n,
\]
therefore:
\[
\mathbf{XX^-X} = \mathbf{U} \mathbf{\Lambda} (\mathbf{V}^\top \mathbf{V}) (\mathbf{\Lambda}^-) \mathbf{U}^\top \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} (\mathbf{\Lambda}^-) \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{I}_n \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda} \mathbf{V}^\top = \mathbf{X}.
\]
Thus,
\[
\boxed{\mathbf{XX^-X} = \mathbf{X}}.
\]
(2)
Given:
\[
\mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top,
\]
we have:
\[
\mathbf{XX^\top} = (\mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top)(\mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top) = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda}^\top \mathbf{U}^\top.
\]
Since \(\mathbf{\Lambda}\) is an \(n \times d\) matrix with singular values \(\lambda_1, \lambda_2, \ldots, \lambda_n\) on the diagonal, \(\mathbf{\Lambda} \mathbf{\Lambda}^\top\) is an \(n \times n\) diagonal matrix:
\[
\mathbf{\Lambda} \mathbf{\Lambda}^\top = \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2).
\]
Therefore,
\[
\mathbf{XX^\top} = \mathbf{U} \begin{pmatrix}
\lambda_1^2 & 0 & \cdots & 0 \\
0 & \lambda_2^2 & \cdots & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & \cdots & \lambda_n^2 \\
\end{pmatrix} \mathbf{U}^\top.
\]
Thus,
\[
\boxed{\mathbf{XX^\top} = \mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top}.
\]
(3)
To solve the optimization problem using Lagrange multipliers:
\[
L(\mathbf{w}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + \boldsymbol{\mu}^\top (\mathbf{y} - \mathbf{Xw}),
\]
we need to find \(\mathbf{w}\) and \(\boldsymbol{\mu}\) such that:
\[
\frac{\partial L}{\partial \mathbf{w}} = 0 \quad \text{and} \quad \frac{\partial L}{\partial \boldsymbol{\mu}} = 0.
\]
First, compute \(\frac{\partial L}{\partial \mathbf{w}}\) :
\[
\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \mathbf{X}^\top \boldsymbol{\mu} = 0 \implies \mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu}.
\]
Next, compute \(\frac{\partial L}{\partial \boldsymbol{\mu}}\) :
\[
\frac{\partial L}{\partial \boldsymbol{\mu}} = \mathbf{y} - \mathbf{Xw} = 0 \implies \mathbf{y} = \mathbf{Xw}.
\]
Substituting \(\mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu}\) into \(\mathbf{y} = \mathbf{Xw}\) :
\[
\mathbf{y} = \mathbf{X} (\mathbf{X}^\top \boldsymbol{\mu}) \implies \mathbf{y} = (\mathbf{X} \mathbf{X}^\top) \boldsymbol{\mu}.
\]
Since \(\mathbf{X} \mathbf{X}^\top\) is invertible,
\[
\boldsymbol{\mu} = (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}.
\]
Then,
\[
\mathbf{w} = \mathbf{X}^\top \boldsymbol{\mu} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}.
\]
Therefore,
\[
\mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1}, \quad \mathbf{B} = (\mathbf{X} \mathbf{X}^\top)^{-1}.
\]
Thus,
\[
\boxed{\mathbf{w} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}, \quad \boldsymbol{\mu} = (\mathbf{X} \mathbf{X}^\top)^{-1} \mathbf{y}}.
\]
(4)
From question 3, we have:
\[
\mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1}.
\]
Using \(\mathbf{X^-}\) :
\[
\mathbf{X^-} = \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top,
\]
we know:
\[
\mathbf{X} \mathbf{X}^- = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top \mathbf{V} \mathbf{\Lambda}^- \mathbf{U}^\top = \mathbf{U} \mathbf{\Lambda} \mathbf{\Lambda}^-\mathbf{U}^\top = \mathbf{U} \mathbf{I}_n \mathbf{U}^\top = \mathbf{I}_n,
\]
Therefore:
\[
\mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1} = \mathbf{X^-}.
\]
Another way to see this is to use the given SVD of \(\mathbf{X}\) and \(\mathbf{X}\mathbf{X}^\top\) from Question 2:
\[
\mathbf{X} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^\top,
\]
\[
\mathbf{X}\mathbf{X}^\top = \mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top
\]
Then,
\[
\mathbf{A} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top)^{-1}
\]
\[
= \mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top (\mathbf{U} \mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_n^2) \mathbf{U}^\top)^{-1}
\]
\[
= \mathbf{V} \mathbf{\Lambda}^\top \mathbf{U}^\top \mathbf{U} \mathrm{diag}(\lambda_1^{-2}, \lambda_2^{-2}, \ldots, \lambda_n^{-2}) \mathbf{U}^\top
\]
\[
= \mathbf{V} \mathbf{\Lambda}^\top \mathrm{diag}(\lambda_1^{-2}, \lambda_2^{-2}, \ldots, \lambda_n^{-2}) \mathbf{U}^\top
\]
\[
= \mathbf{V} \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & \cdots & \lambda_n \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix} \begin{pmatrix}
\lambda_1^{-2} & 0 & \cdots & 0 \\
0 & \lambda_2^{-2} & \cdots & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & \cdots & \lambda_n^{-2} \\
\end{pmatrix} \mathbf{U}^\top
\]
\[
= \mathbf{V} \begin{pmatrix}
\lambda_1^{-1} & 0 & \cdots & 0 \\
0 & \lambda_2^{-1} & \cdots & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & \cdots & \lambda_n^{-1} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix} \mathbf{U}^\top
= \mathbf{X^-}.
\]
Thus,
\[
\boxed{\mathbf{A} = \mathbf{X^-}}.
\]
Knowledge
最优化 奇异值分解 线性代数 拉格朗日乘数法
难点思路
这道题目涉及多个知识点的综合运用,特别是拉格朗日乘数法和奇异值分解的结合使用。重点在于理解矩阵运算和变换的基本性质。
解题技巧和信息
拉格朗日乘数法:在求解带有约束的最优化问题时非常有用。
奇异值分解:帮助简化矩阵运算,特别是对于逆矩阵和伪逆矩阵的计算。
重点词汇
inner product 内积
transpose 转置
Lagrange multipliers 拉格朗日乘数
singular value decomposition 奇异值分解
orthonormal basis 正交归一基
参考资料
《线性代数及其应用》David C. Lay,第四章:奇异值分解
《最优化理论》Edwin K. P. Chong and Stanislaw H. Zak,第三章:拉格朗日乘数法