Let $\mathbb{R}$ be the set of real numbers. Denote by $\cdot^{\top}$ the transposition operator of a vector and a matrix. When $\boldsymbol{a} \in \mathbb{R}^{d}$ is a $d$-dimensional column vector, the norm is defined by $\|\boldsymbol{a}\| = \sqrt{\boldsymbol{a}^{\top}\boldsymbol{a}}$. Define the inner product of two column vectors $\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^{d}$ as $\langle \boldsymbol{a}, \boldsymbol{b} \rangle = \boldsymbol{a}^{\top}\boldsymbol{b}$. For a matrix $M$, define the Frobenius norm $\|M\|_{F} = \sqrt{\sum_{i,j} M_{ij}^{2}}$. Let $\mathrm{tr}(M)$ be the trace of the matrix $M$.
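For concreteness, the following minimal NumPy sketch checks these quantities numerically; the vector and matrix values and the dimensions are arbitrary illustrative choices, assuming the standard Euclidean norm, inner product, Frobenius norm, and trace defined above.

```python
import numpy as np

# Numerical check of the notation above (standard definitions assumed):
# ||a|| = sqrt(a^T a), <a, b> = a^T b, and ||M||_F^2 = tr(M^T M).
rng = np.random.default_rng(0)

a = rng.normal(size=5)           # a d-dimensional column vector (d = 5 here)
b = rng.normal(size=5)
M = rng.normal(size=(3, 5))      # an arbitrary 3 x 5 matrix

norm_a = np.sqrt(a @ a)                      # ||a|| = sqrt(a^T a)
inner_ab = a @ b                             # <a, b> = a^T b
frob_M = np.sqrt(np.trace(M.T @ M))          # ||M||_F via the trace identity

assert np.isclose(norm_a, np.linalg.norm(a))
assert np.isclose(frob_M, np.linalg.norm(M, ord="fro"))
print(norm_a, inner_ab, frob_M)
```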
Consider the problem of predicting a real-valued label $y \in \mathbb{R}$ from a $d$-dimensional real vector $\boldsymbol{x} \in \mathbb{R}^{d}$. For learning a predictor, suppose that $n$ training samples
$$
(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \dots, (\boldsymbol{x}_n, y_n)
$$
are given, where $(\boldsymbol{x}_i, y_i)$ means that $y_i \in \mathbb{R}$ is the real-valued label of $\boldsymbol{x}_i \in \mathbb{R}^{d}$. In addition, by using a $d$-dimensional vector $\boldsymbol{w}^{*} \in \mathbb{R}^{d}$ and observational noise $\epsilon_i$ that is independent and identically distributed, assume the data generation process
$$
y_i = \boldsymbol{w}^{*\top}\boldsymbol{x}_i + \epsilon_i, \qquad i = 1, \dots, n,
$$
where the expectation $\mathbb{E}[\epsilon_i] = 0$ and the variance $\mathbb{V}[\epsilon_i] = \sigma^{2}$. Let us introduce the symbols
$$
X = (\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n)^{\top} \in \mathbb{R}^{n \times d}, \quad \boldsymbol{y} = (y_1, y_2, \dots, y_n)^{\top} \in \mathbb{R}^{n}, \quad \boldsymbol{\epsilon} = (\epsilon_1, \epsilon_2, \dots, \epsilon_n)^{\top} \in \mathbb{R}^{n}.
$$
We also use the symbol $A = X^{\top}X$, where $A$ is assumed to be a regular matrix. The expectation over the observational noises is expressed by $\mathbb{E}_{\boldsymbol{\epsilon}}[\,\cdot\,]$.
We formulate the learning of a predictor as the following optimization problem:
$$
\hat{\boldsymbol{w}} = \operatorname*{arg\,min}_{\boldsymbol{w} \in \mathbb{R}^{d}} \; \|\boldsymbol{y} - X\boldsymbol{w}\|^{2}.
$$
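To make the setup concrete, here is a minimal NumPy sketch of the data-generation process and of this least-squares problem; the sample size $n$, dimension $d$, and noise level $\sigma$ are arbitrary illustrative choices, and the estimate is obtained by solving the normal equations $A\boldsymbol{w} = X^{\top}\boldsymbol{y}$, assuming $A$ is regular as stated above.

```python
import numpy as np

# Sketch of the setup: generate n samples from y_i = w*^T x_i + eps_i and
# minimize ||y - X w||^2 by solving the normal equations A w = X^T y, A = X^T X.
rng = np.random.default_rng(0)
n, d, sigma = 100, 3, 0.1              # illustrative choices

w_star = rng.normal(size=d)            # the true parameter vector w*
X = rng.normal(size=(n, d))            # design matrix; row i is x_i^T
eps = rng.normal(scale=sigma, size=n)  # i.i.d. noise, mean 0, variance sigma^2
y = X @ w_star + eps                   # observed labels

A = X.T @ X                            # assumed regular, as in the problem statement
w_hat = np.linalg.solve(A, X.T @ y)    # least-squares estimate
print("w*    :", w_star)
print("w_hat :", w_hat)
```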
Answer the following questions. Describe not only an answer but also the derivation process.
(1) Express the solution $\hat{\boldsymbol{w}}$ of this optimization problem using $A$, $X$ and $\boldsymbol{y}$.
(2) Suppose we wish to express $\mathbb{E}_{\boldsymbol{\epsilon}}\!\left[(\hat{\boldsymbol{w}} - \boldsymbol{w}^{*})(\hat{\boldsymbol{w}} - \boldsymbol{w}^{*})^{\top}\right]$ in the form of $cB$. Express the matrix $B$ and the positive real number $c$ using $A$ and $\sigma$.
(3) Suppose we wish to express $\mathbb{E}_{\boldsymbol{\epsilon}}\!\left[\|\hat{\boldsymbol{w}} - \boldsymbol{w}^{*}\|^{2}\right]$ in the form of $c\,\mathrm{tr}(C)$, with the same positive real number $c$ as in (2). Express the matrix $C$ using the matrix $B$.
(4) Explain what problem arises when $A$ is not a regular matrix, and suggest a way to remedy the problem.
When $A = X^{\top}X$ is not a regular matrix, it is singular and cannot be inverted. This typically happens when the features are linearly dependent (multicollinearity), for example when one column of $X$ is an exact linear combination of the others, or when the number of samples $n$ is smaller than the dimension $d$. In that case the least-squares solution $\hat{\boldsymbol{w}} = A^{-1}X^{\top}\boldsymbol{y}$ cannot be computed, or its computation becomes numerically unstable when $A$ is close to singular, and the minimizer of the objective is no longer unique.
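The following toy NumPy sketch illustrates this failure mode (the duplicated feature and the coefficient values are arbitrary choices): with an exactly repeated column, $A$ is rank-deficient and many different parameter vectors give exactly the same fit, so the least-squares solution is not unique.

```python
import numpy as np

# When one column of X duplicates another, A = X^T X is rank-deficient and
# the normal equations do not determine w uniquely.
rng = np.random.default_rng(0)
n = 50

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1])     # third feature is an exact copy of the first
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

A = X.T @ X
print("rank(A) =", np.linalg.matrix_rank(A), "out of", A.shape[0])   # 2 out of 3

# A minimum-norm solution still exists (via the pseudoinverse), but it is not unique:
w_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
null_dir = np.array([1.0, 0.0, -1.0])   # X @ null_dir = x1 - x1 = 0
w_other = w_min_norm + 3.0 * null_dir   # a different w with exactly the same fit
print("same predictions:", np.allclose(X @ w_min_norm, X @ w_other))
```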
A common remedy is to add a regularization term to the loss function, which is known as Ridge Regression. The modified loss function becomes:
$$
L_{\lambda}(\boldsymbol{w}) = \|\boldsymbol{y} - X\boldsymbol{w}\|^{2} + \lambda \|\boldsymbol{w}\|^{2},
$$
where $\lambda > 0$ is a regularization parameter. The solution then becomes:
$$
\hat{\boldsymbol{w}}_{\lambda} = (X^{\top}X + \lambda I)^{-1}X^{\top}\boldsymbol{y} = (A + \lambda I)^{-1}X^{\top}\boldsymbol{y}.
$$
Since $A$ is positive semi-definite, $A + \lambda I$ is positive definite for any $\lambda > 0$ and therefore always invertible, so the regularized solution exists and is unique even when $A$ itself is singular.
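As a numerical check of this remedy, here is a minimal NumPy sketch that reuses the duplicated-feature example above: $A + \lambda I$ is invertible even though $A$ is not, so the ridge solution exists and is unique (the value $\lambda = 1$ is an arbitrary illustrative choice, not a tuned one).

```python
import numpy as np

# Ridge regression on data whose Gram matrix A = X^T X is singular:
# A + lam * I is positive definite, so the regularized solution is unique.
rng = np.random.default_rng(0)
n, lam = 50, 1.0                      # lam is an arbitrary illustrative choice

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1])     # duplicated feature -> singular A
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

A = X.T @ X
print("det(A)         =", np.linalg.det(A))                      # (numerically) zero
print("det(A + lam*I) =", np.linalg.det(A + lam * np.eye(3)))    # strictly positive

w_ridge = np.linalg.solve(A + lam * np.eye(3), X.T @ y)          # unique ridge solution
print("w_ridge       =", w_ridge)
print("residual norm =", np.linalg.norm(y - X @ w_ridge))
```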