
The University of Tokyo, Graduate School of Information Science and Technology, Department of Computer Science, August 2020 Entrance Examination, Specialized Subject, Problem 4

Author

zephyr

Description

Let $\mathbb{R}$ be the set of real numbers. Denote by $\cdot^\top$ the transposition operator of a vector and a matrix. When $x$ is a $d$-dimensional column vector, the norm is defined by $\|x\| = \sqrt{x^\top x}$. Define the inner product of two column vectors $x, y \in \mathbb{R}^d$ as $\langle x, y \rangle = x^\top y$. For a matrix $A$, define $\|A\| = \sqrt{\mathrm{tr}(A^\top A)}$. Let $\mathrm{tr}(A)$ be the trace of the matrix $A$.

Consider the problem of predicting a real-valued label $y \in \mathbb{R}$ from a $d$-dimensional real vector $x \in \mathbb{R}^d$. For learning a predictor, suppose that $n$ training samples

$$\{(x_i, y_i)\}_{i=1}^{n} \subset \mathbb{R}^d \times \mathbb{R}$$

are given, where $(x_i, y_i)$ means that $y_i$ is the real-valued label of $x_i$. In addition, by using a $d$-dimensional vector $w^*$ and observational noise $\epsilon_i$ $(i = 1, \dots, n)$ that is independent and identically distributed, assume the data generation process as

$$y_i = x_i^\top w^* + \epsilon_i \quad (i = 1, \dots, n),$$

where the expectation $\mathbb{E}[\epsilon_i] = 0$ and variance $\mathbb{V}[\epsilon_i] = \sigma^2$. Let us introduce the symbols

$$X = (x_1, \dots, x_n)^\top \in \mathbb{R}^{n \times d}, \quad y = (y_1, \dots, y_n)^\top \in \mathbb{R}^n, \quad \epsilon = (\epsilon_1, \dots, \epsilon_n)^\top \in \mathbb{R}^n.$$

We also use the symbol $A = X^\top X$, where $A$ is assumed to be a regular matrix. The expectation over the observational noises is expressed by $\mathbb{E}[\cdot]$.

We formulate the learning of a predictor as the following optimization problem:

$$\hat{w} = \operatorname*{argmin}_{w \in \mathbb{R}^d} \|y - Xw\|^2.$$
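For intuition, this data-generation process is straightforward to simulate. The following is a minimal sketch; the concrete values of $n$, $d$, and $\sigma$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, sigma = 50, 3, 0.1           # arbitrary illustrative values
w_star = rng.normal(size=d)        # true parameter w* in R^d
X = rng.normal(size=(n, d))        # design matrix; row i is x_i^T
eps = sigma * rng.normal(size=n)   # i.i.d. noise with E[eps_i] = 0, V[eps_i] = sigma^2
y = X @ w_star + eps               # y_i = x_i^T w* + eps_i
A = X.T @ X                        # the matrix A used throughout the problem
```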

Answer the following questions. Describe not only an answer but also the derivation process.

(1) Express $\hat{w}$ using $X$, $y$, and $A$.

(2) Suppose we wish to express $\mathbb{E}\left[\|\hat{w} - w^*\|^2\right]$ in the form of $c\,\mathrm{tr}(B)$. Express the matrix $B$ and the positive real number $c$ using $\sigma$ and $A$.

(3) Suppose we wish to express $\mathbb{E}\left[(\hat{w} - w^*)(\hat{w} - w^*)^\top\right]$ in the form of $\sigma^2 C$. Express the matrix $C$ using the matrix $A$.

(4) Explain what problem arises when $A$ is not a regular matrix and suggest a way to remedy the problem.

Kai

(1)

To find the optimal weight vector $\hat{w}$, we minimize the loss function $L(w)$ defined as:

$$L(w) = \|y - Xw\|^2 = (y - Xw)^\top (y - Xw).$$

To minimize $L(w)$, we take the derivative of $L(w)$ with respect to $w$ and set it to zero:

$$\frac{\partial L(w)}{\partial w} = -2 X^\top (y - Xw) = 0.$$

Solving for $w$ gives the normal equation:

$$X^\top X w = X^\top y.$$

Thus, since $A = X^\top X$ is assumed to be regular, the optimal weight vector is:

$$\hat{w} = (X^\top X)^{-1} X^\top y = A^{-1} X^\top y.$$
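As a numerical sanity check, the closed form above can be compared against a generic least-squares solver. This is only a sketch on synthetic data; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                          # arbitrary sample size and dimension
X = rng.normal(size=(n, d))            # design matrix X
y = rng.normal(size=n)                 # labels y

A = X.T @ X                            # A = X^T X, regular with high probability here
w_hat = np.linalg.solve(A, X.T @ y)    # closed form: w_hat = A^{-1} X^T y

w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # generic solver for comparison
print(np.allclose(w_hat, w_lstsq))     # expected: True
```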

(2)

To express $\mathbb{E}\left[\|\hat{w} - w^*\|^2\right]$, we first express $\hat{w} - w^*$:

Using the data generation model $y = Xw^* + \epsilon$, we can write $\hat{w} = A^{-1} X^\top (Xw^* + \epsilon) = w^* + A^{-1} X^\top \epsilon$. Then:

$$\hat{w} - w^* = A^{-1} X^\top \epsilon.$$

Expanding and using the properties of expectation (note that $A$ is symmetric so $(A^{-1})^\top = A^{-1}$, that the trace of a scalar equals the scalar, and that $\mathrm{tr}(PQ) = \mathrm{tr}(QP)$):

$$\mathbb{E}\left[\|\hat{w} - w^*\|^2\right] = \mathbb{E}\left[\epsilon^\top X A^{-1} A^{-1} X^\top \epsilon\right] = \mathrm{tr}\left(X A^{-2} X^\top \, \mathbb{E}\left[\epsilon \epsilon^\top\right]\right).$$

Since $\mathbb{E}[\epsilon] = 0$ and $\mathbb{E}\left[\epsilon \epsilon^\top\right] = \sigma^2 I_n$, we have:

$$\mathbb{E}\left[\|\hat{w} - w^*\|^2\right] = \sigma^2 \,\mathrm{tr}\left(X A^{-2} X^\top\right) = \sigma^2 \,\mathrm{tr}\left(A^{-2} X^\top X\right) = \sigma^2 \,\mathrm{tr}\left(A^{-1}\right).$$

Here, the matrix $B$ is $A^{-1}$ and the scalar $c$ is $\sigma^2$.
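A Monte Carlo check of this identity, under the stated noise model (the sizes and the trial count below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 200, 4, 0.5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)            # true parameter w*
A_inv = np.linalg.inv(X.T @ X)

trials = 20000
errs = np.empty(trials)
for t in range(trials):
    eps = sigma * rng.normal(size=n)   # i.i.d. noise, E[eps] = 0, V[eps] = sigma^2
    y = X @ w_star + eps
    w_hat = A_inv @ (X.T @ y)
    errs[t] = np.sum((w_hat - w_star) ** 2)

print(errs.mean())                     # empirical E[||w_hat - w*||^2]
print(sigma**2 * np.trace(A_inv))      # theory: sigma^2 * tr(A^{-1})
```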

(3)

We have, from the derivation in (2), $\hat{w} - w^* = A^{-1} X^\top \epsilon$, so:

$$\mathbb{E}\left[(\hat{w} - w^*)(\hat{w} - w^*)^\top\right] = \mathbb{E}\left[A^{-1} X^\top \epsilon \epsilon^\top X A^{-1}\right] = A^{-1} X^\top \, \mathbb{E}\left[\epsilon \epsilon^\top\right] X A^{-1}.$$

Thus, using $\mathbb{E}\left[\epsilon \epsilon^\top\right] = \sigma^2 I_n$:

$$\mathbb{E}\left[(\hat{w} - w^*)(\hat{w} - w^*)^\top\right] = \sigma^2 A^{-1} X^\top X A^{-1} = \sigma^2 A^{-1} A A^{-1} = \sigma^2 A^{-1}.$$

Therefore, the matrix $C$ is $A^{-1}$.
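The same covariance identity can be checked empirically; a brief sketch that vectorizes many noise draws at once (all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 200, 3, 0.5
X = rng.normal(size=(n, d))
A_inv = np.linalg.inv(X.T @ X)

# Each row of diffs is one realization of w_hat - w* = A^{-1} X^T eps.
eps = sigma * rng.normal(size=(20000, n))
diffs = (A_inv @ X.T @ eps.T).T
emp_cov = diffs.T @ diffs / len(diffs)             # empirical covariance
print(np.max(np.abs(emp_cov - sigma**2 * A_inv)))  # expected: close to 0
```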

(4)

When $A = X^\top X$ is not a regular matrix, it is singular and cannot be inverted. This happens when the columns of $X$ are linearly dependent (multicollinearity among the features), and necessarily when $n < d$. In that case the normal equation $X^\top X w = X^\top y$ no longer has a unique solution, and the computation of $\hat{w} = A^{-1} X^\top y$ becomes unstable or impossible.

A common remedy is to add a regularization term to the loss function, which is known as ridge regression. The modified loss function becomes:

$$L_\lambda(w) = \|y - Xw\|^2 + \lambda \|w\|^2,$$

where $\lambda > 0$ is a regularization parameter. The solution then becomes:

$$\hat{w} = (X^\top X + \lambda I_d)^{-1} X^\top y = (A + \lambda I_d)^{-1} X^\top y.$$

The matrix $A + \lambda I_d$ is symmetric positive definite for any $\lambda > 0$ (since $A$ is positive semidefinite), so it is always invertible and the solution is unique and numerically stable.
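A minimal sketch of this remedy in the degenerate case $n < d$, where $A$ is guaranteed to be singular (the value of $\lambda$ is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 10                           # n < d, so A = X^T X is singular
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
A = X.T @ X
print(np.linalg.matrix_rank(A))        # at most n < d: A is not invertible

lam = 1e-2                             # regularization strength lambda
w_ridge = np.linalg.solve(A + lam * np.eye(d), X.T @ y)  # (A + lam I)^{-1} X^T y
print(np.linalg.norm(y - X @ w_ridge)) # a unique, stable solution exists
```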

Knowledge

Machine Learning, Linear Regression, Least Squares, Ridge Regression

Tips and Insights

In regression problems, when the explanatory variables are collinear, ridge regression improves the stability of the model and keeps the parameters from growing excessively large. It is important to understand how the least-squares optimization problem is turned into a matrix equation and solved in closed form. In addition, adding a regularization term is an effective way to counter overfitting.

Key Terms

  • trace - the sum of the diagonal entries of a square matrix
  • regular matrix - an invertible (nonsingular) square matrix, i.e., one of full rank with nonzero determinant
  • regularization - an extra term added to the loss function to constrain model complexity and improve generalization
