Let $\mathbb{R}$ be the set of real numbers. Denote by $\cdot^{\top}$ the transposition operator of a vector and a matrix. When $\boldsymbol{a} \in \mathbb{R}^{d}$ is a $d$-dimensional column vector, the norm is defined by $\|\boldsymbol{a}\| = \sqrt{\boldsymbol{a}^{\top}\boldsymbol{a}}$. Define the inner product of two column vectors $\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^{d}$ as $\langle \boldsymbol{a}, \boldsymbol{b} \rangle = \boldsymbol{a}^{\top}\boldsymbol{b}$. For a matrix $M$, define the Frobenius norm $\|M\|_{F} = \sqrt{\sum_{i,j} M_{ij}^{2}}$. Let $\mathrm{tr}(M)$ be the trace of the matrix $M$.
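For concreteness, the following minimal NumPy sketch checks these quantities numerically; the vector and matrix values and the dimensions are arbitrary illustrative choices, assuming the standard Euclidean norm, inner product, Frobenius norm, and trace defined above.

```python
import numpy as np

# Numerical check of the notation above (standard definitions assumed):
# ||a|| = sqrt(a^T a), <a, b> = a^T b, and ||M||_F^2 = tr(M^T M).
rng = np.random.default_rng(0)

a = rng.normal(size=5)           # a d-dimensional column vector (d = 5 here)
b = rng.normal(size=5)
M = rng.normal(size=(3, 5))      # an arbitrary 3 x 5 matrix

norm_a = np.sqrt(a @ a)                      # ||a|| = sqrt(a^T a)
inner_ab = a @ b                             # <a, b> = a^T b
frob_M = np.sqrt(np.trace(M.T @ M))          # ||M||_F via the trace identity

assert np.isclose(norm_a, np.linalg.norm(a))
assert np.isclose(frob_M, np.linalg.norm(M, ord="fro"))
print(norm_a, inner_ab, frob_M)
```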
Consider the problem of predicting a real-valued label $y \in \mathbb{R}$ from a $d$-dimensional real vector $\boldsymbol{x} \in \mathbb{R}^{d}$. For learning a predictor, suppose that $n$ training samples
$$
(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \dots, (\boldsymbol{x}_n, y_n)
$$
are given, where $(\boldsymbol{x}_i, y_i)$ means that $y_i \in \mathbb{R}$ is the real-valued label of $\boldsymbol{x}_i \in \mathbb{R}^{d}$. In addition, by using a $d$-dimensional vector $\boldsymbol{w}^{*} \in \mathbb{R}^{d}$ and observational noise $\epsilon_i$ that is independent and identically distributed, assume the data generation process
$$
y_i = \boldsymbol{w}^{*\top}\boldsymbol{x}_i + \epsilon_i, \qquad i = 1, \dots, n,
$$
where the expectation $\mathbb{E}[\epsilon_i] = 0$ and the variance $\mathbb{V}[\epsilon_i] = \sigma^{2}$. Let us introduce the symbols
$$
X = (\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n)^{\top} \in \mathbb{R}^{n \times d}, \quad \boldsymbol{y} = (y_1, y_2, \dots, y_n)^{\top} \in \mathbb{R}^{n}, \quad \boldsymbol{\epsilon} = (\epsilon_1, \epsilon_2, \dots, \epsilon_n)^{\top} \in \mathbb{R}^{n}.
$$
We also use the symbol $A = X^{\top}X$, where $A$ is assumed to be a regular matrix. The expectation over the observational noises is expressed by $\mathbb{E}_{\boldsymbol{\epsilon}}[\,\cdot\,]$.
We formulate the learning of a predictor as the following optimization problem:
$$
\hat{\boldsymbol{w}} = \operatorname*{arg\,min}_{\boldsymbol{w} \in \mathbb{R}^{d}} \; \|\boldsymbol{y} - X\boldsymbol{w}\|^{2}.
$$
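To make the setup concrete, here is a minimal NumPy sketch of the data-generation process and of this least-squares problem; the sample size $n$, dimension $d$, and noise level $\sigma$ are arbitrary illustrative choices, and the estimate is obtained by solving the normal equations $A\boldsymbol{w} = X^{\top}\boldsymbol{y}$, assuming $A$ is regular as stated above.

```python
import numpy as np

# Sketch of the setup: generate n samples from y_i = w*^T x_i + eps_i and
# minimize ||y - X w||^2 by solving the normal equations A w = X^T y, A = X^T X.
rng = np.random.default_rng(0)
n, d, sigma = 100, 3, 0.1              # illustrative choices

w_star = rng.normal(size=d)            # the true parameter vector w*
X = rng.normal(size=(n, d))            # design matrix; row i is x_i^T
eps = rng.normal(scale=sigma, size=n)  # i.i.d. noise, mean 0, variance sigma^2
y = X @ w_star + eps                   # observed labels

A = X.T @ X                            # assumed regular, as in the problem statement
w_hat = np.linalg.solve(A, X.T @ y)    # least-squares estimate
print("w*    :", w_star)
print("w_hat :", w_hat)
```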
Answer the following questions. Describe not only an answer but also the derivation process.
(1) Express the solution $\hat{\boldsymbol{w}}$ of this optimization problem using $A$, $X$ and $\boldsymbol{y}$.
(2) Suppose we wish to express $\mathbb{E}_{\boldsymbol{\epsilon}}\!\left[(\hat{\boldsymbol{w}} - \boldsymbol{w}^{*})(\hat{\boldsymbol{w}} - \boldsymbol{w}^{*})^{\top}\right]$ in the form of $cB$. Express the matrix $B$ and the positive real number $c$ using $A$ and $\sigma$.
(3) Suppose we wish to express $\mathbb{E}_{\boldsymbol{\epsilon}}\!\left[\|\hat{\boldsymbol{w}} - \boldsymbol{w}^{*}\|^{2}\right]$ in the form of $c\,\mathrm{tr}(C)$, with the same positive real number $c$ as in (2). Express the matrix $C$ using the matrix $B$.
(4) Explain what problem arises when $A$ is not a regular matrix, and suggest a way to remedy the problem.
When $A = X^{\top}X$ is not a regular matrix, it is singular and cannot be inverted. This typically happens when the features are linearly dependent (multicollinearity), for example when one column of $X$ is an exact linear combination of the others, or when the number of samples $n$ is smaller than the dimension $d$. In that case the least-squares solution $\hat{\boldsymbol{w}} = A^{-1}X^{\top}\boldsymbol{y}$ cannot be computed, or its computation becomes numerically unstable when $A$ is close to singular, and the minimizer of the objective is no longer unique.
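The following toy NumPy sketch illustrates this failure mode (the duplicated feature and the coefficient values are arbitrary choices): with an exactly repeated column, $A$ is rank-deficient and many different parameter vectors give exactly the same fit, so the least-squares solution is not unique.

```python
import numpy as np

# When one column of X duplicates another, A = X^T X is rank-deficient and
# the normal equations do not determine w uniquely.
rng = np.random.default_rng(0)
n = 50

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1])     # third feature is an exact copy of the first
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

A = X.T @ X
print("rank(A) =", np.linalg.matrix_rank(A), "out of", A.shape[0])   # 2 out of 3

# A minimum-norm solution still exists (via the pseudoinverse), but it is not unique:
w_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
null_dir = np.array([1.0, 0.0, -1.0])   # X @ null_dir = x1 - x1 = 0
w_other = w_min_norm + 3.0 * null_dir   # a different w with exactly the same fit
print("same predictions:", np.allclose(X @ w_min_norm, X @ w_other))
```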
A common remedy is to add a regularization term to the loss function, which is known as Ridge Regression. The modified loss function becomes:
$$
L_{\lambda}(\boldsymbol{w}) = \|\boldsymbol{y} - X\boldsymbol{w}\|^{2} + \lambda \|\boldsymbol{w}\|^{2},
$$
where $\lambda > 0$ is a regularization parameter. The solution then becomes:
$$
\hat{\boldsymbol{w}}_{\lambda} = (X^{\top}X + \lambda I)^{-1}X^{\top}\boldsymbol{y} = (A + \lambda I)^{-1}X^{\top}\boldsymbol{y}.
$$
Since $A$ is positive semi-definite, $A + \lambda I$ is positive definite for any $\lambda > 0$ and therefore always invertible, so the regularized solution exists and is unique even when $A$ itself is singular.
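As a numerical check of this remedy, here is a minimal NumPy sketch that reuses the duplicated-feature example above: $A + \lambda I$ is invertible even though $A$ is not, so the ridge solution exists and is unique (the value $\lambda = 1$ is an arbitrary illustrative choice, not a tuned one).

```python
import numpy as np

# Ridge regression on data whose Gram matrix A = X^T X is singular:
# A + lam * I is positive definite, so the regularized solution is unique.
rng = np.random.default_rng(0)
n, lam = 50, 1.0                      # lam is an arbitrary illustrative choice

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1])     # duplicated feature -> singular A
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

A = X.T @ X
print("det(A)         =", np.linalg.det(A))                      # (numerically) zero
print("det(A + lam*I) =", np.linalg.det(A + lam * np.eye(3)))    # strictly positive

w_ridge = np.linalg.solve(A + lam * np.eye(3), X.T @ y)          # unique ridge solution
print("w_ridge       =", w_ridge)
print("residual norm =", np.linalg.norm(y - X @ w_ridge))
```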