
The University of Tokyo, Graduate School of Information Science and Technology, Department of Creative Informatics: Written Exam Conducted in August 2023, Problem 1

Author

itsuitsuki

Description (English)

In the following, when a sample consisting of $N$ observations is obtained from a stochastic process, we calculate the parameters of the original stochastic information source by using maximum likelihood estimation. Maximum likelihood estimation is a type of parameter estimation method and is described below.

Let $f(x; \theta)$ be the density function of the distribution in accordance with the parameter $\theta$. The likelihood function when obtaining a sample $X = \{x_1, \dots, x_N\}$ is given by $L(\theta) = \prod_{n=1}^{N} f(x_n; \theta)$. Here, the $\hat{\theta}$ that maximizes $L(\theta)$ is called a maximum likelihood estimator.
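
As a concrete illustration of this definition, the following minimal Python sketch evaluates $L(\theta)$ for a unit-variance normal density; the density choice and the sample values are assumptions made only for this example.

```python
import numpy as np

def normal_pdf(x, mu, sigma2=1.0):
    """Density f(x; mu) of N(mu, sigma2), evaluated pointwise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def likelihood(sample, mu):
    """L(mu) = prod_n f(x_n; mu) for an i.i.d. sample."""
    return np.prod(normal_pdf(sample, mu))

sample = np.array([0.8, 1.1, 0.9, 1.3])
print(likelihood(sample, 1.0))   # larger: mu is near the sample mean
print(likelihood(sample, 2.0))   # smaller: mu is far from the data
```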

Answer the following questions.

(1) As an example, consider the coin toss. Let $p$ be the probability of the coin coming up heads. We obtained the observation that when this coin is tossed $N$ times, heads come up $m$ times. Obtain the likelihood function $L(p)$ in this case based on the above definition.

(2) Estimate the probability $p$ of the coin coming up heads by using maximum likelihood estimation. First, consider the logarithmic value of the likelihood function (hereafter referred to as the log-likelihood function $\ell(p)$), and then find the $\hat{p}$ that maximizes the log-likelihood function by differentiation.
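
Assuming, as in the standard treatment, that the likelihood from (1) takes the Bernoulli form $L(p) = p^m (1-p)^{N-m}$, the maximizer $m/N$ can be spot-checked numerically with a short Python sketch (the values of $N$ and $m$ are arbitrary):

```python
import numpy as np

# Grid search over p for the Bernoulli log-likelihood
# m*log(p) + (N - m)*log(1 - p); the peak should sit at p = m/N.
N, m = 10, 3
p = np.linspace(0.001, 0.999, 999)
log_lik = m * np.log(p) + (N - m) * np.log(1 - p)
print(p[np.argmax(log_lik)])  # approximately 0.3 = m/N
```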

A normal distribution with parameters the mean $\mu$ and the variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$, and its probability density function is given by

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$

Consider the case of observing data generated from multiple normal distributions. This is called the Gaussian Mixture Model (GMM). The data generation process of the GMM is described as follows.

  1. Assume a distribution that selects the number $k$ ($k = 1, \dots, K$) with probability $\pi_k$ (this is called a categorical distribution C). Also, assume $K$ normal distributions $\mathcal{N}(\mu_k, \sigma_k^2)$, each of which corresponds to each number $k$.
  2. For each element $x_n$ of the sample $X = \{x_1, \dots, x_N\}$; 2-I: generate a number $k_n$ in accordance with the categorical distribution C mentioned above, 2-II: generate $x_n$ in accordance with the normal distribution $\mathcal{N}(\mu_{k_n}, \sigma_{k_n}^2)$ corresponding to the generated number $k_n$.
  3. The generated sequence of observation values $X = \{x_1, \dots, x_N\}$ is the sample. From this sample, we obtain the GMM parameters $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$ by maximum likelihood estimation (a code sketch of this generative process follows the list).
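
A minimal Python sketch of steps 1 and 2 above; the parameter values are illustrative assumptions, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) GMM parameters with K = 2 components.
pi = np.array([0.4, 0.6])       # categorical distribution C
mu = np.array([-2.0, 3.0])      # component means
sigma = np.array([0.5, 1.0])    # component standard deviations

N = 5
k = rng.choice(len(pi), size=N, p=pi)   # step 2-I: draw component numbers
x = rng.normal(mu[k], sigma[k])         # step 2-II: draw x_n ~ N(mu_k, sigma_k^2)
print(k, x)
```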

First, consider the case of $K = 1$ (data generation from a single normal distribution). When we obtain a sample $X = \{x_1, \dots, x_N\}$ consisting of $N$ observations generated from this normal distribution, we want to estimate the parameters (mean $\mu$ and variance $\sigma^2$) by maximum likelihood estimation. Answer the following questions.

(3) Find the log-likelihood function $\ell(\mu, \sigma^2)$ for this case.

(4) Find the value that maximizes the log-likelihood function with respect to the mean $\mu$ (i.e., find the maximum likelihood estimator $\hat{\mu}$ of the mean). Similarly, find the value that maximizes the log-likelihood function with respect to the variance $\sigma^2$ (i.e., find the maximum likelihood estimator $\hat{\sigma}^2$ of the variance).
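
The well-known answers here are the sample mean and the biased (denominator-$N$) sample variance; a quick numerical check under assumed data values:

```python
import numpy as np

# Spot-check for (4): the single-Gaussian log-likelihood is maximized at the
# sample mean and at the biased sample variance (denominator N, not N - 1).
x = np.array([1.2, -0.4, 0.8, 2.1, 0.5])

def log_lik(mu, s2):
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

mu_hat, s2_hat = x.mean(), x.var()   # np.var uses ddof=0, i.e. the MLE form
print(log_lik(mu_hat, s2_hat) > log_lik(mu_hat + 0.1, s2_hat))   # True
print(log_lik(mu_hat, s2_hat) > log_lik(mu_hat, 1.2 * s2_hat))   # True
```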

Next, consider the case of $K \geq 2$. Answer the following questions.

(5) We would like to describe the log-likelihood function in this case as a function of the parameters $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$. First, describe the probability $p(k)$ that the number $k$ is generated, using $\pi_k$. Next, given $k$, describe the generation probability $p(x_n \mid k)$ of $x_n$. Finally, describe the log-likelihood function $\ell(\theta)$ using the probabilities $p(k)$ and $p(x_n \mid k)$ obtained above.
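
Combining $p(k)$ and $p(x_n \mid k)$ in the standard way yields a log-likelihood with a sum inside the logarithm; a direct Python transcription (the function name and array shapes are choices made for this sketch):

```python
import numpy as np

def gmm_log_likelihood(x, pi, mu, sigma2):
    """log-likelihood sum_n log sum_k pi_k * N(x_n; mu_k, sigma_k^2).

    x has shape (N,); pi, mu, sigma2 have shape (K,).
    """
    diff = x[:, None] - mu[None, :]                                  # (N, K)
    comp = np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(np.log(comp @ pi))
```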

We want to maximize the log-likelihood function obtained in Question (5) above. However, since this log-likelihood function contains a sum operation inside the logarithmic function, it is difficult to find the $\theta$ that makes the derivative zero. Therefore, the following "Algorithm A" is used to iteratively maximize it (note that here we only focus on local maximization and do not seek global maximization). In Algorithm A, for the objective function $g(\theta)$ to be maximized, we use an auxiliary function $F(\theta, q)$ and an auxiliary variable $q$ satisfying $g(\theta) = \max_{q} F(\theta, q)$, and the operations [step 1] and [step 2] are iteratively repeated to calculate a local optimum of $g(\theta)$: in [step 1], update $q \leftarrow \arg\max_{q} F(\theta, q)$ with $\theta$ fixed; in [step 2], update $\theta \leftarrow \arg\max_{\theta} F(\theta, q)$ with $q$ fixed. Answer the following questions.
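
Read as pseudocode, Algorithm A is plain coordinate ascent on $F$; a minimal Python sketch follows, where `argmax_q` and `argmax_theta` are hypothetical problem-specific solvers for the two maximizations.

```python
# Schematic of Algorithm A as coordinate ascent on F(theta, q). The two
# callables are hypothetical solvers for the respective argmax operations.
def algorithm_a(theta, argmax_q, argmax_theta, n_iters=100):
    for _ in range(n_iters):
        q = argmax_q(theta)        # [step 1]: q <- argmax_q F(theta, q)
        theta = argmax_theta(q)    # [step 2]: theta <- argmax_theta F(theta, q)
    return theta
```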

(6) Prove that the $\theta$ updated by repeating [step 1] and [step 2] of Algorithm A above always raises, or leaves unchanged, the value of the original objective function $g(\theta)$.

(7) We use Jensen's inequality to obtain the auxiliary function of the log-likelihood function obtained in Question (5) above. Jensen's inequality for the logarithmic function can be written as

$$\log \sum_{k=1}^{K} \lambda_k y_k \;\geq\; \sum_{k=1}^{K} \lambda_k \log y_k$$

(where $y_k$ is a positive variable, and $\lambda_k$ is any weight that satisfies $\lambda_k \geq 0$ and $\sum_{k=1}^{K} \lambda_k = 1$). Prove this inequality.
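
The inequality is easy to spot-check numerically before proving it; in the sketch below, the weights are drawn from a Dirichlet distribution only so that the constraints hold.

```python
import numpy as np

# Numerical spot-check of Jensen's inequality for the logarithm:
# log(sum_k lam_k * y_k) >= sum_k lam_k * log(y_k).
rng = np.random.default_rng(1)
y = rng.uniform(0.1, 5.0, size=4)        # positive variables y_k
lam = rng.dirichlet(np.ones(4))          # weights: lam_k >= 0, sum = 1
assert np.log(lam @ y) >= lam @ np.log(y)
```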

(8) Using Jensen's inequality, design an auxiliary function $F(\theta, q)$ for the log-likelihood function obtained in Question (5). Let $q_{n,k}$ (where $q_{n,k} \geq 0$ and $\sum_{k=1}^{K} q_{n,k} = 1$) be the auxiliary variable here. You can derive the auxiliary function by first dividing the individual normal distributions in the sum operation by $q_{n,k}$ and multiplying them by $q_{n,k}$ in front of that.
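
One standard outcome of this construction, written as a Python sketch with the same shape conventions as above (the name `auxiliary_F` is an assumption):

```python
import numpy as np

def auxiliary_F(x, pi, mu, sigma2, q):
    """A Jensen-based auxiliary function for the GMM log-likelihood:
    F = sum_n sum_k q[n, k] * log(pi_k * N(x_n; mu_k, sigma_k^2) / q[n, k]),
    where each row of q is positive and sums to 1.
    """
    diff = x[:, None] - mu[None, :]
    comp = pi * np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(q * np.log(comp / q))
```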

(9) Execute Algorithm A for the auxiliary function obtained in Question (8) above. First, find the $q_{n,k}$ that maximizes the auxiliary function as [step 1]. Next, find the parameter $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$ that maximizes the auxiliary function (where the auxiliary variables $q_{n,k}$ are fixed) as [step 2].
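
For illustration, the classic EM updates are one concrete way [step 1] and [step 2] work out for the GMM; this sketch assumes given initial parameters and data `x` of shape `(N,)`.

```python
import numpy as np

def em_gmm(x, pi, mu, sigma2, n_iters=50):
    """A sketch of Algorithm A for the GMM using the classic EM updates."""
    for _ in range(n_iters):
        # [step 1]: responsibilities q[n, k], the maximizing auxiliary variables
        diff = x[:, None] - mu[None, :]
        comp = pi * np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        q = comp / comp.sum(axis=1, keepdims=True)
        # [step 2]: parameters maximizing the auxiliary function with q fixed
        nk = q.sum(axis=0)                                   # effective counts
        pi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma2 = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, sigma2
```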