
The University of Tokyo, Graduate School of Information Science and Technology, Department of Creative Informatics: Written Exam Conducted in August 2023, Problem 1

Author

itsuitsuki

Description (English)

In the following, when a sample consisting of $N$ observations is obtained from a stochastic process, we calculate the parameters of the original stochastic information source by using maximum likelihood estimation. Maximum likelihood estimation is a type of parameter estimation method and is described below.

Let $f(x; \theta)$ be the density function of the distribution in accordance with the parameter $\theta$. The likelihood function when obtaining a sample $X = \{x_1, \dots, x_N\}$ is given by $L(\theta) = \prod_{n=1}^{N} f(x_n; \theta)$. Here, the $\hat{\theta}$ that maximizes $L(\theta)$ is called a maximum likelihood estimator.
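
As a concrete illustration of this definition, the following minimal Python sketch evaluates $L(\theta)$ for a unit-variance normal density; the density choice and the sample values are assumptions made only for this example.

```python
import numpy as np

def normal_pdf(x, mu, sigma2=1.0):
    """Density f(x; mu) of N(mu, sigma2), evaluated pointwise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def likelihood(sample, mu):
    """L(mu) = prod_n f(x_n; mu) for an i.i.d. sample."""
    return np.prod(normal_pdf(sample, mu))

sample = np.array([0.8, 1.1, 0.9, 1.3])
print(likelihood(sample, 1.0))   # larger: mu is near the sample mean
print(likelihood(sample, 2.0))   # smaller: mu is far from the data
```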

Answer the following questions.

(1) As an example, consider the coin toss. Let $p$ be the probability of the coin coming up heads. We obtained the observation that when this coin is tossed $N$ times, heads come up $m$ times. Obtain the likelihood function $L(p)$ in this case based on the above definition.

(2) Estimate the probability $p$ of the coin coming up heads by using maximum likelihood estimation. First, consider the logarithmic value of the likelihood function (hereafter referred to as the log-likelihood function $\ell(p)$), and then find the $\hat{p}$ that maximizes the log-likelihood function by differentiation.
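
Assuming, as in the standard treatment, that the likelihood from (1) takes the Bernoulli form $L(p) = p^m (1-p)^{N-m}$, the maximizer $m/N$ can be spot-checked numerically with a short Python sketch (the values of $N$ and $m$ are arbitrary):

```python
import numpy as np

# Grid search over p for the Bernoulli log-likelihood
# m*log(p) + (N - m)*log(1 - p); the peak should sit at p = m/N.
N, m = 10, 3
p = np.linspace(0.001, 0.999, 999)
log_lik = m * np.log(p) + (N - m) * np.log(1 - p)
print(p[np.argmax(log_lik)])  # approximately 0.3 = m/N
```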

A normal distribution with parameters the mean $\mu$ and the variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$, and its probability density function is given by

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$

Consider the case of observing data generated from multiple normal distributions. This is called the Gaussian Mixture Model (GMM). The data generation process of the GMM is described as follows.

  1. Assume a distribution that selects the number $k$ ($k = 1, \dots, K$) with probability $\pi_k$ (this is called a categorical distribution C). Also, assume $K$ normal distributions $\mathcal{N}(\mu_k, \sigma_k^2)$, each of which corresponds to each number $k$.
  2. For each element $x_n$ of the sample $X = \{x_1, \dots, x_N\}$; 2-I: generate a number $k_n$ in accordance with the categorical distribution C mentioned above, 2-II: generate $x_n$ in accordance with the normal distribution $\mathcal{N}(\mu_{k_n}, \sigma_{k_n}^2)$ corresponding to the generated number $k_n$.
  3. The generated sequence of observation values $X = \{x_1, \dots, x_N\}$ is the sample. From this sample, we obtain the GMM parameters $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$ by maximum likelihood estimation (a code sketch of this generative process follows the list).
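
A minimal Python sketch of steps 1 and 2 above; the parameter values are illustrative assumptions, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) GMM parameters with K = 2 components.
pi = np.array([0.4, 0.6])       # categorical distribution C
mu = np.array([-2.0, 3.0])      # component means
sigma = np.array([0.5, 1.0])    # component standard deviations

N = 5
k = rng.choice(len(pi), size=N, p=pi)   # step 2-I: draw component numbers
x = rng.normal(mu[k], sigma[k])         # step 2-II: draw x_n ~ N(mu_k, sigma_k^2)
print(k, x)
```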

First, consider the case of $K = 1$ (data generation from a single normal distribution). When we obtain a sample $X = \{x_1, \dots, x_N\}$ consisting of $N$ observations generated from this normal distribution, we want to estimate the parameters (mean $\mu$ and variance $\sigma^2$) by maximum likelihood estimation. Answer the following questions.

(3) Find the log-likelihood function $\ell(\mu, \sigma^2)$ for this case.

(4) Find the value that maximizes the log-likelihood function with respect to the mean $\mu$ (i.e., find the maximum likelihood estimator $\hat{\mu}$ of the mean). Similarly, find the value that maximizes the log-likelihood function with respect to the variance $\sigma^2$ (i.e., find the maximum likelihood estimator $\hat{\sigma}^2$ of the variance).
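
The well-known answers here are the sample mean and the biased (denominator-$N$) sample variance; a quick numerical check under assumed data values:

```python
import numpy as np

# Spot-check for (4): the single-Gaussian log-likelihood is maximized at the
# sample mean and at the biased sample variance (denominator N, not N - 1).
x = np.array([1.2, -0.4, 0.8, 2.1, 0.5])

def log_lik(mu, s2):
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

mu_hat, s2_hat = x.mean(), x.var()   # np.var uses ddof=0, i.e. the MLE form
print(log_lik(mu_hat, s2_hat) > log_lik(mu_hat + 0.1, s2_hat))   # True
print(log_lik(mu_hat, s2_hat) > log_lik(mu_hat, 1.2 * s2_hat))   # True
```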

Next, consider the case of $K \geq 2$. Answer the following questions.

(5) We would like to describe the log-likelihood function in this case as a function of the parameters $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$. First, describe the probability $p(k)$ that the number $k$ is generated, using $\pi_k$. Next, given $k$, describe the generation probability $p(x_n \mid k)$ of $x_n$. Finally, describe the log-likelihood function $\ell(\theta)$ using the probabilities $p(k)$ and $p(x_n \mid k)$ obtained above.
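
Combining $p(k)$ and $p(x_n \mid k)$ in the standard way yields a log-likelihood with a sum inside the logarithm; a direct Python transcription (the function name and array shapes are choices made for this sketch):

```python
import numpy as np

def gmm_log_likelihood(x, pi, mu, sigma2):
    """log-likelihood sum_n log sum_k pi_k * N(x_n; mu_k, sigma_k^2).

    x has shape (N,); pi, mu, sigma2 have shape (K,).
    """
    diff = x[:, None] - mu[None, :]                                  # (N, K)
    comp = np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(np.log(comp @ pi))
```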

We want to maximize the log-likelihood function obtained in Question (5) above. However, since this log-likelihood function contains a sum operation inside the logarithmic function, it is difficult to find the $\theta$ that makes the derivative zero. Therefore, the following "Algorithm A" is used to iteratively maximize it (note that here we only focus on local maximization and do not seek global maximization). In Algorithm A, for the objective function $g(\theta)$ to be maximized, we use an auxiliary function $F(\theta, q)$ and an auxiliary variable $q$ satisfying $g(\theta) = \max_{q} F(\theta, q)$, and the operations [step 1] and [step 2] are iteratively repeated to calculate a local optimum of $g(\theta)$: in [step 1], update $q \leftarrow \arg\max_{q} F(\theta, q)$ with $\theta$ fixed; in [step 2], update $\theta \leftarrow \arg\max_{\theta} F(\theta, q)$ with $q$ fixed. Answer the following questions.
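
Read as pseudocode, Algorithm A is plain coordinate ascent on $F$; a minimal Python sketch follows, where `argmax_q` and `argmax_theta` are hypothetical problem-specific solvers for the two maximizations.

```python
# Schematic of Algorithm A as coordinate ascent on F(theta, q). The two
# callables are hypothetical solvers for the respective argmax operations.
def algorithm_a(theta, argmax_q, argmax_theta, n_iters=100):
    for _ in range(n_iters):
        q = argmax_q(theta)        # [step 1]: q <- argmax_q F(theta, q)
        theta = argmax_theta(q)    # [step 2]: theta <- argmax_theta F(theta, q)
    return theta
```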

(6) Prove that the $\theta$ updated by repeating [step 1] and [step 2] of Algorithm A above always raises, or leaves unchanged, the value of the original objective function $g(\theta)$.

(7) We use Jensen's inequality to obtain the auxiliary function of the log-likelihood function obtained in Question (5) above. Jensen's inequality for the logarithmic function can be written as

$$\log \sum_{k=1}^{K} \lambda_k y_k \;\geq\; \sum_{k=1}^{K} \lambda_k \log y_k$$

(where $y_k$ is a positive variable, and $\lambda_k$ is any weight that satisfies $\lambda_k \geq 0$ and $\sum_{k=1}^{K} \lambda_k = 1$). Prove this inequality.
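
The inequality is easy to spot-check numerically before proving it; in the sketch below, the weights are drawn from a Dirichlet distribution only so that the constraints hold.

```python
import numpy as np

# Numerical spot-check of Jensen's inequality for the logarithm:
# log(sum_k lam_k * y_k) >= sum_k lam_k * log(y_k).
rng = np.random.default_rng(1)
y = rng.uniform(0.1, 5.0, size=4)        # positive variables y_k
lam = rng.dirichlet(np.ones(4))          # weights: lam_k >= 0, sum = 1
assert np.log(lam @ y) >= lam @ np.log(y)
```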

(8) Using Jensen's inequality, design an auxiliary function $F(\theta, q)$ for the log-likelihood function obtained in Question (5). Let $q_{n,k}$ (where $q_{n,k} \geq 0$ and $\sum_{k=1}^{K} q_{n,k} = 1$) be the auxiliary variable here. You can derive the auxiliary function by first dividing the individual normal distributions in the sum operation by $q_{n,k}$ and multiplying them by $q_{n,k}$ in front of that.
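
One standard outcome of this construction, written as a Python sketch with the same shape conventions as above (the name `auxiliary_F` is an assumption):

```python
import numpy as np

def auxiliary_F(x, pi, mu, sigma2, q):
    """A Jensen-based auxiliary function for the GMM log-likelihood:
    F = sum_n sum_k q[n, k] * log(pi_k * N(x_n; mu_k, sigma_k^2) / q[n, k]),
    where each row of q is positive and sums to 1.
    """
    diff = x[:, None] - mu[None, :]
    comp = pi * np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(q * np.log(comp / q))
```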

(9) Execute Algorithm A for the auxiliary function obtained in Question (8) above. First, find the $q_{n,k}$ that maximizes the auxiliary function as [step 1]. Next, find the parameter $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$ that maximizes the auxiliary function (where the auxiliary variables $q_{n,k}$ are fixed) as [step 2].
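
For illustration, the classic EM updates are one concrete way [step 1] and [step 2] work out for the GMM; this sketch assumes given initial parameters and data `x` of shape `(N,)`.

```python
import numpy as np

def em_gmm(x, pi, mu, sigma2, n_iters=50):
    """A sketch of Algorithm A for the GMM using the classic EM updates."""
    for _ in range(n_iters):
        # [step 1]: responsibilities q[n, k], the maximizing auxiliary variables
        diff = x[:, None] - mu[None, :]
        comp = pi * np.exp(-diff ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        q = comp / comp.sum(axis=1, keepdims=True)
        # [step 2]: parameters maximizing the auxiliary function with q fixed
        nk = q.sum(axis=0)                                   # effective counts
        pi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma2 = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, sigma2
```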