跳到主要内容

東京大学 情報理工学系研究科 創造情報学専攻 2013年8月実施 筆記試験 第1問

Author

itsuitsuki

Description (English)

We consider a weather prediction system in which a single senior predictor predicts a weather probability distribution on the basis of predictors' prediction results. Below the system is described in details. There are weather predictors, each of whom outputs a weather probability distribution once a day. Here the weather is a binary random variable taking a value 1 or 0 only (1 means "fine" while 0 means "not fine"). It is assumed that the weather is independent of a day. Let the probability distribution that the -th predictor outputs on the -th day be () where we let (). There is a senior predictor who aggregates the outputs of the predictors. On the -th day, the senior predictor takes a weighted average over the probability distributions output by the predictors to output a weather probability distribution (). Here the weight on the -th predictor on the -th day is denoted as ( ()). That is, on the -th day, is given by (see Figure 1). On the -th day, after the senior predictor ouputs, the real outcome of the weather on the day is reported. This process goes on sequentially with respect to . In the above setting, answer the following questions.

(1) Assume that the senior predictor defines the weight of the -th predictor on the -th day so that the ratio of s with respect to is equal to that of their corresponding likelihoods with respect to the sequence of the past data: . That is, when we denote the likelihood of the -th predictor with respect to as ,

the following equation holds: For each , The likelihood of the -th predictor with respect to is calculated as , where we denote as and we set (). In this setting, for each , show a relation between and , and derive a formula for calculating using and ().

(2) Suppose that the senior predictor predicts the weather probability distribution sequentially with respect to for days. Then under the setting of as in (1), write an algorithm for the senior predictor to output a weather probability distribution and update the weights for predictors every day, and show the order of its computation time in terms of and . Here the initial weight for each predictor is set as follows:

(3) When the prediction is sequentially made for days, we define the cumulative predictive loss for the senior predictor with respect to the sequence of observed real coutcome as follows: Then write as a function of () and . Here the logarithm is the natural logarithm.

(4) Prove that the senior predictor's cumulative loss for days defined in (3) is at most larger than the least cumulative loss over all -th predictors for days. Here the cumulative loss for the -th predictor for days is defined as ().