
Kyoto University, Graduate School of Informatics, Department of Intelligence Science and Technology, August 2022 Examination, Specialized Subject S-3

Author

祭音Myyura

Description

Question 1

Let us consider a fully-connected feed-forward neural network that has an input of $D$ dimensions, an output of $K$ classes, and $L$ intermediate layers, each having $M$ nodes. A sigmoid function is used in all nodes including output nodes. A weight between node $i$ and node $j$ is denoted as $w_{ij}$, and there are no bias terms.

(1) Draw this network and specify $D$, $K$, $L$, and $M$. Answer the total number of the network weights.

(2) Show the output $y_j$ of an output node (indexed with $j$) using the outputs $z_i$ of the nodes of the preceding layer (indexed with $i$).

(3) Consider the problem of detecting the source(s) from a music recording composed of the sounds of one or more of violin, flute, piano, and singing voice. Describe how the training label $t_j$ will be given for the output nodes (indexed with $j$). Explain why it is not appropriate to use a softmax function in the output nodes for this problem.

(4) Show the binary cross-entropy of the output $y_j$ and the training label $t_j$ of an output node (indexed with $j$).

(5) Show the formula to update the weight $w_{ij}$ between an output node (indexed with $j$) and a node of the preceding layer (indexed with $i$) based on the gradient descent method, with the objective function being the sum of the binary cross-entropy defined above over all classes. Show how you derive the formula.

(6) Show the formula to update the weight $w_{ij}$ between nodes (indexed with $i$ and $j$), both of which are not in the output layer, based on the error back-propagation method. You do not have to show how you derive it.

(7) Explain why it is difficult to update the weights effectively as the number of network layers becomes large. Describe the methods to mitigate this problem.

Question 2

Let us consider $N$ training samples of a $d$-dimensional vector $\boldsymbol{x} = (x_1, \dots, x_d)^{\mathsf{T}}$, with their mean vector and covariance matrix denoted as $\boldsymbol{\mu}$ and $\Sigma$, respectively, where ${}^{\mathsf{T}}$ denotes the transpose.

(1) Show the formula to compute the $(i, j)$ component $\sigma_{ij}$ of the covariance matrix $\Sigma$.

(2) Show the formula of the Mahalanobis distance between a sample $\boldsymbol{x}$ and this training sample distribution.

(3) In neural network training, we often conduct normalization of inputs so that the distribution for each dimension has a mean of $0$ and a variance of $1$. Let $\boldsymbol{x}$ and $\boldsymbol{z}$ be an original input and its normalized one. Discuss the relationship between the square root of the sum of the squared values in each dimension of $\boldsymbol{z}$, which is regarded as the Euclidean norm $\|\boldsymbol{z}\|$, and the above Mahalanobis distance.

Kai

Question 1

(1)

The network consists of an input layer with $D$ nodes, $L$ fully-connected intermediate layers with $M$ nodes each, and an output layer with $K$ nodes. The total number of the network weights is
$$DM + (L-1)M^2 + MK.$$
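As a quick sanity check on this count, here is a minimal Python sketch (the example sizes are hypothetical, not taken from the problem) that counts the weights of a bias-free fully connected network layer by layer:

```python
def num_weights(D, L, M, K):
    """Count the weights of a fully connected network with no bias terms:
    input of D dimensions, L intermediate layers of M nodes, K output nodes."""
    layer_sizes = [D] + [M] * L + [K]
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical sizes: D=3, L=2, M=4, K=2 -> 3*4 + 4*4 + 4*2 = 36
print(num_weights(D=3, L=2, M=4, K=2))  # 36
```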

(2)

The output is
$$y_j = \sigma\!\left(\sum_i w_{ij}\, z_i\right),$$

where $z_i$ is the output of node $i$ in the preceding layer and $\sigma(a) = \dfrac{1}{1 + e^{-a}}$ is the sigmoid function.
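A minimal NumPy sketch of this forward computation; the array shapes (a preceding layer of $M$ nodes, $K$ output nodes) are assumptions for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def output_layer(z, W):
    """z: outputs of the preceding layer, shape (M,)
    W: weights W[i, j] = w_ij from preceding node i to output node j, shape (M, K)
    Returns y with y[j] = sigmoid(sum_i w_ij * z_i), shape (K,)."""
    return sigmoid(z @ W)
```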

(3)

The classification classes consist of four categories: violin, flute, piano, and singing voice, thus $K = 4$. Let $t_j \in \{0, 1\}$, and construct the training label such that $t_j = 1$ if the $j$-th source is present in the music recording, and $t_j = 0$ otherwise. For example, a recording containing violin and piano would be labeled $(t_1, t_2, t_3, t_4) = (1, 0, 1, 0)$. Using the softmax function in the output layer is inappropriate in this context because the softmax outputs are constrained to sum to $1$, which forces the network to solve a single-label four-class classification problem and makes it unable to represent recordings that contain multiple sources at once.

(4)

The binary cross-entropy of the output $y_j$ and the training label $t_j$ is
$$E_j = -\,t_j \log y_j - (1 - t_j)\log(1 - y_j).$$
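A minimal NumPy sketch of this per-node loss (the clipping constant is only a numerical-safety assumption):

```python
import numpy as np

def binary_cross_entropy(y, t, eps=1e-12):
    """E_j = -t_j*log(y_j) - (1 - t_j)*log(1 - y_j), elementwise over the K output nodes."""
    y = np.clip(y, eps, 1.0 - eps)  # avoid log(0)
    return -(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```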

(5)

Let $u_j = \sum_i w_{ij}\, z_i$, so that $y_j = \sigma(u_j)$. The loss function is defined as follows:
$$E = \sum_{j=1}^{K} E_j = -\sum_{j=1}^{K} \left\{ t_j \log y_j + (1 - t_j)\log(1 - y_j) \right\}.$$

The weight $w_{ij}$ is updated as follows:
$$w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial E}{\partial w_{ij}},$$

where $\eta$ is the learning rate. Since
$$\frac{\partial E}{\partial w_{ij}}
= \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial u_j}\,\frac{\partial u_j}{\partial w_{ij}}
= \left( \frac{1 - t_j}{1 - y_j} - \frac{t_j}{y_j} \right) y_j (1 - y_j)\, z_i
= (y_j - t_j)\, z_i,$$

we have
$$w_{ij} \leftarrow w_{ij} - \eta\,(y_j - t_j)\, z_i.$$
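A minimal NumPy sketch of one such gradient-descent step for the output-layer weights, using the simplified gradient $(y_j - t_j)\,z_i$ derived above (array shapes and the learning rate are assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def update_output_weights(W, z, t, eta=0.1):
    """W: output-layer weights, shape (M, K); z: preceding-layer outputs, shape (M,)
    t: multi-hot training label, shape (K,); eta: learning rate."""
    y = sigmoid(z @ W)            # y_j = sigmoid(sum_i w_ij * z_i)
    grad = np.outer(z, y - t)     # dE/dw_ij = (y_j - t_j) * z_i
    return W - eta * grad
```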

(6)

Writing $z_i$ and $z_j$ for the sigmoid outputs of nodes $i$ and $j$, the back-propagation update for a weight $w_{ij}$ not connected to the output layer is
$$w_{ij} \leftarrow w_{ij} - \eta\,\delta_j\, z_i, \qquad
\delta_j = z_j (1 - z_j) \sum_{k} w_{jk}\,\delta_k,$$
where the sum runs over the nodes $k$ in the layer following node $j$, and $\delta_k = y_k - t_k$ when node $k$ is an output node.
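A minimal NumPy sketch of this update for one hidden layer; the variable names (`z_prev`, `W_next`, `delta_next`) are assumptions introduced for illustration:

```python
import numpy as np

def update_hidden_weights(W, z_prev, z, W_next, delta_next, eta=0.1):
    """W: weights into this hidden layer, shape (M_prev, M)
    z_prev: outputs of the preceding layer, shape (M_prev,)
    z: sigmoid outputs of this layer, shape (M,)
    W_next: weights from this layer to the next, shape (M, M_next)
    delta_next: error terms of the next layer, shape (M_next,)"""
    delta = z * (1.0 - z) * (W_next @ delta_next)  # delta_j = z_j(1-z_j) * sum_k w_jk * delta_k
    W_new = W - eta * np.outer(z_prev, delta)      # w_ij <- w_ij - eta * delta_j * z_i
    return W_new, delta                            # delta is propagated further back
```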

(7)

Vanishing and exploding gradients. By the chain rule, the gradient at an early layer is a product of per-layer factors; with sigmoid activations, whose derivative is at most $1/4$, this product shrinks exponentially as the number of layers grows (vanishing gradients), while large weights can make it grow exponentially (exploding gradients). Either way, the weights of the early layers receive almost no update, or extremely unstable updates.

Methods to mitigate the above issue (a minimal sketch of some of them follows the list):

  • Weight Initialization
    • Xavier Initialization
    • He Initialization
  • Activation Functions
    • ReLU (Rectified Linear Unit)
  • Batch Normalization
  • Gradient Clipping
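A minimal NumPy sketch of some of these remedies (initialization scales, ReLU, and gradient clipping); batch normalization is omitted for brevity, and constants such as the clipping norm are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot initialization: variance 2/(fan_in + fan_out), suited to sigmoid/tanh
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He initialization: variance 2/fan_in, suited to ReLU
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def relu(a):
    # ReLU keeps the gradient equal to 1 for positive inputs, unlike the saturating sigmoid
    return np.maximum(0.0, a)

def clip_gradient(grad, max_norm=5.0):
    # Gradient clipping: rescale the gradient if its norm exceeds max_norm
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```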

Question 2

(1)

$$\sigma_{ij} = \frac{1}{N} \sum_{n=1}^{N} \left(x_{ni} - \mu_i\right)\left(x_{nj} - \mu_j\right),$$

where $x_{ni}$ and $x_{nj}$ denote the $i$-th and $j$-th components of the $n$-th training sample $\boldsymbol{x}_n$, and $\mu_i$ and $\mu_j$ are the corresponding components of the mean vector $\boldsymbol{\mu}$.
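A minimal NumPy sketch computing the whole covariance matrix with this (biased, $1/N$) formula; the row-per-sample layout of `X` is an assumption:

```python
import numpy as np

def covariance_matrix(X):
    """X: training samples stacked as rows, shape (N, d).
    Returns Sigma with Sigma[i, j] = (1/N) * sum_n (x_ni - mu_i) * (x_nj - mu_j)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    return Xc.T @ Xc / X.shape[0]
```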

(2)

$$D_M(\boldsymbol{x}) = \sqrt{(\boldsymbol{x} - \boldsymbol{\mu})^{\mathsf{T}}\, \Sigma^{-1}\, (\boldsymbol{x} - \boldsymbol{\mu})}.$$
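A minimal NumPy sketch of this distance; solving a linear system is used instead of explicitly inverting $\Sigma$:

```python
import numpy as np

def mahalanobis_distance(x, mu, Sigma):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu)) for a single sample x."""
    d = x - mu
    return np.sqrt(d @ np.linalg.solve(Sigma, d))
```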

(3)

When the input is normalized, if the covariance matrix $\Sigma$ is a diagonal matrix, the Euclidean norm $\|\boldsymbol{z}\|$ of the normalized input equals the Mahalanobis distance of the original input $\boldsymbol{x}$.

Suppose that
$$z_i = \frac{x_i - \mu_i}{\sqrt{\sigma_{ii}}}, \qquad i = 1, \dots, d.$$

Since $\boldsymbol{z}$ is obtained by normalizing $\boldsymbol{x}$, when $\Sigma$ is diagonal, the covariance matrix of $\boldsymbol{z}$ is the identity matrix. Hence
$$\|\boldsymbol{z}\|^2 = \sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{\sigma_{ii}} = (\boldsymbol{x} - \boldsymbol{\mu})^{\mathsf{T}}\, \Sigma^{-1}\, (\boldsymbol{x} - \boldsymbol{\mu}),$$

and the Mahalanobis distance is
$$D_M(\boldsymbol{x}) = \sqrt{(\boldsymbol{x} - \boldsymbol{\mu})^{\mathsf{T}}\, \Sigma^{-1}\, (\boldsymbol{x} - \boldsymbol{\mu})} = \|\boldsymbol{z}\|.$$
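A minimal NumPy sketch checking this equivalence numerically on hypothetical data whose dimensions are independent, so that the covariance used is diagonal by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with independent dimensions of different scales.
X = rng.normal(size=(1000, 3)) * np.array([1.0, 2.0, 0.5])
mu, var = X.mean(axis=0), X.var(axis=0)
Sigma = np.diag(var)                    # diagonal covariance by construction

x = X[0]
z = (x - mu) / np.sqrt(var)             # per-dimension normalization to mean 0, variance 1
euclid = np.linalg.norm(z)              # ||z||
mahal = np.sqrt((x - mu) @ np.linalg.solve(Sigma, x - mu))
print(np.isclose(euclid, mahal))        # True: ||z|| equals the Mahalanobis distance
```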