Let us consider a binary classification problem in which a real-valued vector $\boldsymbol{x}$ in a $d$-dimensional space is classified into a class $y \in \{0, 1\}$. The boundary between class 0 and class 1 in this space defined by a classification method is referred to as the decision boundary.
Q.1. Let $\boldsymbol{\mu}_0$ be a prototype vector of class 0 and $\boldsymbol{\mu}_1$ be a prototype vector of class 1, and consider a method that classifies $\boldsymbol{x}$ into the class whose prototype vector has the smaller squared distance $\|\boldsymbol{x} - \boldsymbol{\mu}_c\|^2$ to $\boldsymbol{x}$. Derive an equation of the decision boundary and show whether it is linear or not. Here we assume $\boldsymbol{\mu}_0 \neq \boldsymbol{\mu}_1$.
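A minimal numerical sketch of the expected result for this question. The prototype values below are illustrative assumptions, not taken from the text; the comment states the boundary equation obtained by expanding the squared distances.

```python
import numpy as np

# Illustrative prototypes in d = 2 (assumed values, not from the text).
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 2.0])

def classify_nearest(x, mu0, mu1):
    """Assign x to the class whose prototype is closer in squared distance."""
    return 0 if np.sum((x - mu0) ** 2) <= np.sum((x - mu1) ** 2) else 1

# Expanding ||x - mu_c||^2 cancels the common ||x||^2 term, so the
# boundary is the hyperplane
#     2 (mu0 - mu1) . x = ||mu0||^2 - ||mu1||^2,
# the perpendicular bisector of the segment from mu0 to mu1: linear in x.
def on_boundary(x, mu0, mu1):
    lhs = 2 * (mu0 - mu1) @ x
    rhs = mu0 @ mu0 - mu1 @ mu1
    return np.isclose(lhs, rhs)

# The midpoint of the two prototypes always lies on the boundary.
midpoint = (mu0 + mu1) / 2
```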
Q.2. Assume that class 0 follows a multivariate normal distribution $\mathcal{N}(\boldsymbol{\mu}_0, \Sigma)$ with a mean vector $\boldsymbol{\mu}_0$ and a covariance matrix $\Sigma$, and that class 1 follows a normal distribution $\mathcal{N}(\boldsymbol{\mu}_1, \Sigma)$ with a mean vector $\boldsymbol{\mu}_1$ and the same covariance matrix $\Sigma$. Then, we consider a method that classifies $\boldsymbol{x}$ into the class with the larger likelihood for $\boldsymbol{x}$. Derive an equation of the decision boundary and show whether it is linear or not. Here we assume that the inverse matrix $\Sigma^{-1}$ exists and that $\boldsymbol{\mu}_0 \neq \boldsymbol{\mu}_1$.
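The following sketch checks the expected result numerically, under assumed parameter values (mu0, mu1, Sigma below are illustrative). Because the covariance is shared, the quadratic term in the log-likelihood difference cancels and the boundary is linear.

```python
import numpy as np

# Illustrative parameters (assumed values, not from the text).
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_lik_diff(x):
    """log N(x; mu0, Sigma) - log N(x; mu1, Sigma).

    The normalizing constants are identical (same Sigma) and cancel,
    leaving only the difference of the quadratic forms.
    """
    q0 = (x - mu0) @ Sigma_inv @ (x - mu0)
    q1 = (x - mu1) @ Sigma_inv @ (x - mu1)
    return -0.5 * (q0 - q1)

# The x^T Sigma^{-1} x terms cancel, so the boundary log_lik_diff(x) = 0
# reduces to the linear equation
#   (mu0 - mu1)^T Sigma^{-1} x
#       = (mu0^T Sigma^{-1} mu0 - mu1^T Sigma^{-1} mu1) / 2.
def linear_form(x):
    w = Sigma_inv @ (mu0 - mu1)
    b = -0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1)
    return w @ x + b
```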
Q.3. Consider a method that models the posterior probability of class 1 given $\boldsymbol{x}$, $p(y = 1 \mid \boldsymbol{x})$, with the standard sigmoid function of the inner product $\boldsymbol{w}^\top \boldsymbol{x}$, where $\boldsymbol{w}$ is a weight parameter vector. Derive the logarithm of the ratio of the posterior probabilities of class 0 and class 1, and show whether it is linear in $\boldsymbol{x}$ or not.
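A short numerical check of the expected answer: with $p(y=1 \mid \boldsymbol{x}) = \sigma(\boldsymbol{w}^\top \boldsymbol{x})$, the log posterior ratio collapses to $-\boldsymbol{w}^\top \boldsymbol{x}$, which is linear in $\boldsymbol{x}$. The vectors below are random illustrative values.

```python
import numpy as np

def sigmoid(a):
    """Standard sigmoid (logistic) function."""
    return 1.0 / (1.0 + np.exp(-a))

def log_posterior_ratio(w, x):
    """log [ p(y=0|x) / p(y=1|x) ] with p(y=1|x) = sigmoid(w . x)."""
    p1 = sigmoid(w @ x)
    return np.log(1.0 - p1) - np.log(p1)

# Since 1 - sigmoid(a) = sigmoid(-a), the ratio simplifies to
#   log[(1 - sigmoid(w.x)) / sigmoid(w.x)] = -w . x,
# i.e., the log posterior ratio is linear in x.
```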
Q.4. In the method of Q.3, let us define the loss function as the binary cross-entropy of the posterior probabilities of class 0 and class 1. Derive its gradient with respect to the weight parameter vector $\boldsymbol{w}$, showing the derivation process.
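A sketch that verifies the closed-form gradient of the binary cross-entropy, $\nabla_{\boldsymbol{w}} L = \sum_n (\sigma(\boldsymbol{w}^\top \boldsymbol{x}_n) - y_n)\,\boldsymbol{x}_n$, against a central-difference numerical gradient on assumed random data:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce_loss(w, X, y):
    """Binary cross-entropy over a dataset X (n x d) with labels y in {0,1}:
    L(w) = -sum_n [ y_n log p_n + (1 - y_n) log(1 - p_n) ],
    where p_n = sigmoid(w . x_n).
    """
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad(w, X, y):
    """Closed-form gradient: sum_n (sigmoid(w . x_n) - y_n) x_n."""
    return X.T @ (sigmoid(X @ w) - y)
```

Comparing against finite differences is a standard sanity check for a hand-derived gradient.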
Q.5. Describe a method to extend the methods of Q.3 and Q.4 to a multi-class classification problem. Specifically, show a function to compute the posterior probability of class $c$ and a loss function, together with its gradient with respect to the weight parameters. You do not have to show the derivation process.
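The standard extension is the softmax function with the (multi-class) cross-entropy loss. A sketch with a numerical gradient check on assumed random data (the gradient $X^\top(P - Y)$ is the multi-class analogue of Q.4):

```python
import numpy as np

def softmax(A):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    Z = np.exp(A - A.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def ce_loss(W, X, Y):
    """Cross-entropy with one-hot targets Y (n x K); W is d x K.
    Row c of softmax(X @ W) gives the posterior p(y = c | x)."""
    P = softmax(X @ W)
    return -np.sum(Y * np.log(P))

def ce_grad(W, X, Y):
    """Closed-form gradient of the softmax cross-entropy: X^T (P - Y)."""
    return X.T @ (softmax(X @ W) - Y)
```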
Q.6. Briefly describe a method to extend the method of Q.5 to a multi-layer feed-forward neural network. Specifically, show an activation function used in each layer and a loss function, together with a method to compute the gradients with respect to the weight parameters.
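A minimal sketch of one such extension, under assumed choices: one hidden layer with a ReLU activation, a softmax output layer, the cross-entropy loss, and gradients computed by error backpropagation (reverse-mode application of the chain rule). The architecture and sizes are illustrative, not prescribed by the text.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def softmax(A):
    Z = np.exp(A - A.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def forward(params, X):
    W1, W2 = params
    H = relu(X @ W1)        # hidden-layer activations (ReLU)
    P = softmax(H @ W2)     # output-layer class posteriors (softmax)
    return H, P

def loss(params, X, Y):
    """Cross-entropy with one-hot targets Y, as in Q.5."""
    _, P = forward(params, X)
    return -np.sum(Y * np.log(P))

def backprop(params, X, Y):
    """Gradients w.r.t. W1 and W2 by backpropagation."""
    W1, W2 = params
    H, P = forward(params, X)
    dA2 = P - Y              # error at the output pre-activation (Q.5 result)
    dW2 = H.T @ dA2
    dH = dA2 @ W2.T          # propagate the error back through W2
    dA1 = dH * (H > 0)       # multiply by the ReLU derivative
    dW1 = X.T @ dA1
    return dW1, dW2
```

As in Q.4 and Q.5, the backpropagated gradients can be validated against finite differences of the loss.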