Log Loss Function: Performance of Logistic Regression

Logistic Regression (Binary classification) is used to classify data into two groups, where one group denoted by 0 and other by 1. And to check the performance of the logistic regression model, we use Log Loss function. Before understanding about the log loss function, we should first ask that why Mean Squared Error which is used in linear regression can't be used in logistic regression.

Why Log Loss and not Mean Squared Error?

Mean squared error is used to calculate the average of the squared difference between actual value and estimated value. When using mean squared error in logistic regression, it becomes very difficult and complex to apply Gradient Descent, which is used to optimize our model by finding the coefficients for different features to minimize the cost function. In logistic regression we use the hypothesis function to find estimated values which consist of all necessary features and their coefficients. This hypothesis function we use is similar to the sigmoid function.

here, S(x) = hypothesis value

x = product of coefficient vector and matrix containing feature values

Using above hypothesis function, the graph of cost function of logistic regression becomes non-convex and there can be many local minimum in the graph. Therefore it becomes very difficult to apply gradient descent to optimize and minimize the cost function. This is the why we have to use Log Loss function in Logistic Regression.

Intuition behind Log Loss

In logistic regression binary classification, there are two different cost function. One for class 0 and other for class 1. 0 and 1 are used to denote the two binary classes here. The cost function is as follows:

here, h(x) = hypothesis value

y = {0,1} for either class

This cost function will give a convex graph (somewhat like U-shaped) on which it becomes easier to apply the gradient descent and get most optimized output.

Log Loss function is the simplification of the above cost function as it combines both the functions. Given below is the log loss function:

By providing y=0 or y=1 input we can see that it will behave as the same way as logistic regression cost function and it results in a probabilistic value which will indicate how good or bad the prediction results are by denoting how far the predictions are from the actual values. Low log loss score necessarily means a better model.

Breaking Into Data Science

Search This Blog

%matplotlib inline vs %matplotlib notebook