
%matplotlib inline vs %matplotlib notebook

While exploring a data set you will almost certainly need to visualize its features, and for that you will import matplotlib. Alongside the import, we often write either %matplotlib inline or %matplotlib notebook in our Jupyter notebook. Let us understand what they do.

%matplotlib

%matplotlib is one of the magic functions you can use in a Jupyter notebook. Magic functions can add some dynamic capabilities to the outputs we get, since by default the plot output looks more like a static report. Writing this magic function sets up the background features Python needs to work with matplotlib.

%matplotlib inline

%matplotlib inline displays plots inline, directly below the code cell that produces them. The plots are stored in the notebook itself, so if you save the notebook and later want to see the visualizations again, they will still be available in the notebook.

Log Loss Function: Performance of Logistic Regression

Logistic Regression (binary classification) is used to classify data into two groups, where one group is denoted by 0 and the other by 1. To check the performance of a logistic regression model, we use the Log Loss function. Before looking at the log loss function, we should first ask why Mean Squared Error, which is used in linear regression, can't be used in logistic regression.

Why Log Loss and not Mean Squared Error?

Mean squared error is the average of the squared differences between the actual and estimated values. When mean squared error is used as the cost function in logistic regression, it becomes very difficult to apply Gradient Descent, which optimizes the model by finding the feature coefficients that minimize the cost function. In logistic regression the estimated values come from a hypothesis function that combines all the necessary features and their coefficients. This hypothesis function is the sigmoid function applied to that combination:

$$S(x) = \frac{1}{1 + e^{-x}}$$

 here, S(x) = hypothesis value
           x = product of the coefficient vector and the matrix containing the feature values
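The hypothesis function can be sketched in a few lines of plain Python. This is only an illustration: the names `sigmoid` and `hypothesis` and the coefficient and feature values are made up for the example, and a single row of features stands in for the full feature matrix.

```python
import math

def sigmoid(x):
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hypothesis(coefficients, features):
    """Sigmoid of the dot product of the coefficients and one row of feature values."""
    x = sum(c * f for c, f in zip(coefficients, features))
    return sigmoid(x)

# Hypothetical values, chosen only for illustration.
coefficients = [0.5, -1.0, 2.0]  # the first entry acts as the intercept
features = [1.0, 3.0, 1.5]       # the leading 1.0 pairs with the intercept

# The dot product is 0.5 - 3.0 + 3.0 = 0.5, so h is sigmoid(0.5).
h = hypothesis(coefficients, features)
print(h)
```

Because the output always lies between 0 and 1, it can be read as the estimated probability that the example belongs to class 1.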
If we plug this hypothesis function into mean squared error, the cost function of logistic regression becomes non-convex in the coefficients, and its graph can have many local minima. Gradient descent can then settle in a local minimum instead of the global one, so it becomes very difficult to minimize the cost function. This is why we use the Log Loss function in logistic regression.
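The loss of convexity can be checked numerically. The sketch below assumes the simplest possible setting, a single hypothetical training example with feature value 1 and true class 1, and a single coefficient w. It compares the squared error against the negative log of the hypothesis using second differences on a grid; a negative second difference means the curve bends downward there, which a convex function never does. It only demonstrates non-convexity, not the presence of several local minima.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squared_error(w):
    # One hypothetical example: feature value 1, true class 1.
    return (sigmoid(w) - 1.0) ** 2

def log_cost(w):
    # Negative log of the hypothesis for the same example.
    return -math.log(sigmoid(w))

def second_differences(f, ws):
    """f(w - step) - 2 f(w) + f(w + step) at every interior grid point."""
    return [f(ws[i - 1]) - 2.0 * f(ws[i]) + f(ws[i + 1])
            for i in range(1, len(ws) - 1)]

# Evenly spaced coefficient values from -6 to 6.
ws = [-6.0 + 0.05 * i for i in range(241)]

tolerance = -1e-12  # allow for floating-point rounding
mse_is_convex = all(d >= tolerance for d in second_differences(squared_error, ws))
log_is_convex = all(d >= tolerance for d in second_differences(log_cost, ws))

print("squared error convex on grid:", mse_is_convex)
print("log cost convex on grid:", log_is_convex)
```

On this grid the squared error fails the check for strongly negative w, while the log-based cost passes it everywhere.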

Intuition behind Log Loss

In binary classification with logistic regression, there are two different cost functions: one for class 0 and the other for class 1, where 0 and 1 denote the two classes. The cost function is as follows:

$$\text{Cost}(h(x), y) = \begin{cases} -\log(h(x)) & \text{if } y = 1 \\ -\log(1 - h(x)) & \text{if } y = 0 \end{cases}$$

here, h(x) = hypothesis value
          y = {0,1} for either class
This cost function gives a convex graph (roughly U-shaped), on which it is much easier to apply gradient descent and reach the optimum.
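A small sketch of the two-branch cost makes its behaviour concrete. The function name `cost` and the probability 0.9 are chosen only for illustration, and the natural logarithm is assumed.

```python
import math

def cost(h, y):
    """Per-example cost: -log(h) for class 1, -log(1 - h) for class 0."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

# The model predicts class 1 with probability 0.9.
confident_right = cost(0.9, 1)  # the true class is 1
confident_wrong = cost(0.9, 0)  # the true class is 0

print(confident_right, confident_wrong)
```

A confident correct prediction costs very little, while an equally confident wrong prediction is penalized heavily, and the penalty grows without bound as h(x) approaches the wrong extreme.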

The Log Loss function simplifies the above cost function by combining both cases into a single expression and averaging it over the data. Given below is the log loss function:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h(x_i)) + (1 - y_i) \log(1 - h(x_i)) \right]$$

here, m = number of examples

Setting y=0 or y=1 makes one of the two terms vanish, so the expression behaves in the same way as the logistic regression cost function for that class. The result is a non-negative score rather than a probability: it indicates how good or bad the predictions are by measuring how far the predicted probabilities are from the actual values. On the same data, a lower log loss indicates a better model.
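The log loss can be computed directly from its definition. The sketch below is a minimal plain-Python version, not a library implementation: the labels and predicted probabilities are invented for the example, and the `eps` clipping is an added assumption (a common safeguard against taking the log of 0) rather than part of the formula.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average of -[y*log(p) + (1 - y)*log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Hypothetical labels and two sets of predicted probabilities for class 1.
y_true = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]
poor_predictions = [0.6, 0.5, 0.4, 0.7]

good_loss = log_loss(y_true, good_predictions)
poor_loss = log_loss(y_true, poor_predictions)
print(good_loss, poor_loss)
```

The predictions that sit closer to the true labels produce the smaller score, and with a single example the function returns exactly the per-class cost for that example's label.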




Gradient Boosting Classifier

Gradient Boosting is one of the Boosting Ensemble methods, and it has been used a lot lately in both regression and classification problems. As the heading suggests, we are going to understand Gradient Boosting in classification. But first, let's have a brief introduction to ensemble methods.

Ensemble Methods

Ensemble methods are used in machine learning to create a better and more optimized model by learning from other models. An ensemble method takes a set of models and combines their results to get a more optimized result, so it doesn't have to depend on a single predictive model. One of the ways to apply an ensemble technique to a classification problem is to use a Gradient Boosting Classifier.

Gradient Boosting in classification

In Gradient Boosting we build multiple decision trees, gathering the predictions of each individual tree and combining them with the next decision tree we build. Let's understand the workin