Regularization techniques explained to the six-year-old inside of you.
Before we go into the different regularization techniques, we need to understand what regularization is and why we do it. So let us begin with …
What is regularization
Regularization is a set of techniques (we are going to talk about them below) for avoiding over-fitting, letting us reduce the generalization error by adding information to the mix. And why do we want to do this? When we over-fit our function, we force it to match our training data so closely that it seems to work well on the data we gave it, but as soon as we give it different data it no longer works as expected. We regularize so our function works not only with our own data but with different data too.
Let us start with the techniques.
L1 and L2 regularization
L1 is also called Lasso regression and L2 is also called Ridge regression. Both update the general cost function by adding an extra term known as the regularization term.
What Lasso regression does is add the absolute values of the coefficients (lambda times the sum of |w|) as the penalty term, which can shrink the less important features’ coefficients all the way to zero.
Ridge regression adds the “squared magnitude” of the coefficients (lambda times the sum of w²) as the penalty term to the loss function, shrinking all coefficients but never making them exactly zero.
For both these techniques lambda needs to be greater than 0, otherwise the penalty disappears and we are left with the original function, doing nothing. Lambda also can’t be very large, because the penalty would then dominate the loss and lead to under-fitting. (L2 is the more widely used of the two.)
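To see the shrinking effect concretely, here is a minimal NumPy sketch of Ridge (L2) regression via its closed-form solution. The toy data and the lambda value are illustrative assumptions, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

def ridge_fit(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (closed-form solution)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = ridge_fit(X, y, lam=0.0)    # ordinary least squares, no penalty
w_ridge = ridge_fit(X, y, lam=10.0)   # penalized: coefficients shrink

print(np.linalg.norm(w_ridge) < np.linalg.norm(w_plain))  # True
```

The larger we make `lam`, the smaller the coefficients get; Lasso behaves similarly but uses |w| instead of w², which is what lets it push coefficients to exactly zero.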
Dropout
Dropout is one of the most frequently used techniques due to the good results it produces. During training it ignores a randomly chosen set of neurons: the dropped neurons are simply not considered during a particular forward or backward pass. Because the network can never rely on any single neuron always being present, this helps prevent over-fitting.
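As a sketch of what "dropping" means in practice, here is a minimal NumPy version of (inverted) dropout applied to one layer's activations during training; the drop rate of 0.5 is an illustrative assumption.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Scaling by 1/keep_prob keeps the expected activation unchanged,
    # so nothing extra needs to be done at test time.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 8))               # pretend these are a layer's activations
a_train = dropout(a, rate=0.5, rng=rng)
print(a_train)                    # entries are either 0.0 or 2.0
```

At test time dropout is turned off and the full network is used.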
Data Augmentation
Data augmentation is basically what you are thinking: making your data set bigger. But how, if you don’t have more data? Well, if we are dealing with images we have some tricks up our sleeve.
We can rotate, shift, flip, scale, and so on, and in that way increase the amount of data available to train our network.
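Here is a minimal NumPy sketch of two of those tricks (a horizontal flip and a 90-degree rotation) on a tiny stand-in "image"; real pipelines usually rely on a library such as torchvision or Keras instead.

```python
import numpy as np

image = np.arange(9).reshape(3, 3)   # a tiny stand-in for an image

flipped = np.flip(image, axis=1)     # mirror left-right
rotated = np.rot90(image)            # rotate 90 degrees counter-clockwise

# Each transform yields a new, label-preserving training example,
# multiplying the amount of data we can train on.
augmented = [image, flipped, rotated]
print(len(augmented))  # 3 examples from 1 original image
```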
Early Stopping
Early stopping is a strategy where we monitor the error on a validation set while training and stop as soon as the validation error starts getting worse, keeping the model from the point where it performed best. On a plot of training and validation error over time, the stopping point is where the validation curve begins to rise even though the training curve keeps falling.
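A minimal sketch of that stopping rule, using a "patience" counter (how many worsening epochs we tolerate before giving up); the loss values here are fake stand-ins for real validation losses.

```python
patience = 2                  # worsening epochs we tolerate before stopping
best_val = float("inf")
epochs_without_improvement = 0
stopped_at = None

# Validation loss improves for a while, then starts getting worse.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.72, 0.80, 0.95]

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val = val_loss               # new best: keep these weights
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch            # stop training here
            break

print(best_val, stopped_at)  # 0.6 4
```

Frameworks ship this as a ready-made callback (for example Keras's `EarlyStopping`), but the logic is exactly this loop.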
All these techniques help us build a better network, one that works not just with our data but also with the data out there in the real world.