Regularization techniques explained to the six-year-old inside of you.

Sara Hincapie Monsalve
3 min read · Aug 29, 2020

Before we go into the different techniques used for regularization, we need to understand what regularization is and why we do it. So let us begin with …

What is regularization?

Regularization is how we avoid over-fitting our function: a set of techniques (we are going to talk about them later) that reduce the generalization error by adding information to the mix. And why do we want to do this? When we over-fit our function we are forcing it to match our training data exactly, so it may look like it is working well on the data we gave it, but if we later change that data it isn't going to work as expected. So we regularize so that our function works not only with our data, but with different data too.

Let us start with the techniques.

L1 and L2 regularization

L1, also called Lasso regression, and L2, also called Ridge regression, update the general cost function by adding another term known as the regularization term.

Lasso regression

What Lasso regression does is shrink the coefficients of the less important features all the way to zero.
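
To make this concrete, here is the usual form of the Lasso cost function (a sketch of the standard formulation: Loss is whatever cost you started with, the w_i are the coefficients, and lambda controls how strong the penalty is):

```
J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} |w_i|
```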

Ridge Regression

Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function.
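
Written the same way as above, the Ridge cost function looks like this (again a sketch of the standard formulation, with the same names):

```
J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} w_i^2
```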

For both of these techniques lambda needs to be greater than 0, otherwise we are left with the same function and the penalty does nothing. In the L2 case lambda also can’t be very large, because the penalty would carry too much weight and lead to under-fitting. (L2 is the most used of these two.)
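
If you want to try them out, here is a minimal sketch using scikit-learn, where the alpha parameter plays the role of lambda (X_train and y_train stand in for whatever training data you have):

```python
from sklearn.linear_model import Lasso, Ridge

# alpha plays the role of lambda; alpha = 0 would mean no regularization at all
lasso = Lasso(alpha=0.1)  # L1: can push unimportant coefficients to exactly zero
ridge = Ridge(alpha=0.1)  # L2: shrinks all coefficients, but rarely to exactly zero

# lasso.fit(X_train, y_train)
# ridge.fit(X_train, y_train)
```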

Dropout

Dropout is one of the most frequently used techniques due to the good results it produces. It ignores neurons during the training phase: the neurons to drop are chosen randomly, and those neurons are not considered during a particular forward or backward pass. By doing this we prevent over-fitting.
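
Here is a minimal sketch of what that looks like in code, assuming a small PyTorch network (the layer sizes and the 0.5 drop rate are just illustrative):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # during training, randomly zeroes 50% of the activations
    nn.Linear(256, 10),
)

# model.train() turns dropout on for training,
# model.eval() turns it off when you make predictions
```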

Srivastava, Nitish, et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014.

Data Augmentation

Data augmentation is basically what you are thinking: making your data set bigger. But how, if you don’t have more data? Well, if we are dealing with images we have some tricks up our sleeve.

We can rotate, shift, flip, scale, etc., and like that we can increase the amount of data available to train our network.
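
For example, here is a minimal sketch of an augmentation pipeline using torchvision (the exact transforms and their parameters are just illustrative, you would tune them for your own images):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                          # flip
    transforms.RandomRotation(degrees=15),                      # rotate
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # shift
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),        # scale
    transforms.ToTensor(),
])
# each epoch the network sees slightly different versions of the same images
```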


Early Stopping

Early stopping is a strategy where we hold out part of the training data as a validation set and stop training as soon as the error on that validation set starts getting worse.


In the usual plot of training and validation error, the point where the validation error starts to go back up is where we need to stop.
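
In code, the idea is as simple as it sounds. Here is a minimal sketch (train_one_epoch and evaluate are hypothetical functions standing in for your own training and validation code):

```python
best_val_loss = float("inf")
patience = 5                      # how many bad epochs we tolerate before stopping
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(model, train_data)    # your own training step (assumed)
    val_loss = evaluate(model, val_data)  # your own validation step (assumed)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0    # validation improved, keep training
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                         # validation keeps getting worse: stop here
```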

All these techniques help us build a better network, one that works not just with our data but with the data out there in the real world.
