Linear Regression for Machine Learning

Linear regression is a statistical technique for modelling the relationship between two or more variables. We can use it to predict the value of a response (target) variable from the values of one or more predictors (independent variables).

Simple linear regression has only one predictor, while multiple linear regression has more than one.

Equation in Linear Regression

The equation of a line is y = mx + c

Where,

m = slope: it sets the direction and steepness of the line, i.e. how much y changes when x increases by one unit.

c = intercept: the value of y when x is zero.

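For multiple linear regression the equation is:

y = B0 + B1X1 + B2X2 + … + BnXn

Here B0 is the intercept and B1, …, Bn are the coefficients (slopes) of the predictors X1, …, Xn.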

To find the best-fit line we have to find the best values for B0, B1, …, Bn. The standard method for this is Ordinary Least Squares (OLS), which chooses the coefficients that minimise the sum of squared differences between the observed and predicted values of y.
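To make the OLS idea concrete, here is a minimal sketch using NumPy's least-squares solver. The toy data, the variable names, and the choice of `np.linalg.lstsq` are assumptions made for illustration, not something the article prescribes.

```python
import numpy as np

# Hypothetical toy data generated from y = 2 + 3*x1 - 1*x2 with no noise,
# so the fitted coefficients should recover these values.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so that the intercept B0 is estimated as well.
X_design = np.column_stack([np.ones(len(X)), X])

# OLS: find the coefficients that minimise the sum of squared residuals.
coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coefficients
print(b0, b1, b2)
```

Because the toy data contains no noise, the printed values should be very close to 2, 3 and -1.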

Assumptions

To perform linear regression, we must be aware of its assumptions.

Assumptions are conditions that should be met before building a linear regression model.

1 – Linearity – There should be a linear relationship between the dependent and independent variables. We can check linearity with correlation coefficients, a correlation plot, or a scatter plot.

2 – Multicollinearity – Multicollinearity refers to strong correlation among the independent variables; it should not be present. We can check for it by computing the variance inflation factor of each predictor, VIF = 1 / (1 − R²), where R² comes from regressing that predictor on the other predictors.

3 – Homoscedasticity – The variance of the errors (residuals) should be constant, i.e. it should not change as the values of the independent variables change. The Goldfeld-Quandt test and the Breusch-Pagan test can be used to check for homoscedasticity.

4 – Normality of residuals – The residuals should follow a normal distribution. This assumption can be checked with a histogram or a Q-Q plot.

5 – No autocorrelation – Autocorrelation occurs when the residuals are not independent of each other. We can use the Durbin-Watson test to check for autocorrelation.
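Two of the checks above are easy to compute by hand. The sketch below implements the VIF formula and the Durbin-Watson statistic with plain NumPy on made-up data. The helper names `vif` and `durbin_watson` and the synthetic predictors are hypothetical, chosen only for illustration.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2), with R^2 from regressing
    column j on all the other columns (plus an intercept)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r_squared)

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    diffs = np.diff(residuals)
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)

# Hypothetical data: x2 is almost a copy of x1 (collinear), x3 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x3 + rng.normal(size=n)

vifs = [vif(X, j) for j in range(X.shape[1])]
print("VIF per predictor:", vifs)

# Fit by least squares and inspect the residuals for autocorrelation.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
dw = durbin_watson(y - design @ coef)
print("Durbin-Watson:", dw)
```

In this setup the VIFs of x1 and x2 should come out large while that of x3 stays close to 1. A commonly quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, though the cut-off is a convention rather than a strict rule.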

Let’s implement this in Python

We will implement Linear Regression with Python programming language.

Step 1 – We will start by importing the necessary Python libraries:
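A minimal sketch of the imports, assuming the tutorial relies on NumPy, matplotlib, and scikit-learn (the exact set of libraries is an assumption):

```python
import numpy as np                                     # numerical arrays
import matplotlib.pyplot as plt                        # plotting
from sklearn.linear_model import LinearRegression      # the model
from sklearn.model_selection import train_test_split   # train/test splitting
```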

Step 2 – We will load the data
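The dataset used here is not specified, so the sketch below generates a small synthetic one instead. The true slope (2.5), intercept (4.0), and noise level are arbitrary, assumed values; with real data you would read a file instead.

```python
import numpy as np

# Hypothetical stand-in data: one predictor, 100 rows, linear trend plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))                   # feature matrix, shape (100, 1)
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1, size=100)    # target values, shape (100,)

print(X.shape, y.shape)  # (100, 1) (100,)
```

scikit-learn expects the features as a 2-D array, which is why `X` has a second dimension even with a single predictor.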

Step 3 – Training Linear Regression with Python

To train the linear regression algorithm in Python, we first split the dataset into an 80% training set and a 20% test set:
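One way to do the split is with scikit-learn's `train_test_split`. The synthetic data and the `random_state` value below are assumptions, included so the snippet runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (one predictor, 100 rows).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1, size=100)

# test_size=0.2 keeps 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 80 20
```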

Step 4 – Now let’s train our model
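A sketch of the training step with scikit-learn's `LinearRegression`, which fits the coefficients by ordinary least squares. The data setup is repeated so the snippet runs on its own, and the synthetic data (true slope 2.5, intercept 4.0) is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (one predictor, 100 rows).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

# Inspect the learned slope and intercept, then score on unseen data.
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
r2 = model.score(X_test, y_test)  # R^2 on the test set
print("test R^2:", r2)
```

The learned slope and intercept should land near the values used to generate the data, and `score` reports R² on the held-out 20%.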

Step 5 – Let’s plot our trained model with the help of matplotlib.
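A possible way to draw the test points together with the fitted line is sketched below. The synthetic data, the axis labels, and the output file name `linear_regression_fit.png` are assumptions; the setup is repeated so the snippet runs on its own.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (one predictor, 100 rows).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Scatter the held-out points and overlay the model's predictions as a line.
fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], y_test, label="Test data")
x_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
ax.plot(x_line[:, 0], model.predict(x_line), color="red", label="Fitted line")
ax.set_xlabel("x (predictor)")
ax.set_ylabel("y (target)")
ax.legend()
fig.savefig("linear_regression_fit.png")
```

If the model fits well, the red line should pass through the middle of the scattered test points.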

Conclusion

The linear regression model expresses the relationship between a target variable and one or more predictors in the form of an equation, which can then be used to predict the target for new data.

Author

Naveen

Naveen Pandey has more than 2 years of experience in data science and machine learning. He is an experienced Machine Learning Engineer with a strong background in data analysis, natural language processing, and machine learning. Holding a Bachelor of Science in Information Technology from Sikkim Manipal University, he excels in leveraging cutting-edge technologies such as Large Language Models (LLMs), TensorFlow, PyTorch, and Hugging Face to develop innovative solutions.