
What is Multiple Regression in R?

Zigya Academy

R is one of the most widely used languages for data analysis, which makes multiple linear regression in R a useful technique to know. Multiple linear regression describes the case where a single response variable Y depends linearly on several predictor variables.

What is Multiple Linear Regression?

Multiple linear regression is a technique for predicting an outcome that depends on two or more variables. It is also called multiple regression, and it is an extension of simple linear regression. The variable being predicted is the dependent variable; the variables used to predict it are called independent or explanatory variables.

Multiple linear regression lets researchers assess how much of the variance in the dependent variable the model explains, as well as the relative contribution of each independent variable. Multiple regression comes in two forms, linear and nonlinear.

The general mathematical equation for multiple regression is −

y = b + b1*x1 + b2*x2 + ... + bn*xn

Description of the parameters used −

  • y is the response variable.
  • b is the intercept, and b1, b2, …, bn are the coefficients of the predictor variables.
  • x1, x2, …, xn are the predictor variables.

We create the regression model using the lm() function in R. The model determines the value of the coefficients using the input data. Next, we can predict the value of the response variable for a given set of predictor variables using these coefficients.
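As a minimal sketch of this workflow, the following simulates data from a known linear equation and checks that lm() recovers coefficients close to the true ones. The data frame `sim`, the chosen coefficients (2, 1.5 and -0.7), and the noise level are illustrative assumptions and are not part of the example used later in this article.

```r
# Simulated data (hypothetical): y = 2 + 1.5*x1 - 0.7*x2 + noise
set.seed(42)
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n, sd = 0.1)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Fit the model; lm() estimates b, b1 and b2 from the data
fit <- lm(y ~ x1 + x2, data = sim)
est <- coef(fit)
print(round(est, 2))

# Predict the response for a new set of predictor values
new_obs <- data.frame(x1 = 5, x2 = 3)
pred <- predict(fit, newdata = new_obs)
print(pred)
```

Because the noise is small relative to the signal, the estimates should land close to the values used to generate the data.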

lm() Function

This function creates the relationship model between the predictors and the response variable.

Syntax

The basic syntax for lm() function in multiple regression is −

> lm(y ~ x1+x2+x3...,data)

Description of the parameters used −

  • formula: a symbolic description of the relationship between the response variable and the predictor variables.
  • data: the data frame that contains the variables named in the formula.
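To make the two arguments concrete, here is a small sketch that builds the formula as an object before passing it to lm(). It uses the built-in freeny data set that the rest of this article works with; the variable names `f` and `fit` are just illustrative choices.

```r
# A formula is an ordinary R object: response on the left of ~, predictors on the right
f <- market.potential ~ price.index + income.level
print(class(f))      # "formula"
print(all.vars(f))   # names of the variables the formula refers to

# data must be a data frame containing every variable named in the formula
stopifnot(is.data.frame(freeny), all(all.vars(f) %in% names(freeny)))

fit <- lm(f, data = freeny)
print(coef(fit))
```

Keeping the formula in a variable makes it easy to reuse the same model specification with a different data frame.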

Let’s do Multiple Regression

Loading the data

Consider the data set “freeny” available in the R environment.

In our dataset, market potential is the dependent variable, while price index and income level are the independent variables used in the model. Now let’s see the code to establish the relationship between these variables.

> df <- freeny
> head(df)
              y lag.quarterly.revenue price.index income.level market.potential
1962.25 8.79236               8.79636     4.70997      5.82110          12.9699
1962.5  8.79137               8.79236     4.70217      5.82558          12.9733
1962.75 8.81486               8.79137     4.68944      5.83112          12.9774
1963    8.81301               8.81486     4.68558      5.84046          12.9806
1963.25 8.90751               8.81301     4.64019      5.85036          12.9831
1963.5  8.93673               8.90751     4.62553      5.86464          12.9854

# plotting the data to determine the linearity
> plot(df, col="navy", main="Matrix Scatterplot")

Create Relationship Model

As the scatterplot matrix above suggests, the variables in the freeny data set are approximately linearly related.
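A scatterplot matrix is a visual check; as a rough numeric complement, the pairwise correlations can be inspected with cor(). This is only a sketch: a strong correlation suggests a linear association, but it does not by itself prove that a linear model is appropriate. The name `cors` is an illustrative choice.

```r
# Pairwise Pearson correlations between all columns of freeny
df <- freeny
cors <- cor(df)
print(round(cors, 3))

# Correlation of each predictor used below with the response
print(round(cors["market.potential", c("price.index", "income.level")], 3))
```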

# Constructing a model that predicts market.potential from price.index and income.level
> model <- lm(market.potential ~ price.index + income.level, data = df)
> model

Call:
lm(formula = market.potential ~ price.index + income.level, data = df)

Coefficients:
 (Intercept)   price.index  income.level
     13.2720       -0.3093        0.1963

The sample code above shows how to build a linear model with two predictors. In this example, price.index and income.level are the two predictors used to predict market potential. From the above output, the intercept is 13.2720, the coefficient for price.index is -0.3093, and the coefficient for income.level is 0.1963. Hence the complete regression equation is market.potential = 13.2720 + (-0.3093) * price.index + 0.1963 * income.level.
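As a quick sanity check on this equation, the sketch below rebuilds the fitted values by hand from the coefficients and compares them with what R reports. It refits the same model so that the block runs on its own; the names `b` and `manual` are illustrative.

```r
df <- freeny
model <- lm(market.potential ~ price.index + income.level, data = df)

# Named coefficient vector: (Intercept), price.index, income.level
b <- coef(model)

# Apply the regression equation to every row of the data
manual <- b[["(Intercept)"]] +
  b[["price.index"]]  * df$price.index +
  b[["income.level"]] * df$income.level

# The hand-computed values should match fitted(model) up to rounding error
print(all.equal(unname(fitted(model)), manual))
```

Using coef(model) instead of the rounded numbers printed above avoids small discrepancies caused by rounding.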

> summary(model)

Call:
lm(formula = market.potential ~ price.index + income.level, data = df)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0101512 -0.0054213 -0.0005416  0.0036681  0.0119975

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.27199    0.29086  45.631  < 2e-16 ***
price.index  -0.30929    0.02630 -11.758 6.92e-14 ***
income.level  0.19631    0.02912   6.741 7.20e-08 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.006491 on 36 degrees of freedom
Multiple R-squared:  0.9904,    Adjusted R-squared:  0.9899
F-statistic:  1858 on 2 and 36 DF,  p-value: < 2.2e-16
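The individual numbers in this summary can also be pulled out programmatically, which is handy for reporting or further checks. A minimal sketch, refitting the same model so the block is self-contained; the variable names `s` and `coef_table` are illustrative.

```r
model <- lm(market.potential ~ price.index + income.level, data = freeny)
s <- summary(model)

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table <- s$coefficients
print(coef_table)

# Goodness-of-fit measures
print(s$r.squared)       # Multiple R-squared
print(s$adj.r.squared)   # Adjusted R-squared
print(s$sigma)           # Residual standard error
print(s$df[2])           # Residual degrees of freedom
```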

Predicting New Values

We can use the regression equation created above to predict the market.potential when a new set of values for price.index and income.level is provided.

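For price.index = 6.555 and income.level = 7.555, the predicted market.potential is obtained as follows. The new values are passed to predict() as a data frame through the newdata argument:

> predict(model, newdata = data.frame(price.index = 6.555, income.level = 7.555))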
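Beyond a single point prediction, predict() can also return an interval around it. The sketch below assumes a hypothetical new observation, passed as a data frame through newdata, with values roughly in line with the rows printed by head(df); predictions far outside the range of the observed data are generally less reliable. The name `new_obs` is an illustrative choice.

```r
model <- lm(market.potential ~ price.index + income.level, data = freeny)

# Hypothetical new observation; column names must match the predictor names
new_obs <- data.frame(price.index = 4.5, income.level = 6.0)

# Point prediction plus a 95% prediction interval (columns fit, lwr, upr)
pred <- predict(model, newdata = new_obs, interval = "prediction", level = 0.95)
print(pred)
```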

Conclusion

In this article, we have shown how to predict the value of a dependent variable from two or more independent variables using a multiple linear regression model. We first checked the relationships between the variables for linearity with a scatterplot matrix. Since the variables appeared to be linearly related, we went ahead with a multiple linear regression model. Using price index and income level as predictors, we were able to predict market potential.

This brings us to the end of this blog. We really appreciate your time.

Hope you liked it.

Do visit our page www.zigya.com/blog for more informative blogs on Data Science.

Keep Reading! Cheers!
