Definition

The linear regression equation Y = a + bX + e describes the relation between a continuous response variable Y and a continuous or discrete independent variable X, where e is the error term. Apart from the error term, this is the equation of a straight line, where a is the y-intercept and b is the slope of the line. The parameters a and b are called regression coefficients.
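The definition does not say how a and b are estimated from data. A common choice is ordinary least squares (OLS), which picks the a and b that minimize the sum of squared errors. The sketch below assumes OLS and uses only the Python standard library. The function names `fit_simple_ols` and `residuals` are illustrative, not from any particular package.

```python
from statistics import fmean


def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate intercept a and slope b of Y = a + bX + e by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Co-variation of X with Y, and variation of X about its mean
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X must take at least two distinct values")
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


def residuals(xs: list[float], ys: list[float], a: float, b: float) -> list[float]:
    """Observed minus fitted values: the estimated error terms e."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]
```

For points lying exactly on Y = 2 + 3X, such as `fit_simple_ols([0, 1, 2, 3], [2, 5, 8, 11])`, the fit recovers a = 2 and b = 3 and every residual is zero.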

If b = 0, the predicted value of Y equals the constant a for every value of X, giving a flat (horizontal) line: there is no linear relation between Y and X.

If b ≠ 0, b gives the expected change in Y for a one-unit increase in X.

The sign of b indicates whether the association is positive (Y tends to increase as X increases) or negative (Y tends to decrease as X increases).
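A quick numerical check of these three statements, using only the deterministic part a + bX of the line (the error term is left out). The helper names and the sample values of a and b are illustrative assumptions.

```python
import math


def predict(a: float, b: float, x: float) -> float:
    """Predicted (mean) Y on the line a + bX; the error term e is not included."""
    return a + b * x


def describe_slope(b: float) -> str:
    """Classify the direction of the linear association from the sign of b."""
    if b == 0:
        return "none (flat line)"
    return "positive" if b > 0 else "negative"


a = 5.0
for b in (0.0, 2.5, -1.5):
    # Change in predicted Y for a one-unit increase in X, checked at several X values
    changes = [predict(a, b, x + 1) - predict(a, b, x) for x in (0.0, 10.0, 100.0)]
    assert all(math.isclose(c, b, abs_tol=1e-9) for c in changes)
    print(f"b = {b:+}: change per unit of X = {changes[0]:+}, association: {describe_slope(b)}")
```

Whatever X the check starts from, the one-unit change in the prediction equals b, and with b = 0 the prediction stays at a.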

Examples

Consider the following regression equation for real estate prices as a function of area \(X_1\), in square feet, and age \(X_2\), in years. Prices are expressed in thousands of dollars:

\[ Y = 90 + 0.04\,X_1 - 0.7\,X_2 \]

This equation describes a plane (a two-dimensional surface) in the three-dimensional space of area, age, and price. The constant 90 is the intercept: the value of Y when both area and age are zero, that is, where the regression plane crosses the Y-axis. Keep in mind, however, that the equation is valid only within the ranges of the independent variables used in the analysis. A house with an area of zero square feet and an age of zero years is outside those ranges, so concluding that such a house would cost 90 × $1000 = $90,000 is nonsensical. The intercept therefore cannot be interpreted on its own in this way; it must be read in the context of the other variables in the model.
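A minimal sketch of the price equation as a prediction function that refuses to extrapolate. The example does not give the ranges of area and age in the underlying data, so the bounds `AREA_RANGE` and `AGE_RANGE` below are hypothetical placeholders; in practice they would be the observed minimum and maximum of each variable in the data used to fit the model.

```python
INTERCEPT = 90.0      # in $1000s
COEF_AREA = 0.04      # $1000s per square foot
COEF_AGE = -0.7       # $1000s per year

# Hypothetical observed ranges; replace with the min/max of the data actually used in the fit.
AREA_RANGE = (800.0, 4000.0)   # square feet
AGE_RANGE = (1.0, 100.0)       # years


def predict_price(area_sqft: float, age_years: float) -> float:
    """Predicted price in dollars, only within the (assumed) observed ranges."""
    checks = (("area", area_sqft, AREA_RANGE), ("age", age_years, AGE_RANGE))
    for name, value, (lo, hi) in checks:
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the observed range [{lo}, {hi}]")
    price_thousands = INTERCEPT + COEF_AREA * area_sqft + COEF_AGE * age_years
    return price_thousands * 1000
```

For a 2,000 sq. ft., 10-year-old house, the equation gives 90 + 0.04 × 2000 - 0.7 × 10 = 163, i.e. $163,000. Calling `predict_price(0, 0)` raises `ValueError` instead of returning the meaningless $90,000.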

The coefficient of area is +0.04; thus, holding the age of the house constant, each 1 sq. ft. increase in area is associated with a predicted increase of 0.04 × $1000 = $40 in the price of the house.
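To see what holding the age constant means numerically, the sketch below compares predicted prices for two houses that differ by one square foot but share the same age, at several arbitrarily chosen ages. Because the equation is linear with no interaction term, the difference comes out to $40 at every age. The function names and the base area of 2,000 sq. ft. are illustrative choices.

```python
def predict_price(area_sqft: float, age_years: float) -> float:
    """Predicted price in dollars from Y = 90 + 0.04*X1 - 0.7*X2 (Y in $1000s)."""
    return (90 + 0.04 * area_sqft - 0.7 * age_years) * 1000


def area_effect(age_years: float, base_area: float = 2000.0) -> float:
    """Change in predicted price for +1 sq. ft., with age held fixed."""
    return predict_price(base_area + 1, age_years) - predict_price(base_area, age_years)


for age in (5, 20, 50):
    print(f"age {age:>2} years: +1 sq. ft. changes the predicted price by ${area_effect(age):.2f}")
```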

Similarly, the coefficient of age is -0.7; the negative sign denotes an inverse relationship between the price of the house and its age. Thus, for a constant area, each 1-year increase in age is associated with a predicted drop of 0.7 × $1000 = $700 in the price of the house.
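Because each coefficient describes a change with the other variable held fixed, the effects of simultaneous changes add up. For example, a house that is 100 sq. ft. larger and 5 years older than another is predicted to cost 100 × $40 - 5 × $700 = $4,000 - $3,500 = $500 more. The sketch below checks that arithmetic against the equation; the two houses are made up for illustration.

```python
COEF_AREA = 0.04   # $1000s per square foot
COEF_AGE = -0.7    # $1000s per year


def predict_price(area_sqft: float, age_years: float) -> float:
    """Predicted price in dollars from Y = 90 + 0.04*X1 - 0.7*X2 (Y in $1000s)."""
    return (90 + COEF_AREA * area_sqft + COEF_AGE * age_years) * 1000


def predicted_difference(d_area: float, d_age: float) -> float:
    """Sum of the separate coefficient effects, in dollars."""
    return (COEF_AREA * d_area + COEF_AGE * d_age) * 1000


# Hypothetical houses: B is 100 sq. ft. larger and 5 years older than A.
house_a = (1800.0, 12.0)
house_b = (1900.0, 17.0)
direct = predict_price(*house_b) - predict_price(*house_a)
additive = predicted_difference(100.0, 5.0)
print(f"direct: ${direct:,.2f}  additive: ${additive:,.2f}")
```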

Application

Note: Independent variables are usually measured in different units (in our example, area in square feet and age in years), so their coefficients cannot be compared directly. The magnitude of a coefficient therefore does not indicate how significant that predictor is. For example, age has the larger coefficient in absolute value (0.7, versus 0.04 for area), but this does not necessarily mean that age is the more significant predictor. Rather, the p-values associated with each coefficient are the proper indicators of statistical significance.
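The dependence on units is easy to demonstrate with plain arithmetic on the example equation. Re-expressing area in square meters (1 sq. ft. = 0.09290304 m² exactly) and age in months rescales each coefficient without changing any prediction, yet the ranking by magnitude flips: area's coefficient becomes about 0.43 per m², while age's becomes about -0.058 per month. The choice of square meters and months is only for illustration.

```python
SQFT_TO_M2 = 0.09290304   # exact: 1 ft = 0.3048 m
MONTHS_PER_YEAR = 12

# Original coefficients ($1000s per sq. ft. and per year)
coef_area_sqft, coef_age_year = 0.04, -0.7

# Same model, different units: area in m^2, age in months
coef_area_m2 = coef_area_sqft / SQFT_TO_M2         # $1000s per m^2
coef_age_month = coef_age_year / MONTHS_PER_YEAR   # $1000s per month


def price_original(area_sqft: float, age_years: float) -> float:
    """Predicted price in $1000s with area in sq. ft. and age in years."""
    return 90 + coef_area_sqft * area_sqft + coef_age_year * age_years


def price_rescaled(area_m2: float, age_months: float) -> float:
    """Predicted price in $1000s with area in m^2 and age in months."""
    return 90 + coef_area_m2 * area_m2 + coef_age_month * age_months


# The same house described in either set of units gets the same prediction
area_sqft, age_years = 2000.0, 10.0
p1 = price_original(area_sqft, age_years)
p2 = price_rescaled(area_sqft * SQFT_TO_M2, age_years * MONTHS_PER_YEAR)
print(f"predictions: {p1:.6f} vs {p2:.6f}")
print(f"|area| > |age| in sq. ft./years? {abs(coef_area_sqft) > abs(coef_age_year)}")
print(f"|area| > |age| in m^2/months?    {abs(coef_area_m2) > abs(coef_age_month)}")
```

A change of units like this scales a coefficient and its standard error by the same factor, so the t-statistic and p-value of that coefficient are unchanged. This is one reason the p-value is a better guide to significance than the raw coefficient.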

See Also

Regression Analysis