 # Coefficient of Determination (R-Squared)


Definition

The coefficient of determination, R², measures the proportion of the variance in one variable that is explained by another variable. In multiple regression analysis, R² gives the proportion of the variance in the response (output) variable that is attributable to the set of explanatory (input) variables in the model.

It is interpreted as a measure of the strength of the linear association between the output and input variables and serves as an indicator of how well the regression model fits the data.

Examples

In a simple regression, if the correlation coefficient (r) between X and Y is 0.8, then the coefficient of determination is R² = 0.8² = 0.64; that is, 64% of the variation in Y is explained by the linear relationship between X and Y. In effect, the higher the coefficient of determination, the stronger the linear relationship between Y and X.

In multiple regression analysis, R² is calculated as the ratio of the regression sum of squares to the total sum of squares, or equivalently as 1 minus the ratio of the residual (error) sum of squares to the total sum of squares. For a least-squares fit that includes an intercept, the two forms agree and R² takes values between 0 (none of the variance in Y is explained by the model) and 1 (all of the variance in Y is explained by the model).
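A minimal sketch of both calculations for a multiple regression, assuming NumPy is available and using simulated data (the coefficients 1.5, 2.0 and -0.7, the noise level, and the helper name `r_squared` are all hypothetical). The model is fitted by ordinary least squares with `numpy.linalg.lstsq`, with a column of ones added for the intercept.

```python
import numpy as np

def r_squared(X, y):
    """Return (SSR/SST, 1 - SSE/SST) for an OLS fit that includes an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model has an intercept term
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)       # total sum of squares
    ssr = np.sum((y_hat - y_bar) ** 2)   # regression (explained) sum of squares
    sse = np.sum((y - y_hat) ** 2)       # residual (error) sum of squares
    return ssr / sst, 1 - sse / sst

# Hypothetical data: two explanatory variables plus random noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=50)

r2_ratio, r2_residual = r_squared(X, y)
print(f"SSR/SST = {r2_ratio:.4f}, 1 - SSE/SST = {r2_residual:.4f}")
```

The two numbers coincide because, for least squares with an intercept, SST = SSR + SSE. Without an intercept that decomposition generally fails, which is why the sketch always adds the column of ones.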

Application

One drawback of R² is that adding explanatory variables to the model never decreases it, and usually increases it, even when the new variables are unrelated to Y. For this reason, adjusted R² is generally considered a better measure of the model's ability to explain the most variance in Y with the fewest explanatory variables, because it corrects for the number of explanatory variables in the model. Beware, however, that adjusted R² is not a true percentage and in rare cases can even take negative values.
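The sketch below illustrates this drawback on hypothetical simulated data, assuming NumPy is available. Starting from one genuine predictor `x1`, it adds up to five columns of pure noise and reports R² and adjusted R² at each step. Adjusted R² is computed with the widely used formula 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of explanatory variables excluding the intercept; the helper names `fit_r2` and `adjusted_r2` are illustrative only.

```python
import numpy as np

def fit_r2(X, y):
    """R^2 = 1 - SSE/SST for an OLS fit with an intercept column added."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables (no intercept count)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
y = 2.0 + 0.8 * x1 + rng.normal(size=n)

# Hypothetical extra explanatory variables that are pure noise, unrelated to y
noise = rng.normal(size=(n, 5))

results = []
for p in range(1, 7):
    # First column is x1; the remaining p - 1 columns are noise
    X = np.column_stack([x1, noise[:, : p - 1]])
    r2 = fit_r2(X, y)
    adj = adjusted_r2(r2, n, p)
    results.append((p, r2, adj))
    print(f"p={p}: R^2={r2:.4f}, adjusted R^2={adj:.4f}")

# A weak model with many predictors: adjusted R^2 drops below zero
print(f"adjusted R^2 for R^2=0.05, n=20, p=5: {adjusted_r2(0.05, 20, 5):.4f}")
```

Across the loop, R² never goes down as noise columns are added, while adjusted R² is always below R² and is penalized for each extra variable. The final line shows how a low R² combined with many predictors relative to n yields a negative adjusted R².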