# Coefficient of determination, r²

The coefficient of determination, r², **expresses how much of the total variation in Y is described by the variation in X**. Thus, it expresses how well the estimated regression line fits the observed data.


## Key points about the coefficient of determination, r²

- r² expresses **how well the estimated regression line fits** the observed datapoints
- A **high r²** (e.g. 0.9) means a good fit, and a **low r²** (e.g. 0.2) means a poor fit
- r² represents **the scatter around the regression line**: the closer the points are to the line, the higher the coefficient of determination, r²
- r² is calculated by subtracting the error from one: one represents the total variation in Y, so what remains after removing the error is the fit

## Is the regression line a good fit?

Let’s use our 4-point mini dataset as an example and look at the squared errors of the line.

The regression line does not go through any of the observed datapoints, and some of the points are even ‘pretty’ far from the line. For example, **at X=2**, the line seems to be ‘quite’ far from the point. And, as described in Regression line, this model has an r² of only 0.40, which is ‘pretty’ low, so we might not trust it for forecasting.

So, the coefficient of determination, denoted r², tells us how good a fit the line is. An r² of 0.85 says that 85% of the variation in Y is described by the variation in X. An r² of 0.20 would be too low to call it a good fit.

## r² = 1 – errors

As r² is the “correct” proportion of the line, it helps to look at the “incorrect” proportion of the line, which is the error. The total variation (the whole, or one) consists of the correct proportion plus the incorrect proportion. So, one minus the error (1 − error) is the correct proportion, which is r².

Therefore, the formula for the coefficient of determination, r², is one minus the error, where the error is $SE_{\text{Line}}$ divided by $SE_{\bar{y}}$:

$$r^2 = 1 - \frac{SE_{\text{Line}}}{SE_{\bar{y}}}$$
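The formula translates directly into a small function. This is a sketch: the name `r_squared` and its parameters are illustrative, and it assumes `predicted` holds the value of the regression line at each observed X.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SE_Line / SE_ȳ."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    y_mean = fmean(observed)
    # SE_Line: sum of squared vertical distances from the points to the line
    se_line = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # SE_ȳ: sum of squared vertical distances from the points to the mean of y
    se_mean = sum((y - y_mean) ** 2 for y in observed)
    if se_mean == 0:
        raise ValueError("r² is undefined when all observed values are equal")
    return 1 - se_line / se_mean


# A line that hits every point exactly leaves no error, so r² = 1
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

Predicting ȳ for every point instead gives an error equal to the whole, so r² = 0.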

## Calculating coefficient of determination, r²

In Squared error of line, we calculated the two values that make up our formula for r²: the sum of the squared error of the line ($SE_{\text{Line}}$) and the sum of the squared error of mean y ($SE_{\bar{y}}$). Our $SE_{\text{Line}}$ is 1.2 and our $SE_{\bar{y}}$ is 2.0, so we are now ready to calculate r²:

$$r^2 = 1 - \frac{SE_{\text{Line}}}{SE_{\bar{y}}} = 1 - \frac{1.2}{2.0} = 1 - 0.6 = 0.40$$

This means that only 40% of the variation in Y can be explained by the variation in X, and the line is therefore not a good fit. In other words, our regression model is not reliable for predictive analysis.
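The same arithmetic, using the two values from Squared error of line, as a tiny Python check:

```python
se_line = 1.2  # sum of squared errors of the line, SE_Line
se_mean = 2.0  # sum of squared errors of mean y, SE_ȳ

error_share = se_line / se_mean  # the "incorrect" proportion
r2 = 1 - error_share             # the "correct" proportion

print(f"error share = {error_share:.2f}")  # error share = 0.60
print(f"r² = {r2:.2f}")                    # r² = 0.40
```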

## Coefficient of determination (r²) vs correlation coefficient (r)

r² is, as the name says, r squared, so the two measures are closely related. r² **expresses the proportion of the variation in Y that is explained by the variation in X**. On the other hand, **r expresses the strength and direction** of the linear relation between X and Y.


## Low r² does not invalidate the model

Our example showed a ‘poor’ fit, with a coefficient of determination, r², of only 0.4. But the dataset also has only 4 datapoints. A different example, closer to real-life situations, will typically have many more datapoints.


When the dots lie ‘fairly’ close to the line, r² is ‘fairly’ high. But a low coefficient of determination, r², does not necessarily invalidate the model. As described in Squared errors of line, $SE_{\text{Line}}$ and $SE_{\bar{y}}$, which make up the error of the line, are measured around **mean values**: the regression line estimates the mean of Y for each value of X, and ȳ is the mean of all observed Y values.

So, **even though r² is low, the model can still give us valuable information and predictions**, as the slope of the line represents the **mean change in Y for a one-unit change in X**. r² **represents the scatter around the regression line**: the closer the points are to the line, the higher the coefficient of determination, r².


## Coefficient of determination, r², in Excel

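The coefficient of determination can be calculated with the **RSQ** function, `=RSQ(known_y's, known_x's)`, which returns the square of the correlation coefficient between the two ranges.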

Another way is to run a regression analysis, which also reports r² (labelled “R Square”): **Data >> Data Analysis >> Regression**. This menu is available once the Analysis ToolPak add-in is enabled.

## Learning statistics

- Khan Academy (video 12:41): R-squared or coefficient of determination
- The Minitab Blog (text): Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit?
- MIT OpenCourseWare (video 8:46, about r² after 5:43): The statistical sommelier: An introduction to linear regression

#### Carsten Grube

Freelance Data Analyst
