Chapter 4 of 7

Comparing Models

R², adjusted R², and choosing the right model

R² increases with each model. But is the improvement real?

Simple (salary only): 70.8%
Additive (salary + outfield): 96.1%
Interaction (salary + outfield + salary×outfield): 98.6%

Each model explains more variation than the last. But adding variables always increases R² — even if the new variable adds noise.

Need a refresher on R²? We covered it in Simple Linear Regression — Chapter 4.


R² can only go up when you add variables — even useless ones. We need a penalty for complexity.

Imagine adding a completely random variable — it would still increase R² slightly, just by chance. This makes raw R² unreliable for comparing models with different numbers of predictors.

Adjusted R² fixes this by penalising extra variables. If a variable doesn't improve the model enough, adjusted R² goes down.

R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}

where n = number of observations (24) and k = number of predictors. As k grows, the penalty gets harsher.
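As a quick numerical sketch, the formula can be applied to the three R² values quoted above with n = 24:

```javascript
// Adjusted R²: shrink R² by a penalty that grows with the number of predictors k.
function adjustedR2(r2, n, k) {
  return 1 - ((1 - r2) * (n - 1)) / (n - k - 1);
}

const n = 24; // observations
console.log(adjustedR2(0.708, n, 1).toFixed(3)); // simple model      → 0.695
console.log(adjustedR2(0.961, n, 2).toFixed(3)); // additive model    → 0.957
console.log(adjustedR2(0.986, n, 3).toFixed(3)); // interaction model → 0.984
```

Note that each model's adjusted R² sits slightly below its raw R², and the gap widens as k grows.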


Simple (salary only), k = 1: R² = 70.8%, Adj. R² = 69.5%
Additive (salary + outfield), k = 2: R² = 96.1%, Adj. R² = 95.7%
Interaction (salary + outfield + salary×outfield), k = 3: R² = 98.6%, Adj. R² = 98.4% (best fit)

The interaction model has the highest adjusted R², so the interaction term adds genuine explanatory power rather than mere complexity.


Summary: binary variables shift the line, interaction terms change the slope.

1. A binary variable (like position) shifts the regression line up or down, giving each group a different intercept but the same slope.

2. An interaction term lets the slope differ between groups: the effect of salary on market value can be stronger for one group than the other.

3. Adjusted R² helps compare models with different numbers of predictors by penalising unnecessary complexity.
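The first two points can be sketched in code. The coefficients below (b0…b3) are hypothetical, chosen only to illustrate the structure of the two models, not fitted values from the chapter's data:

```javascript
// Hypothetical coefficients for illustration (not fitted values).
const b0 = 1.0, b1 = 2.0, b2 = 0.5, b3 = 0.8;

// Additive model: outfield (0 or 1) shifts the intercept;
// the salary slope b1 is shared by both groups.
const additive = (salary, outfield) => b0 + b1 * salary + b2 * outfield;

// Interaction model: the salary slope becomes b1 + b3 * outfield,
// so each group gets its own slope as well as its own intercept.
const interaction = (salary, outfield) =>
  b0 + b1 * salary + b2 * outfield + b3 * salary * outfield;

// Slope = change in prediction per unit of salary, within each group:
console.log(interaction(1, 1) - interaction(0, 1)); // b1 + b3 (outfielders: steeper)
console.log(interaction(1, 0) - interaction(0, 0)); // b1 (everyone else)
```

In the additive model the same slope calculation gives b1 for both groups; only the interaction term makes the lines non-parallel.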
